Feb 15, 2009

SOA and Batch - Part 2, The Access Pattern

(Continuing the discussion of Batch Processing and SOA.)

This article focuses on USING services (consuming a web service) as part of a batch-oriented process.

As mentioned in Part 1, there's a strong tendency to want to consume (web) services as part of a newly developed batch process, usually for the sake of reuse (other reasons are discussed in Part 1). It's pretty natural to say "well, services are reusable, and here we have a process that needs to use them. So, use them."

However, there's a natural architectural incompatibility that has to be addressed. Web services are, by nature, remote. Batch processes are oriented around local processing of large quantities of data/transactions, marshalling large amounts of local resources to maximize throughput. Here's an example to explain better...

In the ancient days of programming, say before 1990 or so, the majority of systems were built using local files. Relational databases didn't exist yet, or were just gaining commercial viability (the first commercial RDBMS was released for the mainframe in 1978, and Oracle as a company formed in 1979). Data was accessed locally using a variety of file indexing schemes (such as ISAM). Programmers could count on reading 1 record from 1 file very quickly, and the data handling patterns were designed around that mode of access. (Code tables were buffered, and step-through processes were the norm.)

In such an access model, moving the data (the tables) away from local access is a nightmare. The model depends on very fast access to each individual record: the indexing routines are a library loaded into the local program, and they rely on very fast reads through index files or index header records. If the file access time rises from 3ms to 6ms, the program might literally run twice as slow. Bad, very bad.

This model was fundamentally incompatible with relational databases. In a relational database, the whole data structure moves off the local machine to a "database server", which is optimized around a single read operation that spans multiple files (tables) and returns many records in one request. This is critical because of the overhead...

Marshall, send, and receive a network request: 30ms
Create a request string, interpret, process: 20ms
Interpret multi-record answer: 10ms

That's 60ms per request, 20 times SLOWER than a 3ms traditional local indexed file read. If we attach the average program of 1980 to a relational database on the network, performance falls through the floor as each record request becomes 20 times slower.
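To make the arithmetic concrete, here is a minimal cost model built only from the figures quoted above (30ms + 20ms + 10ms per remote request, 3ms per local indexed read). It assumes the program is entirely I/O bound, so runtime is simply records × per-record cost; the batch size of 10,000 records and the function names are hypothetical, chosen just for illustration.

```python
# Overhead figures quoted in the article, in milliseconds.
NETWORK_ROUND_TRIP_MS = 30   # marshal, send, and receive a network request
REQUEST_PROCESSING_MS = 20   # create the request string, interpret, process
ANSWER_PARSING_MS = 10       # interpret the multi-record answer
LOCAL_INDEXED_READ_MS = 3    # one local indexed-file record read


def remote_request_ms() -> int:
    """Fixed cost of one request to a networked database server."""
    return NETWORK_ROUND_TRIP_MS + REQUEST_PROCESSING_MS + ANSWER_PARSING_MS


def record_at_a_time_runtime_ms(records: int, per_record_ms: int) -> int:
    """Runtime of an I/O-bound program that fetches one record per access.

    Assumption: nothing else in the program costs measurable time.
    """
    return records * per_record_ms


if __name__ == "__main__":
    n = 10_000  # hypothetical number of records in the batch run
    local = record_at_a_time_runtime_ms(n, LOCAL_INDEXED_READ_MS)
    remote = record_at_a_time_runtime_ms(n, remote_request_ms())
    print(f"local:  {local / 1000:.0f} s")                          # local:  30 s
    print(f"remote: {remote / 1000:.0f} s ({remote // local}x slower)")  # remote: 600 s (20x slower)
```

The same linear model explains the earlier 3ms-to-6ms remark: doubling `per_record_ms` doubles the runtime, because a record-at-a-time program has nothing to amortize the per-access cost against.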

But when we adjust the data handling pattern to request the data needed through its relational structure, returning multiple records across multiple tables in a single data access request (a SQL call), we might get the equivalent of 100 individual file index and data table reads returned in a single call: 300ms of direct table access (100 × 3ms) delivered in 60ms, 5 times faster, with the added advantage of dividing our processing across 2 servers.
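The batching argument can be sketched the same way. This assumes, as the paragraph's own arithmetic does, that the 60ms request cost stays fixed no matter how many records one call returns (in practice, interpreting a larger answer costs somewhat more). Under that assumption, a single call breaks even with local access at 20 records (60ms / 3ms) and wins beyond that. The function names are hypothetical.

```python
LOCAL_INDEXED_READ_MS = 3         # one local indexed-file read
REMOTE_REQUEST_MS = 30 + 20 + 10  # the article's fixed per-request overhead


def speedup_of_one_call(records_per_call: int) -> float:
    """Local time for `records_per_call` reads divided by one remote call.

    Assumption: the remote request cost does not grow with the number of
    records returned.
    """
    return records_per_call * LOCAL_INDEXED_READ_MS / REMOTE_REQUEST_MS


def break_even_records_per_call() -> float:
    """Records one call must return before it matches local access."""
    return REMOTE_REQUEST_MS / LOCAL_INDEXED_READ_MS


for k in (1, 20, 100):
    print(k, speedup_of_one_call(k))  # 1 0.05 / 20 1.0 / 100 5.0
print(break_even_records_per_call())  # 20.0
```

A value below 1.0 means the remote call is slower than local access; 1 record per call reproduces the 20-times slowdown, and 100 records per call reproduces the 5-times speedup.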

The key was changing the data access model to take advantage of the new technology.
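The change in access model can be illustrated with a small simulation. `SimulatedRemoteStore`, `get_one`, `get_many`, `chatty_fetch`, and `batched_fetch` are all hypothetical names, not a real database API: the store simply charges a fixed 60ms per call (the article's figure) and counts round trips, so the record-at-a-time pattern and the "ask for everything at once" pattern can be compared on the same data.

```python
from dataclasses import dataclass


@dataclass
class SimulatedRemoteStore:
    """Hypothetical stand-in for a networked database server.

    Every call is one round trip with a fixed cost, however many rows it returns.
    """
    rows: dict[int, str]
    request_ms: int = 60  # the article's per-request overhead
    round_trips: int = 0
    elapsed_ms: int = 0

    def _charge(self) -> None:
        self.round_trips += 1
        self.elapsed_ms += self.request_ms

    def get_one(self, key: int) -> str:
        self._charge()
        return self.rows[key]

    def get_many(self, keys: list[int]) -> dict[int, str]:
        self._charge()
        return {k: self.rows[k] for k in keys}


def chatty_fetch(store: SimulatedRemoteStore, keys: list[int]) -> dict[int, str]:
    # 1980-style pattern: one request per record.
    return {k: store.get_one(k) for k in keys}


def batched_fetch(store: SimulatedRemoteStore, keys: list[int]) -> dict[int, str]:
    # Relational-style pattern: request everything needed in one call.
    return store.get_many(keys)


keys = list(range(100))
data = {k: f"record-{k}" for k in keys}
chatty, batched = SimulatedRemoteStore(data), SimulatedRemoteStore(data)
assert chatty_fetch(chatty, keys) == batched_fetch(batched, keys)
print(chatty.round_trips, chatty.elapsed_ms)    # 100 6000
print(batched.round_trips, batched.elapsed_ms)  # 1 60
```

Both patterns return identical data; only the number of round trips differs, which is exactly where the cost of remote access lives.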

A web service access pattern is usually designed for "online" or real-time(ish) access. Details of this in my next article.