
SOA and Batch - Part 2, The Access Pattern

(Continuing the discussion of Batch Processing and SOA.)

This article focuses on USING services (consuming a web service) as part of a batch oriented process.

As mentioned in Part 1, there's a strong tendency to want to consume (web) services as part of a newly developed batch process - usually due to reuse (other reasons are discussed in Part 1). It's pretty natural to say "well, services are reusable, and here we have a process that needs to use them. So, use them."

However, there's a natural architecture incompatibility that has to be addressed. Web services are, by nature, remote. Batch processes are oriented around local processing of large quantities of data/transactions, marshalling large amounts of local resources to maximize throughput. Here's an example to explain better...

In the ancient days of programming, say before 1990 or so, the majority of systems were built using local files. Relational databases didn't exist yet, or were just gaining commercial viability (the first commercial RDBMS was released for the mainframe in 1978, and Oracle as a company was formed in 1979). Data was accessed locally using a variety of file indexing schemes (ISAM). Programmers were confident of being able to access 1 record in 1 file very quickly, and the data handling patterns were designed around that mode of access. (Code tables were buffered, step-through processes the norm.)

In such an access model, moving the data - the tables - away from local access is a nightmare. The model is based on very speedy access to each individual record, with the indexing routines loaded as a library into the local program and relying on very fast reads through index files or index header records. If the file access speed drops from 3ms to 6ms, the program might literally run twice as slow. Bad, very bad.
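A minimal sketch of why that model is so latency-sensitive. This is illustrative only: a plain dict stands in for an ISAM index, and the per-record latencies are the 3ms/6ms figures from the text, not measurements.

```python
# Record-at-a-time batch loop: total I/O cost scales linearly
# with the per-record access latency, so doubling latency
# doubles the run time of the whole batch.

def run_batch(keys, index, per_record_latency_ms):
    """Process records one key at a time, paying the access
    latency once per record."""
    total_io_ms = 0
    for key in keys:
        record = index.get(key)          # one indexed read per record
        total_io_ms += per_record_latency_ms
        # ... process the record here ...
    return total_io_ms

index = {k: f"record-{k}" for k in range(1000)}
keys = list(range(1000))
print(run_batch(keys, index, 3))  # 3000 ms of I/O at 3 ms/read
print(run_batch(keys, index, 6))  # 6000 ms at 6 ms/read - twice as slow
```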

This model was fundamentally incompatible with relational databases. In relational databases, the whole data structure moves off the local machine to a "database server" and is optimized around a multi-file single read operation that returns many records in a single request. The reason this is critical is because of the overhead...

Marshall, send, and receive a network request: 30ms
Create a request string, interpret, process: 20ms
Interpret multi-record answer: 10ms

That's 60ms, 20 times SLOWER than traditional local indexed file access. If we attach the average program of 1980 to a relational database on the network, performance falls through the floor as each record request becomes 20 times slower.

But when we adjust the data handling pattern to request the data needed through its relational structure - returning multiple records across multiple tables in a single data access request (a SQL call) - we might get the equivalent of 100 individual file index and data table reads returned in a single call: 300ms of direct table access in 60ms, 5 times faster, with the added advantage of dividing our processing across 2 servers.
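The arithmetic above can be put in one small back-of-envelope model. The 60ms per-request overhead, the 3ms local indexed read, and the 100-read batch are all the figures from the text; the function names are just labels for the two access patterns.

```python
# Cost model for the two access patterns described above.

REQUEST_OVERHEAD_MS = 60  # marshal/send/receive (30) + request handling (20)
                          # + multi-record answer interpretation (10)
LOCAL_READ_MS = 3         # one traditional local indexed file read

def record_at_a_time_over_network(n_records):
    """One network request per record: the overhead is paid n times."""
    return n_records * REQUEST_OVERHEAD_MS

def set_based_call(n_records):
    """One SQL call returning all n records: the overhead is paid once."""
    return REQUEST_OVERHEAD_MS

n = 100
print(record_at_a_time_over_network(n))  # 6000 ms - the 1980-style program
print(set_based_call(n))                 # 60 ms - one set-based SQL call
print(n * LOCAL_READ_MS)                 # 300 ms of equivalent local reads
```

At 100 records the set-based call delivers 300ms worth of local reads in 60ms (the 5x gain from the text), while the unadjusted record-at-a-time program pays the 60ms overhead per record and is 20 times slower than local access.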

The key was changing the data access model to take advantage of the new technology advantages.

A web service access pattern is usually designed for "online" or real-time(ish) access. Details of this in my next article.
