Dec 25, 2014

Bad Integration by Design or How to Make a Horrible Web Service

To understand what makes integration easy, or what makes a “good web service”, it’s worth taking a glance at the historical methods of I.T. systems integration.  After all, business systems have been passing data around and/or activating each other (aka integrating) for almost as long as there have been commercial I.T. business systems (approximately since 1960). 

The first major “interface” method between systems was throwing sequential fixed-length record files at each other.  This was pretty much the only method for 20 years, and it still remains in widespread use, though mostly around mainframe and legacy systems.  The system providing the interface, either outputting the data or providing a format in which to send it data, defines a field-by-field interface record, along with header and footer records.  Because these are fixed-length records, the descriptive definition (the human-readable documentation) must include the format and length of each field, along with any specialized logic for interpretation or encoding.  For example, if a record represents a person, which includes their gender, it might specify a 1-byte single-digit field, with a 0 representing male and a 1 representing female.  (Given that this approach started in the early days of computing, there is also a strong tendency to minimize data size – save the bytes! – leading to additional encoding logic within the definition.)  Because the definition is fixed-length records, no data typing can be enforced within the data format itself, only at the time of programmatic interpretation.
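The shape of such an interface can be sketched in a few lines.  This is a hypothetical layout (the field names, offsets, and codes are invented for illustration); the key point is that none of it is recoverable from the bytes alone – it lives only in the documentation:

```python
# Decode one fixed-length "person" record under a hypothetical layout:
# name (bytes 0-19), gender (byte 20, encoded), birth year (bytes 21-24).
GENDER_CODES = {"0": "male", "1": "female"}  # encoding known only from the docs

def parse_person(record: str) -> dict:
    return {
        "name": record[0:20].rstrip(),       # fixed 20-byte text field, space-padded
        "gender": GENDER_CODES[record[20]],  # 1-byte encoded field
        "birth_year": int(record[21:25]),    # digits; typing enforced only here
    }

print(parse_person("Ada Lovelace        11815"))
# -> {'name': 'Ada Lovelace', 'gender': 'female', 'birth_year': 1815}
```

Note that a single off-by-one in an offset silently corrupts every downstream field, and nothing in the format itself can catch it.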

So how did this approach work?  It worked great.  This is the base approach of generations of systems, especially financial and business systems. 

If it worked great, why don’t we do this anymore?

Answer: Because of the data typing (no enforcement in the format), the encoding (no enforcement in the format, and not understandable without documentation), and other dependent logic (such as cross-field validation instructions, for example “if field 2 is female, then you may fill in field 9 for number of pregnancies”), getting an interface built and correct would take 2-6 weeks per connection.  So while this method worked, it was time-consuming to implement successfully.
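The cross-field rule just mentioned is a good example of why every connection took weeks: rules like it lived only in prose documentation, so every consuming team re-implemented them by hand, roughly like this (a sketch, reusing the hypothetical fields above):

```python
# Hand-coded cross-field validation: the "pregnancies" field may only be
# filled when the gender field says female.  The format cannot enforce
# this; only code written from the documentation can.
def validate(record: dict) -> list[str]:
    errors = []
    if record.get("pregnancies") is not None and record["gender"] != "female":
        errors.append("pregnancies set but gender is not female")
    return errors

print(validate({"gender": "male", "pregnancies": 2}))
# -> ['pregnancies set but gender is not female']
```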

APIs came along to allow direct activation, and defined a fixed set of data types required for activation.  This solved the first model’s problem of data typing without enforcement, and part of its documentation problem (the data types became self-explanatory).  Further, APIs could define descriptive names for the data fields, thereby providing some self-documenting ability within the API itself.  A major improvement.

APIs, however, added a new problem: they were technology, and often version, dependent.  Meaning an API exposed by one system in one language in one release was compatible only with another system of matching technology, language, and version. 

Regardless, integration via APIs was easier and faster.  And it became the base technology that allowed Windows, Unix, and other modern operating systems to move from being simply an execution starter and hardware interface to being a facilitator of interaction between applications.  It further allowed a real-time interaction that was not possible previously.  That said, figuring out and correctly using an API could still take days to weeks, and embedded cross-field validation and logic would often slow down the process.

APIs evolved in the next generation into remote APIs, which moved cross-application interaction to cross-system, cross-environment interaction.  The first remote API technologies to achieve commercial success included DCOM, CORBA, and RMI.  All of these commercial implementations worked, but they were very complicated and highly sensitive to perfect conditions.  And, for the most part, they were technology specific (as well as version specific).  So while they began to offer the new ability of remote invocation and/or coordinated system interaction, the environment had to be perfectly configured, with matching technology and version on both sides.

Each of these generations of integration technology worked within its context and solved problems not previously solvable, offering new abilities and new opportunities.  Yet their limitations meant they remained niche solutions for specific narrow problems. 

With the arrival of web services, a new integration level was reached.  Web services offered all the previous abilities while adding several key new ones:

- The data format is XML, and therefore self-descriptive.

- The service and data format are defined with an XSD, and therefore self-validating.

- The communication protocol is technology neutral and firewall friendly.

- The data format is technology neutral and supported by all development tools.
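The first point is worth seeing concretely.  Here is the same person record as XML (element names invented for illustration): the field names and nesting travel with the data, so a consumer can read it without a separate fixed-length layout document.

```python
# Parse a self-descriptive XML payload with nothing but a standard parser.
import xml.etree.ElementTree as ET

payload = """
<Person>
  <Name>Ada Lovelace</Name>
  <Gender>female</Gender>
  <BirthYear>1815</BirthYear>
</Person>
"""

person = ET.fromstring(payload)
print(person.find("Gender").text)   # -> female
```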

With these abilities added to the historical ones, integration moved from a major project effort to something simple, trivial, and fast.  And with that change, web services and integration became more than just commonplace; they became the way to do things.  (This brings some new problems, such as integration spaghetti and interconnection dependencies, but that’s a different discussion.)

So how do you make a horrible web service?  Simply strip away one or more of the primary advantages it offers.  Examples:

-- Serialize language-specific objects into your web service as one or more data items.  For example, serialize a .NET object into your web service.  The result: a web service that can only work with .NET (and only with the appropriate version).  Yes, I’ve seen this done.
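The same mistake recast in Python terms (a sketch of the anti-pattern, with invented names): embedding a pickled object as a field in the payload.  The envelope is XML, but only a Python consumer with a compatible class definition can read the contents, so technology neutrality is gone.

```python
# Anti-pattern: XML on the outside, a language-specific blob on the inside.
import base64
import pickle

class Person:
    def __init__(self, name: str):
        self.name = name

blob = base64.b64encode(pickle.dumps(Person("Ada"))).decode("ascii")
payload = f"<Person><Data>{blob}</Data></Person>"

# A Python client can round-trip it; a Java or Go client sees only bytes.
restored = pickle.loads(base64.b64decode(blob))
print(restored.name)   # -> Ada
```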

-- Place “codes” in data fields in the web service.  For example, make a field <Gender> where “3” = Male and “1” = Female.  Then explain to the user of the web service that they must download your table of codes and values in order to insert or interpret the field correctly.  This, sadly, is a not-uncommon error.
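What the consumer of such a service ends up writing looks like this (the code values 3 and 1 are from the example above; the lookup table is exactly the out-of-band artifact that defeats self-description):

```python
# Anti-pattern consequence: the field is opaque without the provider's
# separately published code table.
PROVIDER_GENDER_CODES = {"3": "Male", "1": "Female"}  # downloaded out-of-band

def decode_gender(code: str) -> str:
    # Without the provider's table, "3" is just a number.
    return PROVIDER_GENDER_CODES[code]

print(decode_gender("3"))   # -> Male
```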

-- Structure the XML as a flat list of fields even though it could be placed in a hierarchy, or already is in a hierarchy in the objects or database tables.  The corollary of this error is to expose a separate service for each level of the hierarchy rather than one service with a hierarchy.  This is the error of sharing raw data rather than the business function / transaction.  All too common.
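A small contrast makes the point (element names invented for illustration).  The flat form forces the consumer to reassemble the order/line-item relationship from field-name conventions; the hierarchical form carries the relationship in the structure itself:

```python
import xml.etree.ElementTree as ET

# Flat: repeating groups flattened into numbered fields.
flat = (
    "<Order><OrderId>7</OrderId>"
    "<Item1Sku>A1</Item1Sku><Item1Qty>2</Item1Qty>"
    "<Item2Sku>B5</Item2Sku><Item2Qty>1</Item2Qty></Order>"
)

# Hierarchical: the same data, with line items as a real nested list.
nested = (
    "<Order><OrderId>7</OrderId><Items>"
    "<Item><Sku>A1</Sku><Qty>2</Qty></Item>"
    "<Item><Sku>B5</Sku><Qty>1</Qty></Item>"
    "</Items></Order>"
)

items = ET.fromstring(nested).findall("./Items/Item")
print(len(items))   # -> 2; the flat form offers no such traversal
```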

In general, stripping an ability from a web service reduces it to an earlier generation: the result is a service of limited use, difficult to reuse, and challenging to understand.  Each of these problems turns into extra time and complexity, the exact opposite of what web services came to solve.

I recommend avoiding these errors.