
The Integration Data Model


Loraine Lawson over at IT Business Edge has a penchant for touching upon particularly interesting integration problems. This week she asks:

It's interesting: Writing custom code for data integration is nearly universally frowned upon by experts – and yet, I'm seeing a lot of discussion about creating your own metadata solutions for support with integration.

My question is: If you're getting away from hand-coding, why would you want to delve into customized metadata solutions?

Her article focuses on some of the technical issues and the technical approach to such a discussion. I'd like to focus on the business issues, and how those should be directing the technical approach (but aren't).

Every application involved in integration brings along its database (or general data) model and its internal data object model. These models were developed to meet the functional goals of the particular application. For example, the “customer” representation in the CRM (customer relationship management) application will be different from the “customer” representation in the billing application.

The different requirements of that representation lead to a different logical implementation of the representation. In the CRM application it may be a series of relational tables representing different aspects of the full customer, plus a series of classes that cover the different portions of customer interaction. In the billing application the customer may be a single table and a single class, as only basic customer information is needed for the billing process (which is focused on charges and activity).
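To make the contrast concrete, here is a minimal sketch of how the same customer might be modeled on each side. The Java classes and field names below are hypothetical illustrations, not taken from any particular product:

```java
import java.math.BigDecimal;
import java.time.LocalDate;
import java.util.List;

// Hypothetical CRM-side model: "customer" is split across several classes,
// each covering a different aspect of the full relationship.
class CrmCustomer {
    String customerId;
    ContactProfile profile;          // names, addresses, preferred channels
    List<Interaction> interactions;  // service calls, emails, meetings
    List<Opportunity> opportunities; // open and historical sales opportunities
}

class ContactProfile { String fullName; String email; List<String> addresses; }
class Interaction   { String type; LocalDate date; String notes; }
class Opportunity   { String product; BigDecimal estimatedValue; }

// Hypothetical billing-side model: a single flat record, because billing only
// needs enough to identify the account and produce an invoice.
class BillingCustomer {
    String accountNumber;
    String name;
    String billingAddress;
    String paymentTerms;
}
```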

Business-wise, the billing department doesn’t think about the full customer relationship.  Customer service or future sales are not part of their business process.  Their view of the customer is a narrow slice and it’s reflected in their application.

When we want to integrate these systems we’re faced with translating between these different models of customer.  And in many integrations the data mapping effort is THE major effort of the project.

Not the implementation of the mapping and data transformation: ETL and ESB tools do this easily with visual drag-and-drop mapping and/or simple transformation scripting (with the ability to drop back to a code level for very complex element transformations). Third-party tools are even available that offer map-and-drop-into-your-environment deployments (see Altova’s MapForce, for example).
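As a rough illustration of how small the mechanical step is, a hand-written equivalent of what such a tool generates (reusing the hypothetical classes sketched above) might look like this:

```java
// A minimal sketch of the field-by-field mapping an ETL/ESB tool would
// generate from a visual map; all class and field names are hypothetical.
class CustomerMapper {
    BillingCustomer toBilling(CrmCustomer crm) {
        BillingCustomer b = new BillingCustomer();
        b.accountNumber  = crm.customerId;                   // assumes the two systems share an ID
        b.name           = crm.profile.fullName;
        b.billingAddress = crm.profile.addresses.isEmpty()
                ? "" : crm.profile.addresses.get(0);         // which address to use is an analysis question
        b.paymentTerms   = "NET30";                          // no CRM equivalent; a default decided during analysis
        return b;
    }
}
```

The code itself is trivial; the comments mark the places where someone first had to decide what the data actually means.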

Rather, the effort is focused on the actual analysis and modeling necessary to produce a transformation map that can be implemented. Gathering accurate information on the exact meaning of each data element is often very challenging. Not at the database layer (where there is often a decent data model with some minimal documentation, or at the very least a basic one can be generated from the database for you), but at the internal application level, because it’s the internal application data object model that’s being exposed.

And good documentation on that layer is rare, often requiring ‘digging into the code’ to get an accurate answer. The problem can be worse when dealing with a vendor application: there one is reliant purely upon the vendor’s documentation being accurate, sufficiently detailed, and up to date.

So the analysis and design of the mapping between the applications is often the major effort of an integration project. As such, one solution approach is to put major effort into metadata management: building or buying the tools necessary to capture and track this information as it’s analyzed and implemented, so that future integrations gradually get easier.
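As a sketch of what such an effort might capture per data element, assuming a simple home-grown registry rather than any specific product:

```java
import java.time.LocalDate;
import java.util.List;

// Hypothetical record of what a metadata management effort tracks for each
// data element as it is analyzed and mapped.
class ElementMetadata {
    String system;           // e.g. "CRM" or "Billing"
    String elementPath;      // e.g. "CrmCustomer.profile.addresses[0]"
    String businessMeaning;  // the hard-won answer from analysis or from digging into the code
    String owner;            // who confirmed the meaning
    LocalDate verifiedOn;    // when it was last confirmed accurate
    List<String> mappedTo;   // elements in other systems this one maps to
}
```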

Personally, I recommend a different approach. Having seen huge metadata management teams in large corporate IT shops be almost completely ineffective, I recommend an approach that reduces the mapping problem itself.

By creating a layer of Standard Enterprise Entities for integration, preferably based on an industry standard such as ACORD for insurance, HR-XML for HR transactions, or one of the many other XML industry standards, we force all integrations to map into and out of the standard. This gives us a fixed set of metadata to work with. Over time, as the individual applications develop new functionality and expose it, rather than exposing it in their proprietary formats and spending time on metadata mapping, they begin exposing it directly in the enterprise entity standard. Further along the maturity cycle, the enterprise standard extends deeper into the applications as they begin to model their internal representations around the standard (avoiding even internal mapping).
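A rough sketch of the pattern, with a hypothetical enterprise-standard entity sitting between the application models (a real one would be derived from the chosen industry standard):

```java
import java.util.List;

// Hypothetical enterprise-standard customer entity; in practice its shape
// would come from the chosen industry standard (ACORD, HR-XML, etc.).
class EnterpriseCustomer {
    String enterpriseCustomerId;
    String legalName;
    List<String> addresses;
    String status;
}

// Each application supplies one adapter pair against the standard, instead of
// a separate map against every other application it talks to.
interface CustomerAdapter<T> {
    EnterpriseCustomer toStandard(T applicationModel);
    T fromStandard(EnterpriseCustomer entity);
}
```

The structural payoff is that N applications need roughly N adapter pairs against the standard rather than up to N×(N-1) point-to-point maps, and the metadata that must be managed shrinks accordingly.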

Following this approach significantly reduces the metadata and mapping problem over the long term (on the order of five years) and provides the short-term benefit of speedier integrations (as the enterprise teams become comfortable working with the entities and can quickly use services exposed in that pattern without mapping).
