Aug 31, 2010

The Integration Data Model

Loraine Lawson over at IT Business Edge has a penchant for touching upon particularly interesting integration problems.  This week she asks:

It's interesting: Writing custom code for data integration is nearly universally frowned upon by experts – and yet, I'm seeing a lot of discussion about creating your own metadata solutions for support with integration.

My question is: If you're getting away from hand-coding, why would you want to delve into customized metadata solutions?

The article focuses on some of the technical issues and technical approaches involved.  I’d like to focus on the business issues, and how those should be directing the technical approach (but aren’t).

Every application involved in integration brings along its database (or general data) model and its internal data object model.  These models were developed to meet the functional goals of the particular application.  For example, the “customer” representation in the CRM (customer relationship management) application will be different from the “customer” representation in the billing application.

The different requirements of that representation lead to a different logical implementation of the representation.  In the CRM application it may be a series of relational tables representing different aspects of the full customer, plus a series of classes that provide the different portions of customer interaction.  In the billing application the customer may be a single table and a single class, as only basic customer information is needed for the billing process (which is focused on charges and activity).
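To make the contrast concrete, here is a minimal sketch of what the two "customer" shapes might look like.  Every class and field name below is hypothetical, invented purely for illustration; no real CRM or billing product is being described.

```python
from dataclasses import dataclass, field
from typing import List


# Hypothetical CRM-side model: the customer is spread across several
# objects, one per aspect of the relationship.
@dataclass
class CrmContact:
    first_name: str
    last_name: str
    email: str


@dataclass
class CrmAddress:
    kind: str          # assumed values such as "mailing" or "billing"
    street: str
    city: str
    postal_code: str


@dataclass
class CrmCustomer:
    crm_id: str
    primary_contact: CrmContact
    addresses: List[CrmAddress] = field(default_factory=list)
    interaction_notes: List[str] = field(default_factory=list)


# Hypothetical billing-side model: one flat record with only what the
# billing process needs.
@dataclass
class BillingCustomer:
    account_no: int
    name: str
    bill_to: str
```

Both are perfectly reasonable models of "customer" for their own department.  Neither is wrong, and that is exactly why they don't line up.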

Business-wise, the billing department doesn’t think about the full customer relationship.  Customer service or future sales are not part of their business process.  Their view of the customer is a narrow slice and it’s reflected in their application.

When we want to integrate these systems we’re faced with translating between these different models of customer.  And in many integrations the data mapping effort is THE major effort of the project.
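A rough sketch of such a translation shows where the work hides.  The record layouts and the two "decisions" in the comments are assumptions made up for this example; in a real project each one is a question somebody has to answer by talking to the business or reading the application.

```python
def crm_to_billing(crm: dict, account_no: int) -> dict:
    """Translate a hypothetical CRM customer record into a hypothetical
    flat billing record. Field names on both sides are illustrative."""
    contact = crm["primary_contact"]

    # Decision 1 (assumed): billing wants one name string, "Last, First".
    name = f'{contact["last_name"]}, {contact["first_name"]}'

    # Decision 2 (assumed): use the address marked "billing" if there is
    # one, otherwise fall back to the first address on file.
    addresses = crm["addresses"]
    if not addresses:
        raise ValueError("CRM customer has no address to bill to")
    chosen = next((a for a in addresses if a["kind"] == "billing"), addresses[0])
    bill_to = f'{chosen["street"]}, {chosen["city"]} {chosen["postal_code"]}'

    return {"account_no": account_no, "name": name, "bill_to": bill_to}


crm_record = {
    "crm_id": "C-1",
    "primary_contact": {"first_name": "Ada", "last_name": "Lovelace"},
    "addresses": [
        {"kind": "mailing", "street": "9 Elm Rd", "city": "Shelbyville", "postal_code": "54321"},
        {"kind": "billing", "street": "1 Main St", "city": "Springfield", "postal_code": "12345"},
    ],
}
billing_record = crm_to_billing(crm_record, account_no=1001)
```

The code itself is a dozen lines.  Knowing that the name format and the address choice are the right ones is the part that takes weeks.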

That effort is not the implementation of the mapping and data transformation.  ETL tools and ESB tools do this easily with visual drag-and-drop mapping and/or simple transformation scripting (with the ability to drop back to a code level for very complex element transformations).  Even third-party tools are available that offer map-and-drop-into-your-environment deployments (see Altova’s MapForce for example).

Rather, the effort is focused on the actual analysis and modeling necessary to produce a transformation map that can be implemented.  Gathering accurate information on the exact meaning of each data element is often very challenging.  The challenge is not at the database layer (where often there’s a decent data model with some minimal documentation, or at the very least a basic one can be generated by the database for you) but at the internal application level, because it’s the internal application data object model that’s being exposed.

And good documentation on that layer is rare, often requiring ‘digging into the code’ to get an accurate answer.  The problem can be worse when dealing with a vendor application.  There one is reliant purely upon detailed vendor documentation being accurate, sufficiently detailed and up to date.

So the analysis and design of the mapping between the applications is often the major effort of an integration project.  As such, one approach is to put major effort into metadata management.  Building or buying the tools necessary to capture and track this information as it’s analyzed and implemented becomes a way of gradually making future integrations easier.

Personally, I recommend a different approach.  Because I’ve seen huge metadata management teams in large corporate IT shops prove almost completely ineffective, I recommend an approach that shrinks the mapping problem itself.

By creating a layer of Standard Enterprise Entities for integration, preferably based on an industry standard such as ACORD for Insurance, HR-XML for HR transactions, or one of the many XML industry standards, we force all integrations to map into and out of the standard.  This gives us a fixed set of metadata to work with.  Over time, as the individual applications develop new functionality and expose it, they stop exposing it in their proprietary format (and spending time on metadata mapping) and begin exposing it directly in the enterprise entity standard.  Further along the maturity cycle, the enterprise standard begins to extend deeper into the applications as they begin to model their internal representations around the standard (avoiding even internal mapping).
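Here is a minimal sketch of the idea, assuming a made-up `EnterpriseCustomer` entity.  It is not the ACORD or HR-XML schema, and the adapter names and field names are hypothetical.  Each application gets one adapter into the standard and one out of it, and never maps directly to another application.

```python
from dataclasses import dataclass


# Hypothetical enterprise entity; a real one would follow the chosen
# industry standard rather than these illustrative fields.
@dataclass(frozen=True)
class EnterpriseCustomer:
    customer_id: str
    given_name: str
    family_name: str
    postal_address: str


def crm_to_enterprise(crm: dict) -> EnterpriseCustomer:
    """Inbound adapter: hypothetical CRM record -> enterprise entity."""
    return EnterpriseCustomer(
        customer_id=crm["crm_id"],
        given_name=crm["first_name"],
        family_name=crm["last_name"],
        postal_address=crm["address"],
    )


def enterprise_to_billing(ent: EnterpriseCustomer, account_no: int) -> dict:
    """Outbound adapter: enterprise entity -> hypothetical billing record."""
    return {
        "account_no": account_no,
        "name": f"{ent.family_name}, {ent.given_name}",
        "bill_to": ent.postal_address,
    }


def point_to_point_maps(n_apps: int) -> int:
    # Every application mapped directly to every other, in both directions.
    return n_apps * (n_apps - 1)


def canonical_maps(n_apps: int) -> int:
    # One map into and one map out of the standard per application.
    return 2 * n_apps


entity = crm_to_enterprise(
    {"crm_id": "C-1", "first_name": "Ada", "last_name": "Lovelace",
     "address": "1 Main St, Springfield 12345"}
)
billing_record = enterprise_to_billing(entity, account_no=1001)
```

The two counting functions show why this pays off as the portfolio grows: with three applications both approaches need six maps, but with five applications it is 20 direct maps versus 10 through the standard, and the gap keeps widening from there.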

Following this approach significantly reduces the metadata and mapping problem over the long term (5 years) and provides the short-term benefit of speedier integrations (as the enterprise teams become comfortable working with the entities and can quickly utilize services exposed in that pattern without mapping).