Aug 31, 2010

The Integration Data Model




Loraine Lawson over at IT Business Edge has a penchant for touching upon particularly interesting integration problems.  This week she asks:

It's interesting: Writing custom code for data integration is nearly universally frowned upon by experts – and yet, I'm seeing a lot of discussion about creating your own metadata solutions for support with integration.

My question is: If you're getting away from hand-coding, why would you want to delve into customized metadata solutions?

The article focuses upon some of the technical issues and technical approach to such a discussion.  I’d like to focus on the business issues, and how those should be directing the technical approach (but aren’t).

Every application involved in integration brings along its database (or general data) model and its internal data object model.  These models were developed to meet the functional goals of the particular application.  For example, the “customer” representation in the CRM (customer relationship management) application will be different from the “customer” representation in the billing application.

The different requirements of that representation lead to a different logical implementation of the representation.  In the CRM application it may be a series of relational tables representing different aspects of the full customer – and a series of classes that provide the different portions of customer interaction.  In the billing application customer may be a single table and a single class, as only basic customer information is needed for the billing process (which is focused on charges and activity).

Business-wise, the billing department doesn’t think about the full customer relationship.  Customer service or future sales are not part of their business process.  Their view of the customer is a narrow slice and it’s reflected in their application.

When we want to integrate these systems we’re faced with translating between these different models of customer.  And in many integrations the data mapping effort is THE major effort of the project.

Not the implementation of the mapping and data transformation.  ETL tools and ESB tools do this easily with visual drag-and-drop mapping and/or simple transformation scripting (with the ability to drop back to a code level for very complex element transformations).  Even third party tools are available that offer map-and-drop-into-your-environment deployments (see Altova’s MapForce for example).

Rather, the effort is focused on the actual analysis and modeling necessary to produce a transformation map that can be implemented.  Gathering accurate information on the exact meaning of each data element is often very challenging.  Not at the database layer (where often there’s a decent data model with some minimal documentation, or at the very least a basic one can be generated by the database for you) but at the internal application level.  Because it’s the internal application data object model that’s being exposed.

And good documentation on that layer is rare, often requiring ‘digging into the code’ to get an accurate answer.  The problem can be worse when dealing with a vendor application.  There one is reliant purely upon detailed vendor documentation being accurate, sufficiently detailed and up to date.

So the analysis and design of the mapping between the applications is often the major effort of an integration project.  As such one solution approach is to put major effort into Metadata Management.  Building or buying the tools necessary to capture and track this information as it’s analyzed and implemented becomes a major way of gradually improving the future.

Personally I recommend a different approach.  Because I’ve seen huge metadata management teams in large corporate IT shops be basically completely ineffective, I recommend an approach that reduces the mapping problem altogether.

By creating a layer of Standard Enterprise Entities for integration, preferably based on an industry standard such as ACORD for Insurance, HR-XML for HR transactions, or one of the many other XML industry standards, we force all integrations to map into and out of the standard.  This gives us a fixed set of metadata to work with.  Over time, as the individual applications develop new functionality and expose it, they begin exposing it directly in the enterprise entity standard rather than exposing it in their proprietary format and spending time on metadata mapping.  Further along the maturity cycle the enterprise standard begins to extend deeper into the applications as they begin to model their internal representations around the standard (avoiding even internal mapping).
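To make the idea concrete, here is a minimal sketch of mapping into and out of a standard entity.  Everything in it is an assumption for illustration: the `CanonicalCustomer` fields, the CRM and billing record shapes, and the function names are all hypothetical, and a real enterprise entity would follow the chosen industry standard rather than this toy.  The two counting functions show why a fixed standard shrinks the mapping problem, assuming every system would otherwise need a map to every other system in both directions.

```python
from dataclasses import dataclass


@dataclass
class CanonicalCustomer:
    """Hypothetical enterprise entity; a real one would follow ACORD, HR-XML, etc."""
    customer_id: str
    full_name: str
    email: str


def crm_to_canonical(rec: dict) -> CanonicalCustomer:
    # Assumed CRM shape: name split across fields, contact info nested.
    return CanonicalCustomer(
        customer_id=str(rec["cust_no"]),
        full_name=f'{rec["first"]} {rec["last"]}',
        email=rec["contact"]["email"],
    )


def canonical_to_billing(c: CanonicalCustomer) -> dict:
    # Assumed billing shape: one flat record with only what invoicing needs.
    return {"ACCT_ID": c.customer_id, "BILL_NAME": c.full_name.upper()}


def point_to_point_maps(n: int) -> int:
    # Every system maps directly to every other system, in both directions.
    return n * (n - 1)


def canonical_maps(n: int) -> int:
    # Every system maps once into and once out of the standard.
    return 2 * n


crm_record = {"cust_no": 1001, "first": "Ada", "last": "Lovelace",
              "contact": {"email": "ada@example.com"}}
billing_record = canonical_to_billing(crm_to_canonical(crm_record))
print(billing_record)
print(point_to_point_maps(6), canonical_maps(6))
```

With six systems the direct approach needs 30 maps while the standard needs 12, and each new system adds only two more.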

Following this approach significantly reduces the metadata and mapping problem over the long term (5 years) and provides the short term benefits of speedier integrations (as the enterprise teams become comfortable working with the entities and can quickly utilize services exposed in that pattern without mapping).

Aug 17, 2010

Best of Breed vs. Suites




This is a classic IT question.  Should one go with picking and choosing Best of Breed applications in the various niches that one’s IT shop needs, or go with an Application Suite?  SOA and Integration have some significant input to this question, and are impacted by it.  And the same question applies not only to business toolsets, but also to SOA toolsets (best of breed ESB, design time governance, run time governance, SOA security tools, BPM, etc, or a suite?)

With best of breed, we might end up with one company’s ERP system and another company’s CRM system, a third company’s manufacturing system and a fourth company’s financials.

With historical integration patterns, the issue of data interchange between the systems was mostly handled by large scale data exports and imports, usually performed as batch processes at end of day (or end of week or end of month).  These processes could be described as “dump all the (insert primary data type here, such as customers), then dump all the (day’s/week’s/month’s) transactions” followed by a specially written import program that read the foreign systems’ format and wrote the transactions (either directly into the database or processed through the exposed transaction API).

As these systems chained along the full business process, the latter systems in the chain might not be updated (assuming a daily batch) for 3 or 4 days.  Further, other systems that came along needing the same data wouldn’t necessarily go to the primary source (let’s say the first system in the chain), they would go to the easiest access point for the data (whichever system had the easiest API to use or easiest database to access) to pull it.
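The arithmetic behind that delay is simple enough to sketch.  The chain of system names below is hypothetical, and the function assumes the worst case, where each hand-off just misses the previous batch and waits a full interval.

```python
from datetime import timedelta


def worst_case_staleness(hops: int, batch_interval: timedelta) -> timedelta:
    """Upper bound on data age at the end of a chain of batch hand-offs.

    Assumes each hop may just miss the previous batch and wait a full interval.
    """
    return hops * batch_interval


# Hypothetical chain: each system feeds the next with a nightly batch.
chain = ["order entry", "ERP", "manufacturing", "financials", "reporting"]
hops = len(chain) - 1
print(worst_case_staleness(hops, timedelta(days=1)))
```

Five systems mean four hand-offs, so the last system in line can be four days behind the first.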

The result over time in these cases was an initial chain of connections that degraded into a web of batch extract, transfer, and loads hubbing off the original chain.

But today in our best of breed scenario there is an expectation of real time feeds between the systems.  The integrations are significantly harder as the systems have to be intricately connected, dealing with differences in data formats, connection protocols and paradigms, and transaction processing models. 

The integration effort in the Best of Breed scenario has gone up significantly, and the success or failure of the primary systems is completely dependent on a successful reliable intricate integration!

In the case of the Suite approach, each individual module (or major application) may not be the best in class solution.  Some parts of the suite may be excellent, others average and some just barely acceptable.  Yet, if the vendor has done their job well, the various modules are already well integrated.  Not having to build, manage and maintain an integration layer among the suite components is the key advantage of the suite.

Does that mean I recommend the suite approach over the best of breed approach?  Not necessarily.  There are good counter arguments such as avoiding vendor lock-in and being able to take advantage of particular applications that offer exceptional abilities in their area.

So how does one deal with a best of breed approach, whether total or partial (say you use SAP for many things but still use a few other solutions)?

The answer is integration processes and standards.  Building a well defined integration architecture layer that provides logical decoupling, as well as forcing your internal IT shop (and the vendors if possible) into XML industry standards is critical.

And what of SOA suites?  Mixing and matching the integration tools is just as challenging as the applications.  One can select, say, Software AG CentraSite for Design Time Governance / Services Catalog, IBM Datapower for security enforcement with SOA Software’s ServiceManager for runtime control and monitoring, and Oracle’s Fusion ESB.  Technically that should all be possible and should work.  On a practical basis, I don’t know of anyone who has succeeded in doing so.  (More often one finds most tools from one vendor and perhaps one component from another, and the IT shop dealing with the extra work of the particular bridge between that one integration point.)

I find it interesting that the proliferation of open standards and easy integration is driving us back to suites.  Not because we can’t connect everything but because it’s simply not worth the effort to do so.

Of course, those trying to do so are what’s keeping my work schedule completely full.  Hmm, maybe I shouldn’t have written this article.

Aug 9, 2010

Integration Spaghetti™



 

I’ve been using the term Integration Spaghetti™ for the past 9 years or so to describe what happens as systems connectivity increases and increases to the point of … unmanageability, indeterminate impact, or just generally a big mess.  A standard line of mine is “moving from spaghetti code to spaghetti connections is not an improvement”.


(A standard “point to point connection mess” slide, by enterprise architect Jerry Foster from 2001.)

In the past few days I’ve been meeting with a series of IT managers at a large customer and have come up with a revised definition for Integration Spaghetti™ :

Integration Spaghetti™ is when the connectivity to/from an application is so complex that everyone is afraid of touching it.  An application with such spaghetti becomes nearly impossible to replace.  Estimates of change impact to the application are frequently wrong by orders of magnitude.  Interruptions in the integration’s functioning are always a major disaster – both in terms of the time and people required to resolve them and in their business impact.

Even though the spaghetti bound application is nearly impossible to replace, the situation keeps growing worse the longer it continues: additional connections are made to these key applications, derivative copies of the data are taken from them, and clones are created to avoid them (thereby creating yet another synchronization and connection point).

Such spaghetti takes multiple forms but often involves ALL forms with multiple generations of technology connections, including excessive point to point connections, tightly coupled connection technologies, database triggers, business logic embedded in EAI process steps, many batches in and out from and to many destinations, ETL loads and extracts to/from other databases, multiple services providing nearly (but not exactly) identical data sets, and the involvement of many message queues.

Anything is done to avoid dealing with the giant plate of spaghetti.


Systems will integrate with systems that integrate with it, piggybacking existing connectivity and putting a burden on the subsidiary system, to avoid directly connecting into the spaghetti.  They’ll go to a secondary or tertiary data source to avoid going direct.  Everyone knows to avoid the spaghetti if at all possible and will spend double to triple the integration effort to do so.
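This piggybacking is one reason change-impact estimates go so wrong: the systems affected by a change to the core application include everything fed indirectly, not just the direct connections.  Here is a rough sketch of that, assuming you could actually write down who feeds whom.  The `feeds` map and its system names are made up for illustration.

```python
from collections import deque

# Hypothetical feeds: each key system sends data to the systems listed.
feeds = {
    "core": ["billing", "crm", "warehouse"],
    "billing": ["ledger"],
    "crm": ["marketing", "ledger"],
    "warehouse": ["reporting"],
    "ledger": ["reporting"],
}


def downstream_impact(graph: dict, start: str) -> set:
    """Return every system reachable from `start` through any chain of feeds."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for target in graph.get(node, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


direct = set(feeds["core"])
total = downstream_impact(feeds, "core")
print(sorted(direct))          # systems connected straight to the core
print(sorted(total - direct))  # systems fed second-hand, easy to overlook
```

In this toy map half of the affected systems never connect to the core at all, which is exactly the part an estimate based on the known connections would miss.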

If the primary system is replaced, it’s not unusual that the new system won’t be integrated into all the old connections – this would require actually understanding each existing connection, extracting it and redirecting/reconnecting it to the new system – rather the OLD SYSTEM will stay around to act as the connection point for all the existing spaghetti connections and the new system will become an integration, taking data feeds or a regular ETL load, off the old system!  Meaning the old system lives forever!

Does this problem ever get resolved?  Yes.  When the other side of the connections gets replaced, the new systems on that side will be integrated with what replaced the core spaghetti bound system.  If the IT shop is lucky after a generation or so the spaghetti bound system can be shut down.

Unfortunately in major Enterprise IT shops finding some spaghetti integrations is not unusual.  IT management is loath to acknowledge such a problem to the business and will continue work-arounds until it directly impacts business goals.  Otherwise it remains just another hidden enterprise IT expense.

Aug 5, 2010

Signs of Industry Governance Failure and Recovery



 

A number of industry analysts have been speaking of SOA Design Time Governance failure for some time.  As I’ve written previously, primarily this was because the majority of enterprise IT shops had neither reached the SOA maturity level to deal with it nor built a large enough service catalog to need tools to address it.

I’ve seen a lot of change in this in the past year, as many organizations are suddenly asking for help in defining requirements for SOA governance tools.

But what of the cutting edge IT shops, the early SOA adopters who ran into SOA governance needs years ago and started working with SOA Design Time Governance tools of earlier generations?  (I admit to being one of these, having led the purchase of a design time governance tool for my U.S. Fortune 50 employer at the time, about 7 years ago.)

Most of these projects FAILED!  (Including the one I ran.)  The tools were complicated and somewhat rigid, and the processes needed to make them successful (the IT people-process changes necessary to successfully incorporate the tool) weren’t well understood.  As a result most design time governance projects morphed into simple service catalogs.  The complicated and expensive design time governance tools turned into a simple library website.  And sometimes not for services (as asset management tools with flexible templates, they easily serve the needs of other reusable IT assets – such as architecture designs).

So today I find this interesting announcement in my inbox:

--------
SOA Software Announces Repository Federation Solution

Repository Manager supports downstream, horizontal, vertical, and custom federation scenarios

July 26, 2010 – SOA Software, (blah blah blah), today announced the availability of comprehensive SOA federation capabilities in its…product. These capabilities help customers share software development asset information effectively within and outside the enterprise, govern how this information is shared, and integrate this information (blah blah blah).

Downstream Federation:  (products) federation capabilities for leading service registries, including SOA Software’s Policy Manager, IBM WSRR, HP SOA Systinet, TIBCO ActiveMatrix and SAP ESR, automatically synchronize service definitions, supporting information and governance states both to and from the run-time environment.  This is unique in the market and provides customers with granular control of their various registries from a single platform…

--------

What strikes me about this?  Now we have a vendor who’s going to cross-load information from previous (failed) installations of design time governance tools.  I mean, why else would an enterprise have multiple design time governance tools (or even instances of the same tool)?  These tools, by nature, handle multiple libraries or catalogs of information. 

So multiple installations mean previous failures have devolved into niche catalogs of select assets, or have become departmental tools used only by a few major project teams or architect-led projects.  (Given the price structure of these tools enterprises do not choose them as departmental solutions.  These are enterprise-wide priced tools.)

All this says to me that governance is making a comeback.  However, the key to any governance project is not the particular capabilities of the tools, but the processes and methodologies necessary to get a successful implementation and tool use acceptance.  Incorporating design time governance into the software development lifecycle is the key success factor.  How a vendor is going to help the customer to make that happen is the first question any customer needs to ask.

(Photo credit – SOA World Magazine)

Aug 1, 2010

Where does UDDI fit in the average Integration?




Addressing a service presents a few problems.  By putting the URL (or queue name, if using messaging) in the consumer, you unintentionally couple the service consumer to the physical instance of the service.  (Meaning what server it’s on, IP address, etc.)

Applications often unintentionally become tightly coupled simply by addressing connections directly, by IP address, server name, or queue name. This is unacceptable, as any change in the physical layer results in software changes. (Hardcoding such information is clearly a major mistake, but even placing it in a configuration file or database entry still results in application manipulation due to physical layer changes.)

Replace a server, redeploy all consumers of the services exposed on that server?  Ouch.  Even moving from development to test to production becomes a challenge (as you have to recompile or reconfigure as the consumer needs to repoint to the new provider instance in each environment).

UDDI was originally created as a runtime service lookup to decouple the logical use from the physical implementation.  Since service addressing must use outside facilities to allow adjusting addresses without touching the service, whether from the consumer/request or within the integration bus, UDDI fits the bill.
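The runtime lookup idea can be sketched in a few lines.  To be clear, this is not the UDDI API: `ServiceRegistry` is a toy in-memory stand-in, and the service name, environment names and addresses are all hypothetical.  It only illustrates the decoupling, where the consumer knows a logical name and asks for the physical address at runtime.

```python
class ServiceRegistry:
    """Toy in-memory stand-in for a runtime lookup service (not the UDDI API)."""

    def __init__(self):
        self._endpoints = {}

    def publish(self, service: str, environment: str, address: str) -> None:
        # Operations staff update this entry when a server moves or is replaced.
        self._endpoints[(service, environment)] = address

    def resolve(self, service: str, environment: str) -> str:
        # Consumers call this at runtime instead of storing an address.
        try:
            return self._endpoints[(service, environment)]
        except KeyError:
            raise LookupError(
                f"no endpoint for {service!r} in {environment!r}"
            ) from None


registry = ServiceRegistry()
registry.publish("CustomerLookup", "test", "http://test-host.example/customer")
registry.publish("CustomerLookup", "prod", "http://prod-host-1.example/customer")

before = registry.resolve("CustomerLookup", "prod")
# Replace the production server: only the registry entry changes.
registry.publish("CustomerLookup", "prod", "http://prod-host-2.example/customer")
after = registry.resolve("CustomerLookup", "prod")
print(before, "->", after)
```

The consumer code is identical before and after the server swap, and identical across test and production.  Only the environment it asks for, and the registry entry behind it, differ.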


(Some IT shops try to get off easy by using just DNS to solve this problem.  This can be helpful in a single environment, such as the replacement of a production server.  But not for redirection between environments.)

However, some vendors have expanded basic UDDI systems into something much larger – basically expanding into Design Time Governance “registry and repository” capabilities…


Every organization using services should (but doesn’t have to) implement a UDDI.  The extended capabilities that have been loaded into many UDDI products often confuse the issue.

Example – IBM’s WebSphere Registry and Repository started as a good production quality UDDI and was initially extended to handle MQ as well (handling queue addresses in addition to URLs).  But then it was extended with a partial set of design time governance features, confusing the issue if I need a UDDI and have other design time governance tools.

In at least one conversation I’ve had with a vendor they had a hard time understanding what I meant by “runtime UDDI”, being focused on design time repository abilities.  UDDI has become rather divorced from its original and primary functionality.
