Data Gravity and Cloud Uplift Woes

When I originally learned about Data Gravity via David Linthicum’s excellent podcasts, a key architecture point that stuck in my mind was: your application needs to be close to its (interactive) data sources.  (Data Gravity has since picked up a wider yet less useful definition as more applications cluster together in a “data galaxy” to be close enough for large amounts of data to be able to interact.)

Why?  Every interaction with a database has communication (network time) overhead.  Most applications are built with their servers and database on the same LAN / subnet / vnet, usually in close physical proximity, specifically so that this time is minimized.  Every hub/switch/router adds time to the request, so while a request may pay 10ms when it’s local, each additional “hop” adds more on top of that.

Application performance and tolerances are implicitly built around that response overhead.  If data takes too long to return, the developers will likely adjust their code to do bigger queries, wider joins, etc.  But what’s “too long”?  If a user is waiting for an application response, “too long” is whatever the user tolerance is.  If it’s a background process, the tolerance is likely much higher: as long as the process can complete within its time window, the overhead may not even be noticed.

In the early discussions of Data Gravity, the questions were often around “can I move my expensive database cluster to the Public Cloud while leaving my application on premises?”  And the answer was “no – it would add too much communication overhead, damaging the applications’ response time.”

Years later we’re in a different position.  Enterprises are mass uplifting applications.  Some are up in the cloud, some left behind in the data center.  

And suddenly many applications are encountering data latency.  

When applications interact through direct database access to other applications, particularly in a chatty pattern (many small queries as opposed to bulk queries), they are essentially tightly coupled.  And if one is in the cloud and another left behind in the data center, they’re subject to all the network overhead as well as any bandwidth limitations that may exist between the data center and cloud.  (Azure mitigates the bandwidth issue somewhat with “ExpressRoute”, AWS with “Direct Connect”.)
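To make the difference between many small queries and a few bulk queries concrete, here is a rough back-of-the-envelope sketch.  It counts only network wait time (round trips multiplied by round-trip time) and ignores query execution and bandwidth.  The 10ms local figure echoes the number used earlier; the 40ms cross-site figure and the query counts are purely illustrative assumptions, not measurements.

```python
def total_network_wait_ms(round_trips: int, rtt_ms: float) -> float:
    """Network wait for one user action: every query pays one round trip.

    Deliberately ignores query execution time and bandwidth limits.
    """
    return round_trips * rtt_ms


# Assumed, illustrative figures (not measurements).
LOCAL_RTT_MS = 10.0        # app and database on the same LAN
CROSS_SITE_RTT_MS = 40.0   # app in the cloud, database in the data center

patterns = {
    "many small queries": 50,  # hypothetical screen issuing 50 queries
    "bulk queries": 2,         # the same data fetched with 2 wider queries
}

for label, trips in patterns.items():
    local = total_network_wait_ms(trips, LOCAL_RTT_MS)
    remote = total_network_wait_ms(trips, CROSS_SITE_RTT_MS)
    print(f"{label}: {local:.0f} ms local, {remote:.0f} ms cross-site")
```

Under these assumptions the extra latency per query is the same in both cases, but the frequent-query pattern multiplies it by every round trip, which is why it is the pattern that suffers when the two ends are separated.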

The bad news is that if the application has a frequent interaction pattern with the remote databases, there may be no good solution other than refactoring the application, particularly if the latency is causing unacceptable user impact.  (And it’s often mature applications in maintenance mode that are being uplifted, which makes refactoring a real problem.)  If the problem is bandwidth, it will be time to invoke the network team to ensure the traffic is following the optimum path, is being prioritized, and has sufficient bandwidth available to meet the demand.

The summary is, before uplifting an application that uses direct database access to other applications/databases (whether code based, remote SQL, or something like SSIS), OR if the application is accessed via direct database access by other applications, check the interaction pattern to determine if additional latency is going to unacceptably impact performance.
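One way to approach that check is a quick estimate per interaction: take the time it takes today, add the number of database round trips multiplied by the extra round-trip time the move would introduce, and compare the result to the tolerance.  The sketch below assumes you can measure or estimate those numbers; the `Interaction` class, the field names, and the sample figures are all hypothetical and only illustrate the arithmetic.

```python
from dataclasses import dataclass


@dataclass
class Interaction:
    name: str
    round_trips: int         # database round trips per user action or job run
    current_total_ms: float  # end-to-end time measured today
    tolerance_ms: float      # longest time the user or time window allows


def projected_ms(interaction: Interaction, added_rtt_ms: float) -> float:
    """Estimated time after the move: today's time plus added latency per round trip."""
    return interaction.current_total_ms + interaction.round_trips * added_rtt_ms


def at_risk(interactions: list[Interaction], added_rtt_ms: float) -> list[str]:
    """Names of interactions whose projected time exceeds their tolerance."""
    return [
        i.name for i in interactions
        if projected_ms(i, added_rtt_ms) > i.tolerance_ms
    ]


# Hypothetical inventory for one application being uplifted.
inventory = [
    Interaction("order screen", round_trips=60,
                current_total_ms=900, tolerance_ms=2_000),
    Interaction("nightly export", round_trips=5,
                current_total_ms=600_000, tolerance_ms=3_600_000),
]

# Assumed extra round-trip time between data center and cloud.
print(at_risk(inventory, added_rtt_ms=30.0))
```

In this made-up inventory the interactive screen is the one flagged, while the background job stays comfortably inside its window, matching the point that tolerance depends on whether a user is waiting.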

[ There's another whole solution direction I'm avoiding that only applies to large enterprises, namely AWS Outposts and Azure Stack, bringing the edge of the Public Cloud into physical proximity with the enterprise data center by placing a rack of AWS/Azure equipment into the data center.  With close local routing, both bandwidth and latency issues are avoided.  At the time I'm writing this, AWS is creating mini-outposts, which may make it a viable option for smaller enterprises.  HOWEVER, this also assumes that organizations are going to be comfortable directly connecting AWS or Azure managed equipment (and networked back to them) into their data center. I can see the cyber-security team heads exploding from here. ]
