When I originally learned about Data Gravity via David Linthicum’s excellent podcasts, a key architecture point that stuck in my mind was: your application needs to be close to its (interactive) data sources. (Data Gravity has since picked up a wider yet less useful definition as more applications cluster together in a “data galaxy” to be close enough for large amounts of data to be able to interact.)
Why? Every interaction with a database has communication (network time) overhead. Most applications are built with their servers and database on the same LAN / subnet / vnet – usually in close physical proximity – specifically so that time is minimized. Every hub/switch/router adds time to the request, so a request that costs 10ms when it's local picks up additional time with every "hop".
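To make that concrete, here's a back-of-envelope sketch of how the per-query round trip compounds when a single application request issues many queries. The per-hop delay, base round trip, database work time, and query count are all assumed numbers for illustration, not measurements.

```python
# Back-of-envelope sketch (all numbers are hypothetical) of how each network
# hop and each round trip add up for a request that issues many queries.

PER_HOP_MS = 0.3      # assumed forwarding delay added by each switch/router
BASE_RTT_MS = 0.5     # assumed round trip on the same subnet
DB_WORK_MS = 2.0      # assumed time the database spends per query

def request_time_ms(queries: int, hops: int) -> float:
    """Total database-communication time for one application request."""
    rtt = BASE_RTT_MS + 2 * hops * PER_HOP_MS  # each hop is crossed going and coming back
    return queries * (rtt + DB_WORK_MS)

for hops in (1, 4, 12):
    print(f"{hops:>2} hops: {request_time_ms(queries=50, hops=hops):7.1f} ms for 50 queries")
```

The point isn't the exact figures; it's that the per-query cost is multiplied by however many queries the application makes per user interaction.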
Application performance and tolerances are implicitly built around that response overhead. If data takes too long to return, the developers will likely adjust their code to do bigger queries, wider joins, etc. But what's "too long"? If a user is waiting for an application response, "too long" is whatever the user's tolerance is. If it's a background process, the tolerance is likely much higher; as long as the process can complete within its time window, the overhead may not even be noticed.
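Here's the kind of adjustment I mean: the difference between a "chatty" one-query-per-row pattern and a single wider join. The sketch below uses an in-memory SQLite database as a stand-in for a remote database, and the orders/customers schema is made up purely for illustration.

```python
# Sketch: "chatty" N+1 querying versus one wider query.
# SQLite here is just a stand-in for a remote database; over a network,
# each execute() in the loop would pay a full round trip.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    INSERT INTO customers VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO orders VALUES (1, 1, 10.0), (2, 1, 25.0), (3, 2, 5.0);
""")

# Chatty: one round trip per order -- fine on a LAN, painful over a WAN.
orders = conn.execute("SELECT id, customer_id, total FROM orders").fetchall()
for order_id, customer_id, total in orders:
    name = conn.execute(
        "SELECT name FROM customers WHERE id = ?", (customer_id,)
    ).fetchone()[0]

# Batched: a single wider join, one round trip regardless of row count.
rows = conn.execute("""
    SELECT o.id, c.name, o.total
    FROM orders o JOIN customers c ON c.id = o.customer_id
""").fetchall()
```

Developers tune toward the batched style when latency hurts, but that tuning was done against the latency the application had when it was built.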
In the early discussions of Data Gravity, the questions were often around “can I move my expensive database cluster to the Public Cloud while leaving my application on premises?” And the answer was “no – it would add too much communication overhead, damaging the applications’ response time.”
Years later we’re in a different position. Enterprises are mass uplifting applications. Some are up in the cloud, some left behind in the data center.
And suddenly many applications are encountering data latency.
When applications interact with other applications via direct database access, particularly in an interactive pattern (many small queries as opposed to bulk queries), they are essentially tightly coupled. And if one is in the cloud and another is left behind in the data center, they're subject to all of that network overhead, as well as any bandwidth limitations that may exist between the data center and the cloud. (Azure mitigates the bandwidth issue somewhat with ExpressRoute, AWS with Direct Connect.)
The bad news is that if the application has a frequent interaction pattern with the remote databases, there may be no good solution other than refactoring the application – and it's often mature applications in maintenance mode that are being uplifted, which makes that a real problem – particularly if the latency is causing unacceptable user impact. If the problem is bandwidth, it's time to involve the network team to ensure that the traffic follows the optimum path, gets prioritized, and has sufficient bandwidth available to meet the demand.
The summary: before uplifting an application that uses direct database access to other applications/databases (whether code-based, remote SQL, or something like SSIS), OR if the application is itself accessed via direct database access by other applications, check the interaction pattern to determine whether the additional latency is going to unacceptably impact performance.
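One rough way to do that check: count the queries a typical transaction or screen issues today, and multiply by the change in round-trip time you expect after the move. The numbers below are hypothetical examples.

```python
# Rough pre-uplift check (hypothetical numbers): estimate the latency an
# application picks up when its database traffic starts crossing the
# data-center-to-cloud boundary.

def added_latency_ms(queries_per_transaction: int,
                     current_rtt_ms: float,
                     future_rtt_ms: float) -> float:
    """Extra time per transaction caused purely by the longer round trip."""
    return queries_per_transaction * (future_rtt_ms - current_rtt_ms)

# Example: an interactive screen that fires 80 queries today at ~1 ms each.
extra = added_latency_ms(queries_per_transaction=80,
                         current_rtt_ms=1.0,
                         future_rtt_ms=35.0)
print(f"~{extra / 1000:.1f} s added per transaction")  # ~2.7 s -- likely unacceptable to a waiting user
```

If that estimate lands well inside the user's (or batch window's) tolerance, the uplift is probably safe; if it doesn't, plan for refactoring or for keeping the coupled applications on the same side of the boundary.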
[ There's another whole solution direction I'm avoiding that only applies to large enterprises, namely AWS Outposts and Azure Stack: bringing the edge of the Public Cloud into physical proximity with the enterprise data center by placing a rack of AWS/Azure equipment into the data center. With close local routing, both bandwidth and latency issues are avoided. At the time I'm writing this, AWS is creating mini-Outposts, which may make it a viable option for smaller enterprises. HOWEVER, this also assumes that organizations are going to be comfortable directly connecting AWS- or Azure-managed equipment (networked back to them) into their data center. I can see the cyber-security team's heads exploding from here. ]