When I originally learned about Data Gravity via David Linthicum's excellent podcasts, a key architecture point stuck in my mind: your application needs to be close to its (interactive) data sources. (Data Gravity has since picked up a wider yet less useful definition, as more applications cluster together in a "data galaxy" to be close enough for large amounts of data to interact.)

Why? Every interaction with a database carries communication (network time) overhead. Most applications are built with their servers and database on the same LAN / subnet / vnet, usually in close physical proximity, specifically so that this time is minimized. Every hub/switch/router adds time to the request: a request might pay 10ms locally, and each additional "hop" adds more on top. Application performance and tolerances are implicitly built around that response overhead. If data takes too long to return, developers will likely adjust their code to do bigger queries, wider joins, etc. But what happens when that assumption of proximity breaks down?
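To make the hop arithmetic concrete, here's a minimal back-of-the-envelope sketch. The 10ms local round trip comes from above; the per-hop cost and query count are illustrative assumptions, not measurements:

```python
# Illustrative round-trip arithmetic: cumulative latency for a chatty workload.
# All figures here are assumptions for the sake of the example.
LOCAL_RTT_MS = 10  # assumed round trip on the same subnet
PER_HOP_MS = 5     # assumed extra cost per additional network hop
QUERIES = 500      # a "chatty" request path issuing many small queries

def total_latency_ms(extra_hops: int) -> float:
    """Total network time spent on QUERIES round trips with extra_hops added."""
    return QUERIES * (LOCAL_RTT_MS + extra_hops * PER_HOP_MS)

for hops in (0, 2, 6):
    print(f"{hops} extra hops: {total_latency_ms(hops) / 1000:.1f}s of pure network time")
# 0 extra hops: 5.0s; 6 extra hops: 20.0s. The app "feels" 4x slower
# without a single line of query code changing.
```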
Serverless is great, but serverless functions rarely run by themselves... they're usually connected to data sources, and frequently that means SQL databases. In the graphic attached, my cloud SQL database is collapsing under assault by a serverless function that's scaling up instances to meet demand.

Here's how we unintentionally created this problem. To speed things up with parallel processing, we took a transactional billing file, broke it into groups of 30 transactions, loaded those groups into messages, and queued them. A serverless function listens for these events, and if messages are still waiting after a certain amount of time, another instance is spawned to start working through the backlog (a sketch of the pattern follows below).

When it was sent thousands of transactions, this worked great, and the business users were suitably impressed with the processing speed, which dropped from many hours to a few minutes. When they began sending larger transaction files, though, the function kept spawning instances to chase the backlog, and the SQL database collapsed under the assault.
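For illustration, here's a minimal sketch of that fan-out pattern. The queue URL, DSN, and table are hypothetical, and I'm using AWS SQS and pyodbc purely as stand-ins for whichever queue service and database driver you actually run; what matters is the shape: one producer chunking the file into 30-transaction messages, and a consumer that every scaled-out instance executes, each opening its own database connection.

```python
import json

import boto3   # stand-in queue client; substitute your cloud's SDK
import pyodbc  # stand-in SQL driver; substitute your database's driver

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/billing-batches"  # hypothetical
BATCH_SIZE = 30  # transactions per message, as described above


def enqueue_billing_file(transactions: list[dict]) -> None:
    """Producer: split the billing file into groups of 30 and queue each group."""
    for i in range(0, len(transactions), BATCH_SIZE):
        batch = transactions[i : i + BATCH_SIZE]
        sqs.send_message(QueueUrl=QUEUE_URL, MessageBody=json.dumps(batch))


def handle_message(message_body: str) -> None:
    """Consumer: each serverless instance runs this per message.

    The trap: every scaled-out instance opens its OWN connection, so
    N instances means N concurrent connections (and N write streams)
    hitting the same SQL database at once.
    """
    batch = json.loads(message_body)
    conn = pyodbc.connect("DSN=billing-db")  # hypothetical DSN
    try:
        cursor = conn.cursor()
        for txn in batch:
            cursor.execute(
                "INSERT INTO billing (id, amount) VALUES (?, ?)",
                txn["id"],
                txn["amount"],
            )
        conn.commit()
    finally:
        conn.close()
```

Nothing in this sketch is individually wrong; the trouble is that the scale-out trigger watches only the queue depth and knows nothing about the database's connection and throughput limits.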