Is distributed data realistic? | InfoWorld

Distributed data is an old concept that has lived more in white papers and dissertations than in the real world. I remember talking about distributed data in a database design course in college in the late '80s, with the belief that it would probably arrive the following year. It never did.

The idea has been consistent over the years: no matter where we store data, by using a common set of services or an overarching data management layer, we can handle any of it, no matter where it physically exists, as one logical grouping of data. This data is available at any time, to anyone, for any purpose. It's federated, democratized, and completely transparent: how this magic happens across clouds, edge computers, devices, and legacy systems stays hidden from the people and applications using the data.
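To make "one logical grouping" concrete, here is a minimal sketch in Python. It assumes each physical location can be wrapped in a loader function; the names `Source`, `LogicalDataset`, and the sample locations are hypothetical and only illustrate the idea, not any particular product.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, Iterator, List

Record = Dict[str, object]


@dataclass
class Source:
    """One physical location (cloud, edge device, legacy system)."""
    name: str
    fetch: Callable[[], Iterable[Record]]  # hypothetical loader for that location


@dataclass
class LogicalDataset:
    """A single logical grouping backed by several physical sources."""
    name: str
    sources: List[Source] = field(default_factory=list)

    def rows(self) -> Iterator[Record]:
        # The caller iterates one dataset; where each row lives is hidden.
        for source in self.sources:
            for row in source.fetch():
                # Keep the origin as metadata so governance can still trace it.
                yield {**row, "_source": source.name}


# Two made-up locations holding parts of the same logical dataset.
cloud = Source("cloud-warehouse", lambda: [{"customer": "a", "orders": 3}])
edge = Source("edge-device-7", lambda: [{"customer": "b", "orders": 1}])

customers = LogicalDataset("customers", [cloud, edge])
all_rows = list(customers.rows())
```

An application reading `customers.rows()` never names a cloud or a device; a real system would also need caching, security, and failure handling that this sketch leaves out.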

Fast forward to 2022, and we're talking about the same concept as 40 years ago. What's different is that we now have the ability to run it at a reasonable price. We also have emerging concepts such as cloud native, which we define as a common stack where the private and public clouds are the foundation, but the foundation clouds typically don't deliver services (or data) directly to the applications or analytics tools.

A few things are driving this now.

First, we finally have a working and reliable global network, or at least we should once 5G completes its rollout.

Second, there is interest in maintaining data on edge systems outside of the data center and cloud providers, that is, on any device or server that can store and process data.

Finally, data storage has been democratized. Data management and control is no longer the domain of a single data administrator, but of a group of people who own specific datasets. Those datasets are widely dispersed and can be used as a single dataset or as a federated grouping of datasets, with no limitations on performance or functionality.

Of course, a lot of coordination is needed to make data anywhere a reality. The biggest problem is having a functional management control plane that can keep track of the data and deal with governance and security. Something as simple as changing the meaning of a data element on an edge device can break hundreds of applications and embedded analytics processes if it is not managed properly. And if devices or servers, cloud or otherwise, are offline for an extended period, their data will be missing for the applications and analytics that depend on it until communications are restored.
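The two failure modes above, a data element whose meaning changes and a source that goes offline, can be sketched as a simple control-plane check. This is an illustrative Python sketch under stated assumptions: each source reports a schema as a mapping of field name to meaning, and `SourceStatus` and `check_sources` are hypothetical names, not part of any real tool.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

Schema = Dict[str, str]  # field name -> agreed meaning (type, unit, ...)


@dataclass
class SourceStatus:
    name: str
    online: bool
    schema: Optional[Schema]  # None when the source cannot be reached


def check_sources(expected: Schema, sources: List[SourceStatus]) -> Dict[str, List[str]]:
    """Return the problems found per source; healthy sources are omitted."""
    problems: Dict[str, List[str]] = {}
    for src in sources:
        issues: List[str] = []
        if not src.online or src.schema is None:
            issues.append("offline: data unavailable until communications are restored")
        else:
            for field_name, meaning in expected.items():
                actual = src.schema.get(field_name)
                if actual is None:
                    issues.append(f"missing field: {field_name}")
                elif actual != meaning:
                    issues.append(f"changed meaning: {field_name} ({meaning} -> {actual})")
        if issues:
            problems[src.name] = issues
    return problems


# Made-up example: one edge device silently switched temperature units.
expected = {"temperature": "celsius", "device_id": "string"}
sources = [
    SourceStatus("cloud-db", True, {"temperature": "celsius", "device_id": "string"}),
    SourceStatus("edge-12", True, {"temperature": "fahrenheit", "device_id": "string"}),
    SourceStatus("edge-13", False, None),
]
report = check_sources(expected, sources)
```

Here `report` flags `edge-12` for the changed meaning and `edge-13` for being offline, while `cloud-db` is left out. A production control plane would also have to handle governance, security, and versioned schema changes.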

You really have to use your head. Just because you can store and use data anywhere as if it were centralized doesn't mean you should. There are some issues, such as network and management control plane failures, that can cost you downtime. While we're still figuring out the cost, distributed data seems a bit more expensive to implement and use over the longer term than more traditional approaches such as data centralization.

Despite all this, you should still be aware of distributed data. It has many pragmatic applications that companies can use to drive innovation and growth. One opportunity is improving the customer experience by, for example, giving the customer's systems more control over the data; there are hundreds of others.

So take a look at distributed data or data anywhere in 2022. As always, look for pragmatic use cases to keep your business out of trouble.

Copyright © 2022 IDG Communications, Inc.
