What Do We Mean When We Say: Data Fabric
I often refer to the innermost structures of things as being "fabric" — the word, in its root meaning, refers to what makes up something. Gartner recently referred to "data fabric" as a top technology trend for 2022, but what does this relatively recent coinage mean?
To start off with, "data fabric" is not a hot new buzzword in the world of data analytics. What makes it trendy is that the term has begun to catch on outside the limited sphere of its origin. "Data fabric" describes a unified environment: a single ecosystem that contains all of the technologies involved in data analytics.
Data Fabric Defined
In essence, a data fabric is a setup that enables an organization to make better use of whatever data it has. In other words, it utilizes or "stitches together" data that already exists in various silos. It takes data that was here and there in patches and, so to speak, knits it into a patchwork quilt.
More specifically, a data fabric facilitates self-service data consumption, embeds governance, and automates the data integration process. This way, an organization can optimize its data for faster, cleaner, and more accurate insights. It also helps reduce data inconsistency and compliance risk, and improves data quality.
Think of self-service access to all your data available to all your users via a single portal — a one-stop shop like analytics.yourcompanyname.com. Now add in that oversight of data quality and data facilitation are built right into the delivery. The QA is triggered as the data flows.
Not only that, but all of the pipelines into other systems, all of the integrations, are built into the design. Your tables are set up from the start to incorporate integration. Your underlying structure is built on a fabric. You can build this afterward, but it is harder to put a data fabric in place once the design is done.
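To make the idea of QA that is "triggered as the data flows" a little more concrete, here is a minimal Python sketch. It is only an illustration under my own assumptions: the QualityGate class, the check names, and the sample records are all hypothetical, not part of any particular data fabric product.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, Iterator

Record = dict[str, object]
Check = Callable[[Record], bool]


@dataclass
class QualityGate:
    """Hypothetical helper: runs named checks on each record while it is in flight."""
    checks: dict[str, Check]
    rejected: list[tuple[Record, list[str]]] = field(default_factory=list)

    def stream(self, records: Iterable[Record]) -> Iterator[Record]:
        for record in records:
            # Collect the names of every check this record fails.
            failed = [name for name, check in self.checks.items() if not check(record)]
            if failed:
                # Keep the record and the reasons so someone can review it later.
                self.rejected.append((record, failed))
            else:
                yield record


# Illustrative rules only; real rules would come from your governance policy.
gate = QualityGate(checks={
    "has_customer_id": lambda r: bool(r.get("customer_id")),
    "amount_non_negative": lambda r: isinstance(r.get("amount"), (int, float)) and r["amount"] >= 0,
})

incoming = [
    {"customer_id": "c-1", "amount": 25.0},
    {"customer_id": "", "amount": 10.0},
    {"customer_id": "c-3", "amount": -4.0},
]

clean = list(gate.stream(incoming))
```

The point of the design is that downstream consumers only ever see records that passed, while the failures are kept with their reasons rather than silently dropped.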
Save Money, Share Intel
This methodology and design standard will definitely have an impact on the IT industry. It can reduce costs by utilizing the best and most accurate source of data. In a fabric design, it doesn't matter what source the data users query, or what source the data comes from, just that it's the most accurate and up to date. It only uses what is needed.
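As a rough illustration of "the most accurate and up to date" copy winning regardless of where it lives, here is a small Python sketch. The SourceCopy fields, the quality score, and the rule of ranking by quality first and recency second are my own assumptions for the example; a real fabric would define its own ranking.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable


@dataclass(frozen=True)
class SourceCopy:
    source: str           # hypothetical silo name
    value: str            # the data as that silo holds it
    updated_at: datetime  # when the silo last refreshed it
    quality: float        # hypothetical 0..1 score from quality checks


def best_copy(copies: Iterable[SourceCopy]) -> SourceCopy:
    """Pick the copy with the highest quality score, breaking ties by recency."""
    return max(copies, key=lambda c: (c.quality, c.updated_at))


copies = [
    SourceCopy("crm", "12 Oak St", datetime(2022, 1, 5), quality=0.90),
    SourceCopy("billing", "14 Oak St", datetime(2022, 3, 1), quality=0.90),
    SourceCopy("legacy_export", "12 Oak Street", datetime(2022, 4, 1), quality=0.40),
]

chosen = best_copy(copies)
```

Here the billing copy is chosen: it ties with the CRM copy on quality but is newer, and the legacy export loses despite being the most recent because its quality score is low. Note that max raises ValueError if no copies are supplied.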
Data fabric can reduce risk by automating data quality processes. For companies and industries that need accurate data, the quality processes built into the design are crucial. Adopting a data fabric design can help with that. It can also accelerate delivery of insights with a single view of all relevant information across your enterprise, all of your storage locations, and even your particular industry.
Who wouldn't want a complete view of "real estate," for instance, even if you are just selling homes? So, to briefly recap: The term data fabric refers to the "physical topology of servers and other hardware components used to store and process data."
The idea is that you should be able to move data around between different components like physical servers, virtual machines, containers, and storage accounts, no matter its use and regardless of its source or endpoint. It's all just "data."
The New Way of Looking at Data
This is an idea that has been around for a while now but is still considered bleeding edge, which is to say new. Data fabric provides a way for various types of data, such as schema (metadata), context, and archive data, to be passed around the enterprise applications that use them. A single instance of any one type of data can exist in multiple places, but it is important that each instance points to the same physical resources and that the data types are the same.
A fabric uses a "source of truth" for all reference data, much as the fabric of space starts with a string or an atom. Data can be moved from one component to another and sourced from different places as needed, since the one true source should be the same across the ecosystem.
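One way to picture the "source of truth" rule is a small catalog that refuses to let two references to the same dataset disagree about where the data lives or what its types are. The Python sketch below is only that, a picture: the Catalog class, the dataset name, and the "warehouse://" locations are hypothetical and made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetEntry:
    resource: str                        # hypothetical physical location
    schema: tuple[tuple[str, str], ...]  # sorted (column, type) pairs


class Catalog:
    """Hypothetical registry: one logical name maps to exactly one resource and schema."""

    def __init__(self) -> None:
        self._entries: dict[str, DatasetEntry] = {}

    def register(self, name: str, resource: str, schema: dict[str, str]) -> DatasetEntry:
        entry = DatasetEntry(resource, tuple(sorted(schema.items())))
        existing = self._entries.get(name)
        if existing is not None and existing != entry:
            # A second reference that disagrees would create a second "truth".
            raise ValueError(f"{name!r} is already registered with a different resource or schema")
        self._entries[name] = entry
        return entry

    def resolve(self, name: str) -> DatasetEntry:
        return self._entries[name]


catalog = Catalog()
first = catalog.register("customers", "warehouse://sales/customers", {"id": "int", "name": "str"})
# A second application registering the same dataset gets the same entry back.
second = catalog.register("customers", "warehouse://sales/customers", {"name": "str", "id": "int"})
```

Registering "customers" again with a different location or a different column type raises ValueError, which is the code-level version of insisting that every instance points to the same resource with the same data types.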
A data fabric is also used to reduce the amount of management required, and to provide a single point of control for managing resources and settings across multiple physical and virtual resources. A data fabric simply describes a comprehensive way to make everything consistently use the same truth and to have that truth pushed, pulled and extended through the ecosystem.
Try to imagine a large piece of hypothetical fabric (space) extended over a theoretical space/time that joins multiple data points across locations, including the cloud. You can have all kinds of structured and unstructured data, along with methods for accessing that data and analyzing it.
It's a unique way of looking at data but one that science has used to look at space for a while now. Unlike space or real fabric — like the stuff your jeans are made of — a data fabric does not have a fixed shape. It is scalable and has built-in fluidity that accounts for data processing, management, and storage. It can be accessed or shared by internal and external teams for a wide variety of enterprise analytical and operational use cases.
Coming to an Enterprise Near You
Watch for companies to adopt this methodology as a matter of course. The adoption process will pick up speed as companies get more cost-effective and more agnostic with their storage systems. As the world marches towards becoming a data-centric one, every company will need to have more cohesive and easier-to-use data.
This is where the scalable data fabric comes in: it will help manage the collection, governance, distribution, and integration of data across today's landscape of companies. Sharing and using data this way will solve problems like never before.
A single instance of any type of data can exist in multiple places, but it is important that each instance points to the same resource, the one source of truth, and that the data types are the same. Data can be moved from one component to another as needed by the enterprise, governed and controlled by a strict set of laws, just as nature is governed by physical laws.
A data fabric is used to reduce the amount of data management required by your technology teams, and to provide a single point of control for managing resources and settings across multiple physical and virtual ecosystems. Look for a data fabric methodology to land at your company soon.