5 Steps to Implementing a Modern Data Fabric Framework
If your organization wants a modern data management strategy, consider following the five fundamental steps to implementing a data fabric framework described here.
- By Steve Fuller
- July 18, 2022
The data fabric continues to be heralded as the modern data management solution for many organizations. The increasing number of data sources and data volumes, and multi- and hybrid cloud environments, combined with antiquated batch processes and transformation workflows, have made it nearly impossible for businesses to satisfy their need for real-time connected data. This has forced them to seek new approaches to leveraging these assets.
According to Forrester, data fabric is a hot, emerging market that delivers a unified, intelligent, integrated end-to-end platform to support new and emerging use cases. The sweet spot, according to Forrester analysts, is its ability to deliver use cases quickly by leveraging innovation in dynamic integration, distributed and multicloud architectures, graph engines, and distributed in-memory and persistent memory platforms.
A data fabric weaves together data from both internal and external sources and establishes a network of information that can power business applications, AI, and analytics. Put simply, it supports the full breadth of today’s complex, connected enterprise without the need to rip and replace previous investments. Rather than starting from scratch, organizations can transform their existing data infrastructure into a data fabric with a knowledge graph at its heart.
Knowledge graphs are an application of graph technology that turns data into machine-understandable knowledge, capturing the context that is often missing from traditional data management systems. Knowledge graphs provide dramatic code reductions while capturing real-world business meaning. They are also vastly different from the traditional relational data systems organizations are accustomed to using. Leveraging the interlinked descriptions of concepts, entities, relationships, and events generated by the knowledge graph, the data fabric can stitch together what an organization already possesses and enhance connected apps with enriched data from each source system.
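To make the idea of interlinked entities and relationships concrete, here is a minimal sketch of a knowledge graph held as subject-predicate-object triples in plain Python. The identifiers, predicates, and the two imaginary source systems (a CRM and an order system) are assumptions made up for illustration. A real deployment would use a graph database rather than an in-memory set.

```python
from typing import Iterable, Iterator, Optional

Triple = tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical facts, as if drawn from a CRM and an order system.
TRIPLES: set[Triple] = {
    ("customer:42", "type", "Customer"),
    ("customer:42", "name", "Acme Corp"),
    ("order:1001", "type", "Order"),
    ("order:1001", "placedBy", "customer:42"),
    ("order:1001", "total", "250.00"),
}


def match(
    triples: Iterable[Triple],
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> Iterator[Triple]:
    """Yield every triple that fits the pattern; None acts as a wildcard."""
    for triple in triples:
        if (
            (s is None or triple[0] == s)
            and (p is None or triple[1] == p)
            and (o is None or triple[2] == o)
        ):
            yield triple


def orders_for(triples: Iterable[Triple], customer_name: str) -> list[str]:
    """Follow name -> customer -> placedBy links to find a customer's orders."""
    triples = list(triples)
    customers = {subj for subj, _, _ in match(triples, p="name", o=customer_name)}
    return sorted(
        subj for subj, _, obj in match(triples, p="placedBy") if obj in customers
    )


acme_orders = orders_for(TRIPLES, "Acme Corp")
```

The query joins facts from both imaginary systems by following relationships, with no knowledge of which system each fact came from.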
Five Critical Steps to Creating a Data Fabric Framework
To truly enable collaborative, cross-functional initiatives using a data fabric, it is imperative to devise a well-thought-out plan. Organizations looking to embark on a data management strategy should begin by following these fundamental steps:
Step 1. Determine essential sources of metadata
To kick off a data fabric initiative, begin by understanding the business questions you want to answer. Then identify the key sources of metadata that point to the data you need to answer them. To fast-track this step, enlist subject matter experts to help shape the business questions and to explain where the necessary data lives. You can accelerate your efforts by reusing earlier data governance programs and/or data catalogs. Finally, remember to request access to the needed data sources now; this will make connecting to these systems easier later.
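One lightweight way to track this step is a small inventory that ties each business question to its metadata sources, the expert who owns each one, and whether access has been granted. The sketch below is only an illustration; the class names, source names, and owners are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataSource:
    name: str
    owner: str  # subject matter expert or team to consult
    access_granted: bool = False


@dataclass
class BusinessQuestion:
    text: str
    sources: list[MetadataSource] = field(default_factory=list)

    def pending_access(self) -> list[str]:
        """Names of sources for which access still has to be requested."""
        return [s.name for s in self.sources if not s.access_granted]


# Hypothetical question and sources, for illustration only.
question = BusinessQuestion(
    text="Which customers have open support tickets and overdue invoices?",
    sources=[
        MetadataSource("crm_catalog", owner="sales-ops", access_granted=True),
        MetadataSource("billing_schema", owner="finance"),
        MetadataSource("ticketing_api_docs", owner="support"),
    ],
)

still_needed = question.pending_access()
```

Listing the pending requests early makes it easier to have access in place before the connection work begins.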
Step 2. Construct a superior data model
Once you have settled on your questions, the next step is to determine what entities are required to answer them. Put time and entity limits on the scope, such as spending no more than two weeks or limiting the scope to 5-10 different entities. If you find that you need more than 10 entities to understand and model a question, it is probably too big a question, so it may be best to go back to step one and refine your scope.
You can jumpstart this step with publicly available, reusable data models such as the Financial Industry Business Ontology (FIBO), Brick, or the vocabularies published at schema.org. Don’t forget to leverage your data catalog (if you have one), because key business terms and the relationships among them have probably already been established there.
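The scoping rule from this step can be expressed as a simple check on a draft model. The sketch below assumes a model is just a set of entity names plus labeled relationships between them. The entity and relationship names are hypothetical, and the limit of 10 is the upper bound suggested above.

```python
MAX_ENTITIES = 10  # upper bound on entities per question suggested in the text

Relationship = tuple[str, str, str]  # (source entity, label, target entity)


def check_model(entities: set[str], relationships: list[Relationship]) -> list[str]:
    """Return a list of scoping problems; an empty list means the draft passes."""
    problems = []
    if len(entities) > MAX_ENTITIES:
        problems.append(
            f"{len(entities)} entities exceeds the limit of {MAX_ENTITIES}; "
            "narrow the question"
        )
    for source, label, target in relationships:
        for end in (source, target):
            if end not in entities:
                problems.append(
                    f"relationship '{label}' refers to undeclared entity '{end}'"
                )
    return problems


# Hypothetical draft model for a customer-support question.
entities = {"Customer", "Invoice", "Ticket"}
relationships = [
    ("Invoice", "billedTo", "Customer"),
    ("Ticket", "raisedBy", "Customer"),
    ("Ticket", "assignedTo", "Agent"),  # "Agent" was never declared
]

problems = check_model(entities, relationships)
```

A non-empty result is a prompt either to declare the missing entity or to return to Step 1 and tighten the question.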
Step 3. Unite data with your model
Of all the steps, connecting data to the model is the most important because this is where the metadata, model, and data itself are tied to downstream systems. It is at this point that data virtualization comes into play. By virtualizing data, organizations can accelerate time to value because they no longer have to concern themselves with extracting the data, reformatting it, loading it, and waiting for the jobs to complete. Rather, the business can look at the data where it already resides and leverage the investments made in existing data repositories.
Some systems are not well suited to virtualization, particularly when performance or security is a concern. For those systems, rely on materialization, which means storing a copy of the data within the repository. That’s why the combination of virtualization and materialization capabilities is so important.
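The difference between the two approaches can be sketched in a few lines. In this illustration, a virtual source reads from the underlying system on every query, while a materialized source copies the data once and serves later queries from that stored copy. All class names and the stand-in source system are hypothetical and do not represent any product's API.

```python
from typing import Callable, Optional

Row = dict[str, str]


class VirtualSource:
    """Reads from the source system on every query; no copy is kept."""

    def __init__(self, fetch: Callable[[], list[Row]]):
        self._fetch = fetch

    def rows(self) -> list[Row]:
        return self._fetch()


class MaterializedSource:
    """Copies the data once and serves later queries from the stored copy."""

    def __init__(self, fetch: Callable[[], list[Row]]):
        self._fetch = fetch
        self._copy: Optional[list[Row]] = None

    def refresh(self) -> None:
        # Re-read the source system and replace the stored copy.
        self._copy = [dict(row) for row in self._fetch()]

    def rows(self) -> list[Row]:
        if self._copy is None:
            self.refresh()
        return self._copy


class CountingSystem:
    """Stand-in for a real source system that counts how often it is read."""

    def __init__(self) -> None:
        self.reads = 0

    def fetch(self) -> list[Row]:
        self.reads += 1
        return [{"invoice": "inv-1", "status": "overdue"}]


billing = CountingSystem()
virtual = VirtualSource(billing.fetch)
virtual.rows()
virtual.rows()  # each query reaches the source system

archive = CountingSystem()
materialized = MaterializedSource(archive.fetch)
materialized.rows()
materialized.rows()  # second query is served from the stored copy
```

The trade-off is visible in the read counts: the virtual source always sees current data but loads the source system on each query, while the materialized source spares the source system at the cost of needing an explicit `refresh()` to pick up changes.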
Step 4. Share the data with consumer applications
Now that your data has been united, it is time to consume it so people can answer their business questions using the data fabric. Your organization can do this in a number of ways, such as integrating the data fabric into apps you already have in place, leveraging business intelligence and analytics tools, or applying a graph visualization tool.
Step 5. Reiterate the process when new business demands arise
The final step is easy: simply repeat the process. One of the key benefits of a data fabric is that it can expand over time. This means replicating the procedure to answer new or related questions. By iterating through this process, your organization can make use of its previous work without having to begin the next one from scratch.
Leveraging Data Fabric to Modernize Data Integration
Digital transformation demands rapid insight from increasingly hybrid, varied, and changing data. Traditional data integration platforms were never designed to keep pace with today’s growing complexity, not to mention ever-changing business requirements and greater demand for curated data sets.
Data fabrics offer a more flexible solution, supporting dynamic delivery of semantically enriched data. By following the steps we’ve described, your organization will be on its way to transforming its existing data infrastructure into a modern and reusable data fabric.
About the Author
Steve Fuller leads Stardog’s Solutions Consulting and Engineering team. He previously worked with TigerGraph, where he led technical account management and planning, technical assessments, and needs analysis for their graph database product. In addition, Steve spent seven years at Cloudera, a provider of enterprise data cloud solutions, where he led solutions engineering. Before Cloudera, Steve served in Solutions Architect roles at Hewlett Packard and Acxiom.