Platform Architecture and Data Architecture Are Different but Related
Many people (even experienced data pros) confuse related architectures, weakening data-driven solution designs and their business use cases.
- By Philip Russom
- September 4, 2020
As a strategy for capturing and leveraging new data assets for new business and technology practices, many user organizations are diversifying their portfolios of data management tools and storage platforms. Their assumption is that no single data storage platform can be optimized for the extreme diversity of data structure, latency, operational purposes, and analytics applications that we face today. Instead, many organizations prefer to have a diverse portfolio so they can select the most appropriate tool and platform combination for a given type of data and its use cases.
This multiplatform strategy for data management drives organizations toward environments consisting of numerous data platforms where data is physically distributed across multiple database servers, file systems, and storage. Extreme complexity results from the number of systems involved, regularly encompassing multiple brands of database management systems, NoSQL platforms (especially Hadoop), and tools for data integration, analytics, streams, and in-memory processing. These may be on premises, in the cloud, or in hybrid combinations of the two. Tools and platforms may originate from software vendors, the open source community, homegrown development, consultants, or all of the above.
Defining Multiplatform Data Architectures
When data platforms and data sets are integrated this way, the result is a multiplatform data architecture (MDA). An MDA is an eclectic mix of old and new data, managed on traditional and modern data platforms, whether on premises or in the cloud, with diverse tool types from many providers, stitched together by some form of distributed data architecture. An MDA is characterized by its large number and diversity of data persistence platforms, as well as its broad range of data structures, types, and containers. Equally important, however, is the MDA's substantial data management infrastructure, which unifies the MDA's architecture by integrating, synchronizing, cleansing, mastering, and documenting data across the MDA's many platforms and beyond.
An MDA is a platform architecture that needs a complementary data architecture.
We assume that data is heavily distributed in an MDA. In other words, data is strewn physically across the many databases, clouds, file systems, and other storage platforms of the MDA. However, we also assume there should be some form of large-scale, cross-platform architecture that unifies the MDA and its data on a logical level. Ideally, a cross-platform data architecture should be actively designed by data architects and guided by some form of governance. Without such direction and control, an MDA can deteriorate into an unmanageable and ungoverned swamp that delivers minimal business value at a high risk of noncompliance.
Data architecture is about the data and how data is described via semantics.
Wikipedia's definition is a serviceable starting point:
Data architecture is composed of models, policies, rules, or standards that govern which data is collected and how it is stored, arranged, integrated, and put to use in data systems and in organizations. Data is usually one of several architecture domains that form the pillars [or layers] of an enterprise architecture or solution architecture.
At the risk of stating the obvious, data architecture is about the data. Be careful: this idea is too often forgotten when working with MDAs because the systems architecture is where a lot of the "action" is today as users actively deploy new platforms and replace old ones. When working with an MDA, don't lose sight of the trees for the forest. Platforms are great, but meaningless without good data.
Data modeling is local. Data architecture is global.
The Wikipedia definition begins "data architecture is composed of models." Yet even experienced users confuse data architecture with data modeling. For example, when you see "data architect" on someone's business card, ask them what they do. Half the time they will describe data modeling, which is largely about local data structures and their components (rows, columns, tables, keys, data types), typically one database or table at a time. Data architecture, by contrast, tends to be about relationships across multiple data sets and their platforms. To enhance those relationships and the portability of data across platforms, data architects may design data transfer models and create governance policies that set standard ways of modeling data. Even so, their job remains largely about the big picture, which makes them indispensable for unified MDAs.
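To make the local-versus-global distinction concrete, here is a minimal sketch in Python. All platform, table, and column names are hypothetical. A `TableModel` captures local data modeling: one table on one platform, with its columns, types, and key. A `CrossPlatformLink` captures an architecture-level concern: a relationship between data sets on different platforms. The type-mismatch check is one illustrative example of an issue that a standard modeling policy might address. It is not a method prescribed by the article.

```python
from dataclasses import dataclass


@dataclass
class Column:
    name: str
    data_type: str


@dataclass
class TableModel:
    """Local data model: one table on one platform (hypothetical example)."""
    platform: str
    name: str
    columns: list[Column]
    primary_key: str


@dataclass
class CrossPlatformLink:
    """Architecture-level relationship between data sets on different platforms."""
    source: TableModel
    source_column: str
    target: TableModel
    target_column: str

    def type_mismatch(self) -> bool:
        # Flag links whose two sides disagree on data type, an obstacle
        # to joining or moving data across platforms.
        src = {c.name: c.data_type for c in self.source.columns}[self.source_column]
        tgt = {c.name: c.data_type for c in self.target.columns}[self.target_column]
        return src != tgt


# Each table is modeled locally and looks fine on its own.
crm = TableModel(
    "postgres_crm", "customer",
    [Column("customer_id", "BIGINT"), Column("email", "TEXT")],
    "customer_id",
)
lake = TableModel(
    "hadoop_lake", "web_events",
    [Column("event_id", "STRING"), Column("cust_id", "STRING")],
    "event_id",
)

# Only the cross-platform view reveals the inconsistency.
link = CrossPlatformLink(lake, "cust_id", crm, "customer_id")
print(link.type_mismatch())  # True: STRING vs. BIGINT
```

Neither table definition is wrong in isolation. The problem appears only when the relationship between them is modeled, which is the level at which data architects typically work.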
Data architecture is separate from -- but related to -- the systems architecture of platforms.
The Wikipedia definition also states that "data is usually one of several architecture domains." For example, we talk of technology stacks that have multiple layers. Data-driven technology stacks will have layers for the data's physical location, a DBMS or equivalent platform for managing physical data, a server (or cluster, rack, cloud) for the DBMS to run on, and other servers for data storage. There must also be a semantic layer (technical metadata) describing the physical data for purposes of query, access, transactions, and documentation. Other semantic layers can be built atop the main semantics for custom views of the data (business metadata, business glossaries and data catalogs, and data virtualization in general).
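The layering of semantics over distributed physical data can be sketched in a few lines of Python. This is a minimal illustration under assumed, hypothetical catalog entries. It does not depict any particular catalog or virtualization product.
- `TECHNICAL_METADATA` plays the role of the main semantic layer, mapping logical names to physical locations on different platforms.
- `BUSINESS_GLOSSARY` is a second layer built on top of it, mapping business terms to those logical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhysicalLocation:
    """Technical metadata: where a data element physically lives."""
    platform: str   # DBMS, file system, or cloud store (hypothetical names)
    container: str  # table, directory, bucket prefix, etc.
    field: str      # column or attribute name on that platform


# Main semantic layer: logical names -> physical data on several platforms.
TECHNICAL_METADATA = {
    "customer.id": PhysicalLocation("oracle_erp", "CUST_MASTER", "CUST_NO"),
    "customer.email": PhysicalLocation("postgres_crm", "contacts", "email_addr"),
    "customer.clicks": PhysicalLocation("s3_lake", "events/clicks/", "visitor_id"),
}

# A custom view built atop the main semantics: business terms point at
# logical names, never directly at physical storage.
BUSINESS_GLOSSARY = {
    "Customer Number": "customer.id",
    "Preferred Email": "customer.email",
}


def resolve(business_term: str) -> PhysicalLocation:
    """Translate a business term down through both semantic layers."""
    logical_name = BUSINESS_GLOSSARY[business_term]
    return TECHNICAL_METADATA[logical_name]


print(resolve("Preferred Email"))
```

In this sketch, suppose a data set moves from one platform to another. Only its entry in `TECHNICAL_METADATA` changes. The glossary, and anyone querying by business term, is unaffected. That insulation is one reason to keep the semantic layers distinct from the physical layers beneath them.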
Each of these layers can have its own design patterns and parameters (i.e., architecture), yet they all work together in a larger technology stack. Furthermore, all are relevant to the MDA. The systems architecture, in particular, gets press attention because of the innovation and new products appearing there. It is populated mostly by software servers, hardware servers, and modern equivalents (clouds, serverless computing, virtualization). To keep it from being a mere portfolio (which is just an inventory), the systems architecture needs a proactive design that accommodates the requirements of other layers (especially those for data and semantics) and enables cross-platform communications (for large-scale architecture).
Stacks and layers aside, we also talk about technology pillars (as mentioned in the Wikipedia definition). This metaphor is also a useful descriptor of the MDA because each platform of an MDA is like a pillar that stands side by side with others, although each pillar (or platform) can have its own technology stack with layers.
There is architecture in and across every stack, layer, pillar, platform, and data set. Overlap is inevitable -- and good.
As a final metaphor, note that architectures and layers can overlap. For example, the semantic layer of any multiplatform environment should present a user-defined architectural plan that spans and overlaps the individual platforms. As another example, savvy data management teams will optimize their data warehouse and data integration solutions by giving each an appropriate architecture; yet the two architectures overlap so much that it's hard to tell where one stops and the other begins.
In closing, here are a few more ideas to mull over:
- Without architecture, you have a portfolio of silos. Furthermore, having several architectures that don't integrate and interoperate well is not much better. You must design architecture at multiple levels, typically for platforms, data, and integration.
- Architecture is more than local data modeling. It is also about global relations among multiple, distributed data structures, data sets, databases, and data platforms.
- Data architects are indispensable. They own an MDA's big picture across interdependent platforms. They design architectural structures, and they govern architectural standards.
- Design and impose data architecture for its benefits. Well-formed platform and data architectures make a complex data ecosystem easier to understand, which leads to better designs, more effective optimization, and less taxing administration. Data architecture creates a uniformity that is conducive to governance, data standards, and the sharing of data within individual architectures and across whole enterprises. Most important, well-formed platform and data architectures make data more accessible for business leverage and innovation.
For Further Learning
Much of this article is drawn from the 2018 TDWI Best Practices Report: Multiplatform Data Architectures.