
Taming Cloud Data Integration Complexity

These three steps can help your organization rein in its fractured facts and manage data across multiple systems.

As business applications continue their steady exodus to the cloud, enterprise data is on a precarious trajectory of erosion: information becomes increasingly fragmented as the number of cloud systems grows. The need for cohesive, comprehensive cloud data integration is more pressing than ever.

From One Source to Many Systems

Enterprises must understand that today's data elements no longer have a single "source of truth" or a single "system of reference" -- terms that refer, respectively, to where a data object was created and to where it was copied.

In simpler times, systems were fewer in number and data was often authored in one system exclusively. Fast forward to today and such paradigms feel downright medieval. The days of one system owning an entire business object are long gone, and the holistic records of yesterday have given way to the fractured facts we see today.

Modern business objects are now subdivided into tiny pieces; dissimilar systems own and author micro subsets of the original logical data object. A common example is a logical employee record being cut up into fragments, in which:

  • The global HR system edits the employee's legal name and salary details
  • The company's cloud productivity suite assigns and maintains the user's email address and telephone number
  • Yet another cloud platform manages the employee's performance metrics

Eventually all these systems must come together so their record fragments can be joined and each system can accurately depict the holistic employee record at all times.
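
To make this concrete, here is a minimal Python sketch of how such fragments might be joined on a shared identifier. The system names, field names, and values are invented purely for illustration:

    # Fragments of one logical employee record, each owned by a different system.
    hr_fragment = {"employee_id": "E1001", "legal_name": "Dana Smith", "salary": 95000}
    suite_fragment = {"employee_id": "E1001", "email": "dana.smith@example.com", "phone": "+1-555-0100"}
    perf_fragment = {"employee_id": "E1001", "performance_rating": 4.2}

    def merge_fragments(*fragments):
        """Join fragments that share an employee_id into one holistic record."""
        record = {}
        for fragment in fragments:
            if record and fragment["employee_id"] != record["employee_id"]:
                raise ValueError("fragments describe different employees")
            record.update(fragment)
        return record

    # One unified view assembled from three systems' micro subsets.
    print(merge_fragments(hr_fragment, suite_fragment, perf_fragment))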

Whether we label today's SaaS adoption as the result of shadow IT, technology decentralization, or business empowerment, the result is unequivocally the same: cloud is king and we can only expect increased cloud adoption in the coming years. Data professionals must, therefore, embrace data disjointedness (and not just for business intelligence reporting). Cross-pollination of data fragments among the various members of the cloud ecosystem must occur in a seamless, timely fashion if an enterprise is to maintain operational data integrity.

Modern-Era Challenges

Stitching together subsets of a business object across a motley patchwork of SaaS solutions brings modern challenges to the table, one of which is timeliness. Unlike the batch processes of yesterday, operational cloud data flows often occur in real time. If HR terminates an employee, that change must propagate to all relevant systems as soon as possible.

Another challenge relates to the use of third-party APIs. Gone are the days of unfettered database access to on-premises systems. Data now remains at arm's length, reachable only via APIs that operate at varying levels of maturity.

Three Building Blocks for Success

These challenges may sound daunting, but the good news is that many aspects of cloud data movement are within our control. With the right building blocks, modern solutions can quickly extract, manipulate, and disseminate data with surgical precision. Although there are many options for addressing cloud integration complexity, I find myself embracing three architectural building blocks each time I tackle this problem: canonical data models, real-time data flows, and intermediate data stores.

Building Block #1: Canonical Data Models

The topic of canonical data models is well documented, and for good reason: melding data objects from dissimilar cloud systems requires an overarching nomenclature -- a vernacular that references objects by their business purpose instead of their system of origin.

For instance, if we want to analyze employee retention, we naturally yearn to examine a universal employee object. We don't want a Frankenstein-esque view of multiple HR systems, especially one hastily cobbled together. Essentially the canonical data model provides a business-friendly interface for what would otherwise be a confusing alphabet soup of system-specific data fields.
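
To sketch the idea, the snippet below maps hypothetical system-specific field names onto one canonical employee vocabulary; real systems will use different names and far richer models:

    from dataclasses import dataclass
    from typing import Optional

    # Hypothetical per-system field names mapped onto the canonical vocabulary.
    FIELD_MAPS = {
        "global_hr": {"EMP_LGL_NM": "legal_name", "ANNL_SAL_AMT": "salary"},
        "cloud_suite": {"mail": "email", "telephoneNumber": "phone"},
    }

    @dataclass
    class CanonicalEmployee:
        legal_name: Optional[str] = None
        salary: Optional[float] = None
        email: Optional[str] = None
        phone: Optional[str] = None

    def to_canonical(system, payload):
        """Translate one system's payload into business-friendly canonical fields."""
        field_map = FIELD_MAPS[system]
        return CanonicalEmployee(**{field_map[k]: v for k, v in payload.items() if k in field_map})

    print(to_canonical("global_hr", {"EMP_LGL_NM": "Dana Smith", "ANNL_SAL_AMT": 95000}))

Downstream consumers code against the canonical object and never need to know which system coined which field name.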

Building Block #2: Wrangling Real-Time Data

Designing for real-time data requires a shift in how we model data flows. Systems no longer sit around waiting for an extract-transform-load (ETL) engine to shuffle data to and fro. Instead, systems take the initiative by pushing data to APIs, typically as a result of triggers or events.
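
A minimal sketch of this push model, using a made-up termination event and subscriber URLs (the requests library stands in for whichever HTTP client you prefer):

    import requests  # third-party HTTP client; any equivalent works

    # Hypothetical downstream endpoints that must learn of the change.
    SUBSCRIBER_ENDPOINTS = [
        "https://suite.example.com/api/employees",
        "https://perf.example.com/api/employees",
    ]

    def on_employee_terminated(event):
        """Fired by an HR trigger; pushes the change to every subscriber."""
        for url in SUBSCRIBER_ENDPOINTS:
            requests.post(url, json=event, timeout=10)

    on_employee_terminated({"employee_id": "E1001", "status": "terminated"})

A bare loop like this assumes every endpoint answers on the first try, which brings us to the inevitable hiccups.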

Even after systems are retrofitted to push or pull data from API endpoints at will, hiccups inevitably occur. Endpoints go down for a variety of reasons, and real-time events (such as an employee termination) can be lost due to transient technical issues. Whether you implement a store-and-forward queue or something else entirely, a "retry capability" is a must-have.
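
One way to sketch that store-and-forward retry idea with only the Python standard library; the in-memory queue is a stand-in for whatever durable store you would use in production:

    import time
    from collections import deque

    def deliver(event, send, max_attempts=5, base_delay=1.0):
        """Try to deliver an event, backing off exponentially between attempts."""
        for attempt in range(max_attempts):
            try:
                send(event)
                return True
            except Exception:
                time.sleep(base_delay * (2 ** attempt))
        return False

    pending = deque()  # store-and-forward buffer; make this durable in real life

    def publish(event, send):
        """Deliver now if possible; otherwise park the event for a retry pass."""
        if not deliver(event, send):
            pending.append(event)

    def retry_pending(send):
        """Replay parked events so transient outages lose nothing."""
        for _ in range(len(pending)):
            event = pending.popleft()
            if not deliver(event, send):
                pending.append(event)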

Building Block #3: Intermediate Data Stores

Like information-age surgeons, we're expected to reassemble data by piecing together various extremities into a unified whole. Real-time APIs represent our emergency room, and canonical data models, our sutures that stitch everything together. Another integral component in the data doctor's repertoire is, of course, the operating table: the area where we stage constituent parts and perform surgical operations. This is where intermediate data stores come into play.

Braiding two or more streams of data into a new, composite flow may be performed entirely in memory, but that's difficult at best, especially when intermediate steps (such as aggregations) must occur. I find that intermediate stores, such as traditional operational data stores (ODS), can kill three birds with one stone.

First, you have the convenience of staging data within a nonvolatile container that likely utilizes a familiar database access paradigm. Second, the ODS can serve double duty as an operational reporting data reservoir. Finally, the ODS can be used as a staging area or "launch pad" into the traditionally subject-oriented, historically focused enterprise data warehouse, assuming your organization employs one.
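
As a toy illustration of the staging role, the sketch below uses an in-memory SQLite database as a stand-in ODS; the table and column names are invented:

    import sqlite3

    conn = sqlite3.connect(":memory:")  # stand-in for a real operational data store
    conn.execute(
        "CREATE TABLE ods_employee (employee_id TEXT PRIMARY KEY, legal_name TEXT, email TEXT)"
    )

    def stage(fragment):
        """Upsert an incoming fragment so the staged record accretes over time."""
        conn.execute(
            """INSERT INTO ods_employee (employee_id, legal_name, email)
               VALUES (:employee_id, :legal_name, :email)
               ON CONFLICT(employee_id) DO UPDATE SET
                   legal_name = COALESCE(excluded.legal_name, legal_name),
                   email = COALESCE(excluded.email, email)""",
            {"employee_id": fragment["employee_id"],
             "legal_name": fragment.get("legal_name"),
             "email": fragment.get("email")},
        )

    stage({"employee_id": "E1001", "legal_name": "Dana Smith"})
    stage({"employee_id": "E1001", "email": "dana.smith@example.com"})
    print(conn.execute("SELECT * FROM ods_employee").fetchall())

Each fragment lands on the operating table as it arrives, and the composite record emerges without any in-memory juggling.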

Summary

Unlike surgery, cloud integration need not be painful. Understanding the challenges of cloud integration is the first step in mapping out the process, and leveraging these three building blocks can smooth out your learning curve.


About the Author

David Torre is the owner of Center Mast. With nearly 20 years of experience and advanced degrees in information systems and business intelligence, David's unique combination of skills has enabled him to deliver cybersecurity and business intelligence solutions to some of the most well-known companies in the world. You can contact the author at dtorre@centermast.com.

