TDWI | Training & Research | Business Intelligence, Analytics, Big Data, Data Warehousing

TDWI Articles

Four Tips for Achieving Lasting ROI with a Data Lakehouse

Data lakehouses provide new data storage possibilities, but implementing them can be challenging. These four tips will help you achieve long-lasting ROI from this architecture.

By Michael Hay
July 10, 2024

For more than a decade, the data lake has been evolving, and the last few years have seen a logical progression of the architecture. Today, the data lakehouse positions modern data warehouse analytics, performance, security, and governance functionality directly onto the lake while still embracing data’s many formats, intake sources, and naturally distributed state.

For Further Reading:

Avoid Ending Up with a Marshy Mess Instead of a Data Lakehouse

Data Lakehouses: The Key to Unlocking the Value of Your Unstructured Data

Sunrise at the Lakehouse: Why the Future Looks Bright for the Data Lake’s Successor

The data lakehouse is popular because it opens new efficiencies and reduces the friction inherent in constant data movement. It can empower different enterprise teams to directly access data for business intelligence, streaming analytics, data science, machine learning, and product development, using their favorite query engines and tools and leveraging computational resources on premises, in the cloud, or a hybrid mix. The lakehouse enables an end-to-end experience where data is easily accessible across an organization as a reusable product. Teams can have a conversation with their data, query data sets in the lake across multiple file and table formats, and crack open those Apache Parquet or Iceberg instances to solve real problems, whether with a simple ad hoc request or a machine learning task.

That’s not to say implementing a lakehouse always goes swimmingly. Here are four tips to achieve lasting ROI with this promising architecture.

Think Deeply About Data as a Reusable Product

Implementing a lakehouse involves thinking deeply and differently. You still have to meet known challenges, such as considering where your data originates and why you're capturing it. If you’re ingesting data from emails, logs, IoT sensors, sales and marketing tools, and many other sources, you still need to prepare it, clean it, and make sure it’s properly anonymized, masked, and aligned with governance policies, compliance regulations, and laws. You’ll need workflows to analyze, transform, index, enrich, and search data so it’s readily usable by query engines or even retrieval-augmented generation (RAG) techniques, when considering generative AI.
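As one illustration of the masking step, the sketch below replaces email addresses in free text with stable tokens before the text lands in the lake. The regular expression, the `user_` token prefix, and the salt value are assumptions made for this example, and a salted hash is pseudonymization rather than full anonymization, so it would not satisfy every governance policy on its own.

```python
import hashlib
import re

# Deliberately simple email pattern; an assumption for this sketch,
# not a complete address validator.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def pseudonymize(value: str, salt: str) -> str:
    """Replace a value with a stable token derived from a salted SHA-256 hash."""
    digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
    return "user_" + digest[:12]


def mask_emails(text: str, salt: str) -> str:
    """Swap every email address found in free text for its token."""
    return EMAIL_RE.sub(
        lambda match: pseudonymize(match.group(0).lower(), salt), text
    )


# Hypothetical record: the same address appears twice with different casing.
record = "Ticket opened by Ana.Diaz@example.com, cc ana.diaz@example.com"
masked = mask_emails(record, salt="demo-salt")
print(masked)
```

Because the token is deterministic for a given salt, the same person maps to the same token across sources, so downstream teams can still join and count records without seeing the address.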

Imagining data as a reusable product from the outset and ensuring it can be repurposed for new, as-yet-unknown tasks is also vital when implementing lakehouse architecture to advance long-term gains. Managing data as if it’s a product means gathering requirements and thinking about data within the context of an interactive and agile development life cycle, where you’re preparing products for people who come after you and use those products for creative applications that aren’t even on the drawing board yet. You might deploy an advanced catalog and inverted index that bolsters new data use and reuse.
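To make the catalog-plus-inverted-index idea concrete, here is a minimal sketch that indexes data set descriptions by term so a later team can discover products by keyword. The catalog entries and function names are hypothetical, and a production catalog would add stemming, ranking, and access controls.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(catalog: dict[str, str]) -> dict[str, set[str]]:
    """Map each term to the set of data products whose description uses it."""
    index: dict[str, set[str]] = defaultdict(set)
    for name, description in catalog.items():
        for term in tokenize(description):
            index[term].add(name)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the data products that contain every term in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results


# Hypothetical catalog entries: data product name -> description.
catalog = {
    "sales.orders": "Daily order lines from the sales platform",
    "iot.sensor_readings": "Temperature sensor readings from factory devices",
    "support.emails": "Customer emails routed to the support platform",
}
index = build_inverted_index(catalog)
print(search(index, "platform"))
```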

A data lakehouse that enables the fast repurposing of data provides a key condition for effective self-service, too. As the lakehouse opens massive reservoirs of structured, semistructured, and unstructured enterprise data stores, business teams with different kinds of domain expertise can explore data widely and bring their good ideas to fruition to produce new value in ways never before possible with the bottlenecks and limited access of the past.

Turn a Data Tax Into a Data Asset

In one of the world’s largest economies, there’s a painful but essential banking regulation that requires all banks to store all logs for seven to 10 years. All logs. This is a multipetabyte-scale compliance challenge, a kind of data tax banks must pay to do business. Banks need to constrain costs and, to satisfy the regulation, keep the log data in an optimized, biased-for-action format (structured so that a SQL query engine can query it efficiently) that lets them respond quickly to internal auditors or external regulators.
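A small sketch of what "biased for action" can mean in practice: raw log lines are parsed into columns once, so an auditor's question becomes a single SQL query. The log layout and sample lines are invented for illustration, and SQLite stands in here for whatever SQL query engine sits on the lakehouse.

```python
import sqlite3
from datetime import datetime

# Hypothetical raw log lines: timestamp, service, level, free-text message.
RAW_LOGS = [
    "2024-03-01T09:15:02 auth WARN login failed for account 1001",
    "2024-03-01T09:15:40 payments INFO transfer completed id 77",
    "2024-03-02T11:02:13 auth WARN login failed for account 1001",
]


def parse_line(line: str) -> tuple[str, str, str, str]:
    """Split a raw line into (timestamp, service, level, message)."""
    timestamp, service, level, message = line.split(" ", 3)
    datetime.fromisoformat(timestamp)  # raises ValueError on a malformed timestamp
    return timestamp, service, level, message


# SQLite is only a stand-in for the lakehouse's SQL query engine.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logs (ts TEXT, service TEXT, level TEXT, message TEXT)")
conn.executemany(
    "INSERT INTO logs VALUES (?, ?, ?, ?)", [parse_line(line) for line in RAW_LOGS]
)


def audit_query(conn: sqlite3.Connection, service: str, start: str, end: str) -> list:
    """Return one service's events in the half-open time window [start, end)."""
    cursor = conn.execute(
        "SELECT ts, level, message FROM logs "
        "WHERE service = ? AND ts >= ? AND ts < ? ORDER BY ts",
        (service, start, end),
    )
    return cursor.fetchall()


auth_events = audit_query(conn, "auth", "2024-03-01", "2024-03-02")
print(auth_events)
```

ISO 8601 timestamps sort correctly as plain text, which is why the window filter works without any date parsing at query time.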

You can use a lakehouse to transform this scenario into a refreshing new opportunity.

Log data might comprise a list of messages and time sequences. There's a text payload there that can be mined for machine learning and generative AI applications. You might mine that data to look for advanced persistent threats or security problems. You might create a new analytics application. A data lakehouse makes all this possible in an efficient way. Perhaps the query engine the bank uses is the wrong tool to solve a new problem. Fortunately, you can bring something as simple as Python, or an analytical engine, or a tool to execute statistics and mathematics within the SQL query engine, to bear on your problem. They’re all easy to dock at the lakehouse. It enables the creation of derived data products and transforms the log data tax into a data asset.
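As a sketch of bringing "something as simple as Python" to the text payload, the example below counts failed logins per account and flags accounts at or above a threshold. The message wording, the pattern, and the threshold of three are assumptions for illustration; real threat detection would weigh time windows, source addresses, and far more signal.

```python
import re
from collections import Counter

# Assumed message wording; real log payloads will differ.
FAILED_LOGIN = re.compile(r"login failed for account (\d+)")


def suspicious_accounts(messages: list[str], threshold: int = 3) -> dict[str, int]:
    """Return accounts whose failed-login count meets or exceeds the threshold."""
    counts: Counter = Counter()
    for message in messages:
        match = FAILED_LOGIN.search(message)
        if match:
            counts[match.group(1)] += 1
    return {account: n for account, n in counts.items() if n >= threshold}


# Hypothetical text payloads pulled from stored log records.
messages = [
    "login failed for account 1001",
    "transfer completed id 77",
    "login failed for account 1001",
    "login failed for account 2002",
    "login failed for account 1001",
]
flagged = suspicious_accounts(messages)
print(flagged)
```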

Beyond this use case, time series and log data management opens up promising scenarios for businesses and nonprofits alike. Consider government-oriented sectors that need to store logs efficiently so they can query that data, build reports, and feed those data sets into AI pipelines or other advanced analytics workflows.

Refresh Stale Data and Leverage Specialized Processing

To facilitate database retirement and new efficiency gains, many businesses are taking a close look at their legacy data infrastructure and warehouses and moving old, stale database objects into data lakehouses. Leveraging the lakehouse’s open format keeps data from previous systems usable beyond the life of various applications while lowering costs. The lakehouse offers efficiencies when it comes to replication, backup windows, load times, and the many concerns around transitioning to a new environment and responsibly maintaining data for the long term.

Lakehouse-enabled ad hoc queries offer another kind of advantage and a data mart-type experience. For example, consider a telecommunications company with detailed call records and signal strength records that wants to optimize placement of dozens of new cell towers. Such a project requires various advanced queries, but it’s a one-off. A lakehouse allows you to bring specialized processing to the data for the immediate need, with secondary use cases always possible down the line.
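A one-off analysis of this kind can be quite small once the records are reachable. The sketch below averages signal strength per grid cell and ranks the weakest cells as candidate tower locations; the grid identifiers and dBm readings are invented, and averaging dBm values directly is a simplification of real radio planning.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical signal-strength records: (grid cell id, signal strength in dBm).
readings = [
    ("grid-12", -110), ("grid-12", -108),
    ("grid-07", -72), ("grid-07", -75),
    ("grid-31", -95), ("grid-31", -101),
]


def weakest_cells(readings: list[tuple[str, int]], top_n: int = 2) -> list[tuple[str, float]]:
    """Average signal per grid cell, then return the top_n weakest cells."""
    by_cell: dict[str, list[int]] = defaultdict(list)
    for cell, dbm in readings:
        by_cell[cell].append(dbm)
    averages = {cell: mean(values) for cell, values in by_cell.items()}
    # A lower (more negative) dBm value means a weaker signal, so sort ascending.
    return sorted(averages.items(), key=lambda item: item[1])[:top_n]


candidates = weakest_cells(readings)
print(candidates)
```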

In fact, the ease with which teams can use different, specialized query engines on the lakehouse is a major feature. Hook up an engine such as Apache Solr, an unstructured data search engine with a SQL interface that can also talk to databases, structured systems, NAS devices, and object stores. Get in the middle of an indexing pipeline and add customizations that help it “speak” oil and gas data or seismic data. Develop data products that combine data from the lake and from external sources across clouds and on-premises environments. The possibilities are limitless.

Use Durable Open Formats to Pass the Test of Time

Lakehouses embrace open formats, which are the best choice for keeping data accessible over long time frames. Whether an open format is as simple as CSV or as complex as an Iceberg table, keeping data in it means you’re much more likely to be able to read that data five years or a decade from now and still reuse it. Simpler, more durable open formats pass the test of time.
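The durability argument is easy to see with the simplest format mentioned above. In this sketch, records are written to CSV and read back using nothing but Python's standard library, with no dependency on the application that produced them. The field names and rows are hypothetical.

```python
import csv
import io

# Hypothetical records exported from a retiring application.
rows = [
    {"ts": "2024-03-01T09:15:02", "service": "auth", "level": "WARN"},
    {"ts": "2024-03-01T09:15:40", "service": "payments", "level": "INFO"},
]


def to_csv(rows: list[dict[str, str]]) -> str:
    """Serialize records to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def from_csv(text: str) -> list[dict[str, str]]:
    """Read CSV text back into records, using only a standard CSV parser."""
    return list(csv.DictReader(io.StringIO(text)))


csv_text = to_csv(rows)
restored = from_csv(csv_text)
print(restored == rows)
```

CSV carries no type information, so every value comes back as a string; table formats such as Iceberg trade some of that simplicity for schemas and richer metadata.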

It's also more important than ever to think about the control you need. For a long time, organizations have had to outsource IT control and choice; they’ve paid big cloud bills that never delivered the promised efficiency. Open source tools and open data formats put you squarely back in the driver's seat, though this also means taking some responsibility for business outcomes.

Today data lives much, much longer than applications. It’s vital that organizations never lock their data away in ways that make it incapable of yielding new value, which requires distinguishing between data and applications. Keeping data in open formats in the lakehouse, instead of bottled up in applications, makes it available and reusable for a multiplicity of consumers, including those launching or refining AI projects in years to come.

Lakehouses aren’t an automatic panacea, and the architecture requires thoughtful implementation. Managing data in a lakehouse environment with these tips top of mind can help your enterprise better dive into its data and achieve a lasting return on its infrastructure investments.
