TDWI | Training & Research | Business Intelligence, Analytics, Big Data, Data Warehousing

TDWI Articles

Best Practices to Ensure Data Transformation Success

Prophecy’s founder and CEO Raj Bains explains how organizations can overcome the challenges to achieving enterprise-wide data transformation.

By Upside Staff
June 24, 2024

Upside: How does generative AI change how people can work with data? How does it change how they should work with data?

Raj Bains: Generative AI is fundamentally transforming how people work with data, especially in business intelligence and data transformation. Data transformation, traditionally time-consuming and resource-intensive, is set for a significant change as technologies that leverage intelligence about data and language models are providing unique approaches to transforming, organizing, and getting insights from data. The industry's current approach to data transformation is not measuring up. Simplistic tools that cater to many users lack power. Cloud data platforms are powerful but remain accessible only to expert data engineers. Both of these approaches fail to adequately address the problem.

The data transformation process, essentially a narrow form of programming that is rich in context, is well-suited for AI-powered copilots that cater to users at various skill levels and integrate with visual data pipelines. Without some guidance from AI-powered visual copilots, it’s a challenge for some organizations to know what to do next, such as what tables to join or how target columns should be computed from source columns.
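As a concrete illustration of the two decisions mentioned above (which tables to join, and how a target column is computed from source columns), here is a minimal sketch using Python's built-in `sqlite3` module. The table names, column names, and values are hypothetical and are not taken from the interview or from any Prophecy product.

```python
import sqlite3

# Hypothetical source tables; names and values are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER,
                         quantity INTEGER, unit_price REAL);
    CREATE TABLE customers (customer_id INTEGER, region TEXT);
    INSERT INTO orders VALUES (1, 10, 2, 5.0), (2, 11, 1, 20.0), (3, 10, 3, 4.0);
    INSERT INTO customers VALUES (10, 'West'), (11, 'East');
""")


def revenue_by_region(conn):
    """Join orders to customers and compute the target column total_revenue."""
    query = """
        SELECT c.region,
               SUM(o.quantity * o.unit_price) AS total_revenue  -- target column
        FROM orders AS o
        JOIN customers AS c ON c.customer_id = o.customer_id    -- join choice
        GROUP BY c.region
        ORDER BY c.region
    """
    return conn.execute(query).fetchall()


rows = revenue_by_region(conn)
print(rows)
```

The join key and the formula for `total_revenue` are exactly the kind of choices a copilot might suggest and a user would then inspect.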

It can also be daunting to develop the transformations themselves because of the necessary coding that often requires expertise in PySpark or Scala. It’s also difficult to make changes to the code because it requires an understanding of the code itself. (This could be alleviated if users could make changes within the visual interface.) Alarmingly, documentation, explanations, tests, and commit messages are often completed as an afterthought, if at all.

There are several concerns, such as hallucinations and bias, about using generative AI for production work. When it comes to data transformation, how do you know generative AI is doing an accurate job? Is the technology truly ready for prime time?

Generative AI is ready for production work, not as the primary developer, but as a copilot assisting the primary data user. When developing a visual pipeline for data transformation, the copilot suggests transformations that the user can inspect, showing the resulting data after each step.

The visual interface allows easy final edits to ensure accuracy, and the copilot can flag similar values that are computed elsewhere with subtle differences, which helps prevent errors. It also generates documentation, explanations, and tests to verify that the output matches user intent and maintains data quality.
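The kind of test mentioned here can be quite small. The sketch below is an assumption about what such a check might look like, not a description of any specific copilot's output: a hypothetical `check_quality` helper that reports missing required values and duplicate keys in a pipeline's output rows.

```python
def check_quality(rows, key, required):
    """Return a list of human-readable problems found in rows (a list of dicts).

    key      -- column expected to be unique across rows (hypothetical name)
    required -- columns that must not be None
    """
    problems = []
    seen = set()
    for i, row in enumerate(rows):
        # Flag required columns that are absent or None.
        for col in required:
            if row.get(col) is None:
                problems.append(f"row {i}: {col} is missing")
        # Flag repeated key values.
        k = row.get(key)
        if k in seen:
            problems.append(f"row {i}: duplicate {key}={k}")
        seen.add(k)
    return problems


# Illustrative data only.
clean = [{"order_id": 1, "total": 10.0}, {"order_id": 2, "total": 20.0}]
dirty = [{"order_id": 1, "total": 10.0}, {"order_id": 1, "total": None}]

print(check_quality(clean, "order_id", ["order_id", "total"]))
print(check_quality(dirty, "order_id", ["order_id", "total"]))
```

An empty list means the output passed; any entries give the user something specific to inspect before the pipeline is promoted.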

By combining a visual interface with generative AI, copilots will significantly boost productivity without sacrificing quality.

Data platforms use generative AI to simplify a number of processes, but organizations are still struggling to get data into the hands of those that need it most. Why?

Generative AI and large language models (LLMs) have initially appeared as tools that generate text or code from prompts based on publicly available data. This presents two challenges: products such as ChatGPT lack specific organizational context, and tools such as GitHub Copilot, which generate code from prompts, are only usable by expert coders.

We'll soon see products that are more intuitive, require fewer tedious prompts, and are better integrated into user-friendly interfaces. These tools will be deployed within organizations, learning their context and becoming more useful. Early technologies such as retrieval-augmented generation (RAG) are steps in this direction. Given a few months to mature, these products will start delivering real value.

What should organizations do to address the unique data transformation needs of different types of users?

Organizations need to enable different data users to be productive in their roles. At one end are data engineers who set up data platforms, processes, standards, and frameworks, and build central pipelines for large data sets, ensuring performance and cost efficiency. At the other end are business data users, including data analysts and data scientists, who focus on business problems. They need an easy-to-use visual interface that enables them to build daily pipelines independently of the data platform teams.

Where data and analytics are centrally important to all types of users, the key questions are how generative AI will change the way we all work with data and what is needed to ensure success. For decades, we've known that obtaining clean, high-quality, and timely data poses one of the greatest challenges for enterprises. This challenge is especially critical as enterprises seek to capitalize on the promise of AI.

Most business teams feel starved for data, while central data platform teams are overwhelmed and can only deliver a fraction of what is needed, consuming excessive resources in the process.

Copilots improve the accessibility and availability of data for technical and non-technical users throughout the enterprise, democratizing data and analytics while ensuring the delivery of clean, trusted, and timely data needed for analytics. They also help data users increase their productivity. The key is to meet the needs of all users on the same platform by allowing data platform teams to assist business users and ensure everyone follows best practices and frameworks. Future copilots will be ubiquitous, accommodating all skill levels without compromising the platform's power.

Well-known copilots are producing impressive results, improving productivity. How should copilots be used for data transformation?

Copilots are showing impressive results, especially in programming, as seen with GitHub Copilot's rapid adoption and productivity improvements. These benefits will extend to data transformation with specialized copilots as well.

Data transformation copilots must be:

- Integrated and comprehensive. Copilots must work with existing data platforms and support the entire data transformation life cycle.
- Intuitive and intelligent. They must provide a visual and integrated interface for data analysts and a code interface for data platform users. By appealing to both groups, generative AI should handle half the work of developing, deploying, and observing pipelines to boost productivity.
- Open and extensible. These visual interfaces should produce Spark or SQL code and enable data engineers to create standards and frameworks for all users.

With these capabilities, copilots can help data analysts and visual ETL developers create data pipelines, with AI doing half the work, drastically improving productivity. Data platform teams can code standards and frameworks and make them available as visual components. Most important, a single platform for all users reduces costs, increases productivity, and ensures higher data quality.
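To illustrate the idea of visual components that produce SQL code, here is a minimal sketch in which two pipeline steps are described as plain data and then rendered to a SQL string. The `JoinStep` and `DeriveStep` classes and the `to_sql` function are hypothetical names invented for this example; they do not represent any real product's API.

```python
from dataclasses import dataclass


@dataclass
class JoinStep:
    """Hypothetical description of a join configured in a visual editor."""
    left: str
    right: str
    on: str


@dataclass
class DeriveStep:
    """Hypothetical description of a derived (target) column."""
    target: str
    expression: str


def to_sql(join, derive):
    """Render the two steps as one SQL statement.

    Assumes trusted identifiers; a real tool would validate and quote them.
    """
    return (
        f"SELECT *, {derive.expression} AS {derive.target}\n"
        f"FROM {join.left}\n"
        f"JOIN {join.right} USING ({join.on})"
    )


sql = to_sql(
    JoinStep(left="orders", right="customers", on="customer_id"),
    DeriveStep(target="revenue", expression="quantity * unit_price"),
)
print(sql)
```

Because the generated code is ordinary SQL, a data engineer can review it, version it, or wrap it in shared standards without needing the visual interface.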

[Editor’s note: Raj Bains is the founder and CEO of Prophecy, a data copilot company that enables businesses to accelerate AI and analytics by delivering data that is clean, trusted, and timely. Previously, Raj led project management of Apache Hive at Hortonworks through their IPO and headed product management and marketing for a NewSQL database startup. His engineering roles include developing a NewSQL database, building a CUDA compiler at NVIDIA as a founding engineer, and working as a compiler engineer on Microsoft Visual Studio.]
