Next Year in Data Analytics: Data Quality, AI Advances, Improved Self-Service
Three interesting analytics trends to watch that could improve the use of data throughout your enterprise.
- By Mike Loukides
- December 9, 2022
Back in 2005, Tim O’Reilly said, “Data is the next Intel Inside.” He was right. We no longer talk about “Intel Inside,” and the Intel Inside stickers have all disappeared. However, we do talk about data -- a lot. Here are three trends in data and data analysis that will be worth watching in the coming year.
Trend #1: Being data-centric
A few years ago, we said "more data trumps better algorithms." Now, we're finding out that better data trumps more data. Poor data quality is an important source of technical debt, and more data can introduce problems rather than solve them. Simply increasing the size of a data set can lead to unexpected behaviors: some may be correct, many will be wrong, and none are predictable beforehand.
Getting better data for training AI applications means paying more attention to tagging training data. That can mean many things, including tagging the first thousand or so items yourself so you understand the problems in the data set and can do a better job of instructing others how to tag the data. It might mean hiring a small number of people to do the tagging and paying them a living wage. Many AI developers use crowdsourcing services to tag data, but those services incentivize workers to move fast at the expense of quality. Becoming data-centric might even mean using AI-based tagging solutions, which, when done correctly, can be less error-prone than human tagging.
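To make "paying attention to tagging" concrete, here is a minimal sketch of one common quality check: measuring how often two annotators assign the same tag to the same items, corrected for chance agreement, using Cohen's kappa. The labels and data here are hypothetical; low agreement is an early signal that your tagging instructions need work.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Agreement between two annotators, corrected for chance.

    1.0 is perfect agreement; 0.0 is no better than chance.
    """
    assert len(labels_a) == len(labels_b) and labels_a
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: the probability that both annotators pick
    # the same tag if each tagged at random with their own frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[tag] * freq_b[tag] for tag in freq_a) / n**2
    return (observed - expected) / (1 - expected)

# Hypothetical spot check: two annotators tag the same 8 items.
annotator_1 = ["spam", "ham", "spam", "spam", "ham", "ham", "spam", "ham"]
annotator_2 = ["spam", "ham", "ham", "spam", "ham", "spam", "spam", "ham"]
print(f"kappa = {cohens_kappa(annotator_1, annotator_2):.2f}")  # kappa = 0.50
```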
Being data-centric also means ensuring that developers know where their data comes from, know that it was collected ethically, and understand possible sources of error. Finally, being data-centric means charging developers with creating documentation for data sets, along the lines of the groundbreaking paper by Timnit Gebru and her co-authors, "Datasheets for Datasets."
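As a rough illustration -- not the format the paper prescribes -- a datasheet can start as a structured record that travels with the data set. All field names and values below are hypothetical:

```python
from dataclasses import dataclass, field

@dataclass
class Datasheet:
    """Minimal, machine-readable dataset documentation.

    Fields are illustrative, loosely inspired by the questions
    posed in "Datasheets for Datasets" (Gebru et al.).
    """
    name: str
    purpose: str              # why was the data set created?
    provenance: str           # where did the data come from?
    collection_method: str    # how was it gathered?
    known_biases: list[str] = field(default_factory=list)
    usage_restrictions: list[str] = field(default_factory=list)
    maintainer: str = "unknown"

# A hypothetical data set and its documentation.
sheet = Datasheet(
    name="support-tickets-2022",
    purpose="Train a ticket-routing classifier",
    provenance="Internal helpdesk exports, Jan-Oct 2022",
    collection_method="Automated export; PII scrubbed on ingest",
    known_biases=["English-only tickets", "enterprise customers only"],
    usage_restrictions=["No use outside support tooling"],
    maintainer="data-eng@example.com",
)
```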
Whether you’re performing business analytics, making customer recommendations, or managing supply chains, becoming data-centric will make your results more accurate. That’s true whether you’re using machine learning or more traditional statistical applications. We’ve said “garbage in, garbage out” for years. It’s time we took that seriously.
Trend #2: Paying attention
Everyone in technology must be aware of the tremendous advances made in natural language processing over the last two years. NLP isn't directly related to data analytics, but I want to go out on a limb and suggest an important direction for future work. Models such as GPT-3 deliver great results because they are built on transformers, a relatively new neural network architecture. Without going into technical detail, transformers implement a kind of "attention" called self-attention: they are able to determine which parts of a text are important based on context, not just word frequency.
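For readers who want a peek under the hood, here is a minimal numpy sketch of scaled dot-product self-attention, the core operation inside a transformer. It omits the multiple heads, masking, and training of a real model, and substitutes random projection matrices for learned ones:

```python
import numpy as np

def self_attention(X):
    """Scaled dot-product self-attention over a sequence.

    X: (seq_len, d) array, one row per token. A real transformer
    learns the query/key/value projections; here they are random
    for brevity.
    """
    rng = np.random.default_rng(0)
    d = X.shape[1]
    W_q, W_k, W_v = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    # Each token scores every other token; softmax turns the scores
    # into weights that say where to "pay attention".
    scores = Q @ K.T / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output row is a context-weighted mix of the whole sequence.
    return weights @ V

tokens = np.random.default_rng(1).standard_normal((5, 8))  # 5 tokens, dim 8
print(self_attention(tokens).shape)  # (5, 8)
```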
Although transformers aren’t widely used outside of NLP (they are starting to appear in computer vision, with encouraging results), I wonder what implications attention might have for business modeling and forecasting. We are increasingly relying on AI to build financial models. How powerful would those models be if they had a concept of “attention” -- if they could decide, based on context, which data was worth paying attention to? What if the input to the model consisted of all the company’s financial data, local and worldwide economic data, historical data, and current news, along with possible scenarios for going forward? Such a system, equipped with the ability to determine what is important and what is noise, could outperform our current financial models.
Will self-attention be applied to financial analysis, supply chain predictions, inventory management, and other business problems? This is definitely a risky prediction. Academic research tends to focus on problems such as natural language and computer vision, not business problems, but it is hard to believe that businesses wanting a competitive advantage will ignore the success of transformers in the academic world. It is equally hard to believe that existing SaaS platforms won’t see this as an opportunity to extend their product offerings.
Trend #3: Self-service data
Despite a lot of talk, self-service data is still in its early days. A few things are holding it back. First, data is often still held in silos: separate repositories owned by different constituencies within a company, designed without any thought for compatibility or even for use by other parts of the organization. At worst, silos are tied up with an organization’s political infighting. Nothing about data silos is conducive to self-service data. At the other extreme, some organizations have broken down their silos, replacing them with a data lake (or lakehouse, or warehouse, or some other house-y or watery metaphor). No more silos, but you still don’t have self-service data -- you have a lot of undigested, often unstructured data, dumped into a mass storage system with minimal thought about how people are going to use it.
Data meshes are part of the solution to this problem. Data meshes allow groups within an organization to be responsible for their own data -- after all, they understand the data -- while making it available to the rest of the organization. Another key part of self-service data is a data catalog: a company-wide directory that lets users discover what data exists and shows what it has been used for, who is responsible for it, and other metadata. Good data governance also makes self-service easier because it forces you to document the data you have: where it came from (provenance), how it was collected, restrictions on its use, and other metadata. Good governance also entails taking responsibility for what users do with data. Self-service users can’t take a Wild West approach where anything goes.
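As a toy sketch of the catalog-plus-governance idea, the code below models catalog entries as records with an owner, provenance, and permitted uses, and lets a would-be user discover which data sets a given use case may touch. Everything here is hypothetical; a real catalog would be a service backed by a metadata store, not an in-memory dict:

```python
# A toy in-memory data catalog. All entries are hypothetical.
catalog = {
    "orders": {
        "owner": "sales-data-team",
        "provenance": "Order-processing DB, nightly snapshot",
        "allowed_uses": ["reporting", "forecasting"],
    },
    "clickstream": {
        "owner": "web-analytics-team",
        "provenance": "CDN logs, sessionized",
        "allowed_uses": ["product analytics"],
    },
}

def discover(use_case):
    """List data sets whose governance rules permit a use case."""
    return [name for name, meta in catalog.items()
            if use_case in meta["allowed_uses"]]

print(discover("forecasting"))  # ['orders']
```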
One element is still missing: self-service data requires widespread data literacy. The shortages of data scientists, data engineers, and AI experts are real issues, but democratizing data also means that the people using it must understand how to use it properly. We have low-code and no-code tools that do a good job of building simple applications and performing basic analytics. However, the people who use these tools must be able to answer basic questions about when data is meaningful, how confident they are in the results, and whether the original data was gathered in a way that didn’t introduce errors and biases. There’s no shortcut around data literacy.
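"How confident are we in this number?" is one of those basic questions, and answering it doesn't require heavy machinery. As a sketch, here is a percentile-bootstrap confidence interval for a mean, in plain Python with made-up numbers:

```python
import random

def bootstrap_ci(sample, stat=lambda xs: sum(xs) / len(xs),
                 n_resamples=10_000, alpha=0.05, seed=42):
    """Percentile-bootstrap confidence interval for a statistic."""
    rng = random.Random(seed)
    # Resample with replacement many times; the spread of the
    # resampled statistics estimates our uncertainty.
    stats = sorted(
        stat([rng.choice(sample) for _ in sample])
        for _ in range(n_resamples)
    )
    lo = stats[int(alpha / 2 * n_resamples)]
    hi = stats[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi

# Hypothetical daily conversion rates (percent).
rates = [2.1, 2.4, 1.9, 2.8, 2.2, 2.6, 1.7, 2.3, 2.5, 2.0]
low, high = bootstrap_ci(rates)
print(f"mean ~ {sum(rates)/len(rates):.2f}, 95% CI ({low:.2f}, {high:.2f})")
```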
Those are O’Reilly Media’s three trends to watch. One may be a long shot -- but an important long shot. If you’re right about all your predictions, you’re not predicting. The long shots are the most interesting -- and if they win, the most important.
About the Author
Mike Loukides is vice president of content strategy for O'Reilly Media, Inc. He's edited many highly regarded books on technical subjects that don't involve Windows programming. He's particularly interested in programming languages, Unix and what passes for Unix these days, AI, and system and network administration. Mike is the author of System Performance Tuning and a coauthor of Unix Power Tools and Ethics and Data Science. Most recently he's been writing about data and artificial intelligence, ethics, and the future of programming. Mike can be reached on Twitter and LinkedIn.