
The Shortcomings of Predictive Analytics

Data scientist Claudia Perlich explains why we must use machine learning and predictive technologies ethically, responsibly, and mindfully.

Do data scientists need a refresher course in the Hippocratic precept "first, do no harm"?

This is a question that data scientist Claudia Perlich has spent considerable time grappling with.

Predictive Models Have Unintended Side Effects

Perlich, chief data scientist with marketing analytics specialist Dstillery, believes data science and advanced analytics are powerful tools for human good, and she'll be making this case at TDWI's upcoming Accelerate conference in Boston, April 3-5. Accelerate will feature tutorials and presentations by Perlich and other industry luminaries.

In all likelihood, few sessions will be as provocative as Perlich's. Even though she's an unapologetic champion of predictive analytics, Perlich recognizes that machine learning and other technologies can only be forces for good if people use them ethically, responsibly, and mindfully.


"I'm a huge fan of this technology. I love what I do and I've been doing it for almost 20 years. In that time, I've collected a deep understanding of why things don't work, often for very surprising reasons that have nothing to do with classical reasons," she explains. "I'm really interested in when and why things fail. 'Failing' isn't [the right word]. I'm talking about 'unintended side effects' -- [things] you didn't really count on when you decided to build models and put them out there in the wild."

First, Perlich says, we have to recognize that predictive models embody the acknowledged and unacknowledged biases of the people who created them.

"If you use a machine learning system to automatically screen job candidates ... your predictive model may propagate historical biases. If a model makes predictions [based on] what has happened in the past, it is bounded by [the selection criteria of] the past," Perlich says. "All of us who are enthusiastically building these models need to develop a moral sense of responsibility ... about how and when they are put to use."

Models Give You Exactly What You Ask For

This "moral" sense isn't just limited to scrubbing biases out of models. In some cases, a predictive model is optimized to predict the letter but not the spirit of what the modeler desires.

"I have seen the exact analogous effect in advertising ... when we talk about models that predict who will click on ads and we try to select those opportunities with the highest probability [of click-through]. You're trying to find the people most interested in the product -- people who will actually buy the product," she explains.

"This ignores the fact that people tend to accidentally click on ads. A person has eyesight problems; a person has lent their device to their three year old; a person is distracted. If you base your model on all [click-through data], you're going to ... end up with something that is technically correct but doesn't actually do what you want it to do."

Data scientists don't just have a responsibility to the strict letter of a requirement -- e.g., predicting successful job applicants or click-through opportunities -- but to the spirit of what they're trying to model and measure, she argues.

"The model is doing its job. It will find you a set of opportunities with the highest click-through rate. The applicant recommended by the [candidate-screening] model will be highly likely to succeed. [However,] you are stuck with this incompatibility where you're saying you want one thing and your model is giving you something else entirely," Perlich says.

"The discrepancy between the two objectives will increase as you are more able to do [the one thing] really, really well, [be it identifying] higher click-through rates or successful job applicants."

Designing Better Models

When you're designing predictive models, there are a couple of things to be alert to, Perlich says.

"You should never have any single technical criteria -- you should never focus just on click-through rates, for example. You should never try to do too much with your [individual] models. It's hard to build models that are optimized [for] many things at the same time," she observes.

"If your model is getting too good, it's almost always a problem. There was an example where we built a really good model that predicted breast cancer -- except it didn't. The only thing it had basically learned was ... that people in a [breast cancer] treatment center are more likely to have cancer than people in a breast cancer screening center."

Perlich sees the zero-sum character of societal debate about data mining and data science as a distraction. "The criticism being brought forward against data mining and data science is, in principle, often correct, but at the same time the antagonism between the critics of data science and its actual practitioners is exaggerated and nonproductive," she points out. "We're being told from a privacy point of view that everything we do is evil. What we need to ... collaborate on are better options to do these things the right way."

Because of its power, predictive technology will be used. It's inevitable. The challenge is to promote ethical and responsible usage.

About the Author

Stephen Swoyer is a technology writer with 20 years of experience. His writing has focused on business intelligence, data warehousing, and analytics for almost 15 years. Swoyer has an abiding interest in tech, but he’s particularly intrigued by the thorny people and process problems technology vendors never, ever want to talk about. You can contact him at evets@alwaysbedisrupting.com.

