Two Tips for Citizen Data Scientists
Have analytics tools made it too easy for business users to think they can perform the role of a data scientist without the rigorous training needed to do the job right?
By Fern Halper
June 13, 2016
I've been thinking a lot about the notion of the citizen data scientist -- the business user or analyst who may not have formal training in statistics or math but who performs advanced analytics anyway using some of the easy-to-use, data-science-in-a-box software that is currently being marketed by analytics vendors. In a recent TDWI poll of 125 attendees at one of our educational events, 22 percent of respondents said business analysts were already doing this in their organization; 29 percent said they would be in the near future.
Is a citizen data scientist a good thing or a bad thing? Terminology aside (I'm not a huge fan of the term "citizen data scientist"), I have to admit that I am conflicted. The data scientist in me worries that these people will make potentially costly mistakes because they are not trained -- in the techniques, in interpreting output, and in thinking critically about data. There is a gap between a data scientist and a business user or spreadsheet power user, and it will not magically disappear because of a piece of software.
Of course, the hype is always ahead of the market. Some argue that self-service BI is paving the way to the citizen data scientist and that this evolution is inevitable. However, our research shows that self-service BI has not yet widely penetrated organizations. On average, where self-service data discovery exists, it has penetrated no more than 20 percent of an organization -- at least in our audience base.
Likewise, our surveys tell us that more than half of self-service users need help from IT. We have not yet achieved the vision of democratized BI, much less the vision of more sophisticated self-service analytics. TDWI research shows that most of the organizations we surveyed that have analytics programs are only now starting to move past spreadsheets and dashboards to more advanced techniques such as predictive analytics.
What does this mean? Should business users and analysts be using these tools? First things first: I've been involved with advanced analytics for a long time. I think it is important, and my research has illustrated the value it provides. Should more organizations be doing analytics, including advanced analytics, and doing it well? Absolutely.
Here are two pieces of advice I can offer about tools for the citizen data scientist:
Tip #1: Use the tools, but get trained on them
Years ago, when I was working in a Center of Excellence at Bell Labs analyzing data for AT&T, I was fortunate to be able to use some of the newest machine-learning and neural-network algorithms coming out of Bell Labs research. However, even though I had a Ph.D. in a quantitative field that required statistical expertise, I didn't think it would be responsible to use these tools without understanding them. I made sure to read papers on the methodologies and to attend training classes that explained these techniques (most were three to four days long). In this way, I was an informed consumer.
That was back when the tools weren't easy to use, but the same principle applies today. If business analysts want to use these tools, however easy they are to use, they should understand them -- at least at some level. Get some training. Data science boot camps are available; TDWI offers one to help people get started.
Business users, who may be less analytically inclined, need training, too. Get trained in thinking about data as well as in more sophisticated tools and techniques. Start with self-service BI before moving on to more sophisticated analytics. Become informed consumers of this kind of data and analytics, too!
Tip #2: Make sure there are controls in place if these models go into production
It is one thing for a "citizen" data scientist to explore data using predictive and other advanced models that software builds for them. It is another thing entirely to put these models into production or use them to make important decisions. The models need to be vetted. I've talked to many organizations that will let business analysts build a model but won't let them put it into production until it has gone through some sort of control process. The control might be as simple as having a data scientist review the model and sign off on it. That is a good start.
A Job Well Done
Look, just because something is easy to do doesn't mean it is easy to do well. I would argue that it is important to do data science well because the rewards are worth the effort.
About the Author
Fern Halper, Ph.D., is well known in the analytics community, having published hundreds of articles, research reports, speeches, webinars, and more on data mining and information technology over the past 20 years. Halper is also co-author of several “Dummies” books on cloud computing, hybrid cloud, and big data. She is VP and senior research director for advanced analytics at TDWI Research, focusing on predictive analytics, social media analysis, text analytics, cloud computing, and “big data” analytics approaches. She has been a partner at industry analyst firm Hurwitz & Associates and a lead analyst for Bell Labs. Her Ph.D. is from Texas A&M University. You can reach her at fhalper@tdwi.org, on Twitter @fhalper, and on LinkedIn at linkedin.com/in/fbhalper.