Analytics and Reporting Are Two Different Practices
Treat them differently, if you want to get the most out of each.
By Philip Russom, TDWI Research Director for Data Management
I regularly get somewhat off-base questions from users who are in the thick of implementing or growing their analytic programs, and therefore get a bit carried away. Here’s a question I’ve heard a lot recently: “Our analytic applications generate so many insights that I should decommission my enterprise reporting platform, right?” And here’s a related question: “Should we implement Hadoop to replace our data warehouse and/or reporting platform?”
The common misconception I perceive behind these questions (which makes them “off-base” in my perception) is that people seem to be forgetting that analytics and reporting are two different practices. Analytics and reporting serve different user constituencies, produce different deliverables, prepare data differently, and support organizational goals differently. Despite a fair amount of overlap, I see analytics and reporting as complementary, which means you most likely need both and neither will replace the other. Furthermore, due to their differences, each has unique tool and data platform requirements that you need to satisfy, if you’re to get the most out of each.
Allow me to net it out with a few sweeping generalizations.
Reporting is mostly about entities and facts you know well, represented by highly polished data that you know well. And that data usually takes the form of carefully modeled and cleansed data with rich metadata and master data that’s managed in a data warehouse. In fact, it’s difficult to separate reporting and data warehouses, because most users designed their DWs first and foremost as a repository for reporting and similar practices such as OLAP, performance management, dashboards, and operational BI.
I regularly hear claims that Hadoop can replace a true DW. But I doubt this, because the current state of Hadoop cannot satisfy the data requirements of enterprise reporting near as well as the average DW can. Ultimately, it’s not about the warehouse per se; it’s about practices a DW supports well, such as reporting. I reserve the right to change my mind in the future, because Hadoop gets more sophisticated almost daily. My real point: most enterprise reporting depends on a DW for success, so keep and protect the DW.
Advanced analytics enables the discovery of new facts you didn’t know, based on the exploration and analysis of data that’s probably new to you. New data sources generally tell you new things, which is one reason organizations are analyzing big data more than ever before. Unlike the pristine data that reports operate on, advanced analytics works best with detailed source data in its original (even messy) form, using discovery oriented technologies, such as mining, statistics, predictive algorithms, and natural language processing. Sure, DWs can be expanded to support some forms of big data and advanced analytics. But the extreme volumes and diversity of big data are driving more and more users to locate big data on a platform besides a DW, such as Hadoop, DW appliances, or columnar databases.
I personally think that providing separate data platforms for reporting and analytics is a win-win data strategy. It frees up capacity on the DW, so it can continue growing and supporting enterprise reporting plus related practices. And it gives advanced analytics a data platform that’s more conducive to exploration and discovery than the average DW is.
Reporting is like a “high-volume business,” whereas analytics is like a “high-value business.” For example, with so-called enterprise business intelligence, thousands of concurrent report consumers access tens of thousands of reports that are refreshed nightly. By comparison, a small team of data analysts can transform an organization with a few high-value insights, such as new customer segments, visibility into costs, correlations between supplies and product quality, fraud detection, risk calculations, and so on. For completely different reasons, you need both reporting and analytics to serve the full range of user constituencies and provide many different levels of information and insight.
Most reports demand numeric precision, whereas most analyses don’t. Think financial reports (accurate to the penny) versus website page view reports (where guesstimates are fine).
Most enterprise reports require an audit trail, whereas few analyses do. Think regulatory reports versus the scores of an analytic model for customer churn.
Data management techniques differ. Squeaky clean report data demands elaborate data processing (for ETL, quality, metadata, master data, and so on), whereas preparing raw source data for analytics is simpler, though at higher levels of scale.
CONCLUSIONS: Despite some overlap, enterprise reporting and advanced analytics are so different as to be complementary. Hence, neither will replace the other. Both do important things for an information-driven organization, so you must give each what it needs for success, both at the tool level and at the data management level. Taking seriously the data requirements of big data analytics may lead you to implement Hadoop; but that doesn’t mean that Hadoop will replace a DW, which is still required to satisfy the data requirements of reporting and related practices, such as OLAP, performance management, dashboards, and operational BI.
Posted by Philip Russom, Ph.D. on September 26, 2013