Blog by Philip Russom
Research Director for Data Management, TDWI
A few weeks ago, I talked with Mike Eacrett, the vice president of product management for SAP HANA at SAP Labs. Among other things, Mike explained the “secret sauce” that gives SAP HANA flexibility and performance for big data analytics. Give me a moment to recount Mike’s explanation.
Philip Russom: What forms of analytics are you seeing on the rise with SAP customers?
Posted by Philip Russom, Ph.D. on June 27, 2011
What exactly is Big Data Analytics?
It’s two things: big data and the kind of analytics users want to do with big data. Let’s start with big data, then come back to analytics.
Users interviewed by TDWI state that data isn’t big until it breaks 10 TB. So that’s the low end of big data. And some user organizations have cached away hundreds of terabytes, just for analytics. The size of big data is relative; hundreds of TBs isn’t new, but hundreds of TBs just for analytics is, at least for most user organizations.
Posted by Philip Russom, Ph.D. on June 21, 2011
In prior blogs, I’ve talked about how big data’s primary attribute is data volume. That’s pretty obvious. But it’s defined by other characteristics, too. For example, one of the things that makes big data so big is that it’s coming from a greater variety of sources than ever before. Now let’s look at the last of the three Vs of Big Data Analytics, namely data velocity.
Data Feed Velocity as a defining attribute of Big Data
Posted by Philip Russom, Ph.D. on June 17, 2011
This blog is number 2 in a series of 3, about the three Vs of big data analytics, namely data volume, variety, and velocity. You can read the first blog here online.
Data Type Variety as a defining attribute of Big Data
One of the things that makes big data big is that it’s coming from a greater variety of sources than ever before. Many of the newer ones are Web sources (logs, click streams, and social media). Sure, user organizations have been collecting Web data for years. But, for most organizations, it’s been a kind of hoarding. We’ve seen similar untapped big data collected and hoarded, such as RFID data from supply chain apps, text data from call center apps, semi-structured data from various insurance processes, and geospatial data in logistics. What’s changed is that far more users are now analyzing big data instead of merely hoarding it. And the few organizations that have been analyzing it now do so at a more complex and sophisticated level. A related point is that big data isn’t new; the effective leverage of it for analytics is. (For more on that point, see my blog: The Intersection of Big Data and Advanced Analytics.)
But my real point for this blog is that the recent tapping of these sources means that so-called structured data (which previously held unchallenged hegemony in analytics) is now joined (both figuratively and literally) by unstructured data (text and human language) and semi-structured data (XML, RSS feeds). There’s also data that’s hard to categorize, as it comes from audio, video, and other devices. Plus, multidimensional data can be drawn from a data warehouse to add historic context to big data. I hope you realize that’s a far more eclectic mix of data types than analytics has ever seen (or any discipline within BI, for that matter). So, with big data, variety is just as big as volume. Plus, variety and volume tend to fuel each other.
To further support the point that big data is about variety, let’s look at Hadoop. I managed to find a couple of users who’ve used Hadoop as an analytic database. Both said the same thing: Hadoop’s scalability for big data volumes is impressive. But the real reason they’re working with Hadoop is its ability to manage a very broad range of data types in its file system, plus process analytic queries via MapReduce across numerous eccentric data types.
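To make that concrete, here is a minimal sketch, in the style of a Hadoop Streaming job written in Python, of how a map/reduce pass might tally records across mixed data types. This is illustrative code of my own, not anything from the users I interviewed; the `classify_record` heuristic and the sample records are assumptions for the sake of the example.

```python
import json
from collections import defaultdict

def classify_record(line):
    """Crudely bucket a raw record as JSON, XML, or plain text."""
    s = line.strip()
    if s.startswith("{"):
        try:
            json.loads(s)
            return "json"
        except ValueError:
            pass
    if s.startswith("<"):
        return "xml"
    return "text"

def mapper(lines):
    """Map phase: emit a (data_type, 1) pair for each raw record."""
    for line in lines:
        yield classify_record(line), 1

def reducer(pairs):
    """Reduce phase: sum the counts per data type."""
    counts = defaultdict(int)
    for key, value in pairs:
        counts[key] += value
    return dict(counts)

if __name__ == "__main__":
    # Hypothetical mixed-type records: click stream, RSS/XML, call-center text.
    sample = [
        '{"user": "a1", "clicks": 3}',
        "<item><title>RSS entry</title></item>",
        "Customer called about billing",
    ]
    print(reducer(mapper(sample)))
```

In a real Hadoop Streaming deployment, the mapper and reducer would read from stdin and write tab-separated pairs to stdout, with Hadoop handling the shuffle between them; the point here is simply that the same pass can span eccentric data types without forcing them into one schema first.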
Stay tuned for the third and final blog in this series, which will be titled: The Three Vs of Big Data Analytics: VELOCITY.
=============================================
NOTE -- Don’t miss TDWI’s Big Data Analytics Survey. Please share your opinions and experiences by taking this online survey.
Posted by Philip Russom, Ph.D. on June 14, 2011
I was recently on a group call, along with several other analysts, where IBMers spelled out their definition of big data. They structured the definition by explaining big data’s primary attributes, namely data volume, data type variety, and the velocity of streams and other real-time data. I don’t necessarily agree with everything the IBMers said, but I must say that the three Vs of big data – volume, variety, and velocity – constitute a more comprehensive definition than I’ve heard elsewhere. In particular, the three Vs bust the myth that big data is only about data volume. Plus, the term “three Vs” is a catchy mnemonic. So I freely admit that I am shamelessly stealing the concept of the three Vs as a structure for my own definition of big data.
Note that IBMers didn’t consistently link big data with advanced analytics – but I will. This blog focuses on data volume, whereas other upcoming blogs will hit data type variety and data stream velocity.
Posted by Philip Russom, Ph.D. on June 9, 2011
I recently chatted with Paul Groom, the VP of Business Intelligence at Kognitio. Among other things, Paul had some great tips for moving beyond common barriers to analytics with big data. I’d like to share some of those tips with you.
Philip Russom: I’ve encountered several user companies that are hoarding big data – especially log data from Web sites – but they don’t know how to get started with analyzing it. Are you seeing this, too?
Posted by Philip Russom, Ph.D. on May 31, 2011