TDWI | Training & Research | Business Intelligence, Analytics, Big Data, Data Warehousing

TDWI Blog

TDWI Blog: Data 360

Big Data: Something Borrowed, Something Blue

When you’re 100 years old, as IBM is this year, it would be easy to think that you’ve seen it all. What could possibly be new to Big Blue about “big data”? In the view of Robert LeBlanc, SVP of Middleware Software for the IBM Software Group, quite a bit.

The new problem set, defined by business opportunities opening up due to the availability of new sources of information, cannot be solved with traditional data systems alone. Kicking off the IBM Big Data Symposium for industry analysts at the Yorktown Research Center on May 11, LeBlanc itemized a number of challenges, including multi-channel customer sentiment and experience analysis, detection of life-threatening conditions at hospitals in time to intervene, Medicare fraud interdiction before payment, and weather pattern predictions to optimize wind turbine locations. (Note: The next TDWI Solution Summit, September 25-27 in San Diego, will feature case studies focused on the theme of “Deep Analytics for Big Data.”)

“Big data” is both an evolutionary and revolutionary phenomenon. Given that organizations have been working with large data warehouses and other types of files for some time, it should come as no surprise that the sheer quantity of data would continue to grow. Data is a renewable resource; the more applications and systems that use it, the more data they tend to generate. Data warehouses will continue to be important, but even as the terabytes of structured data pile up, organizations are hunting down unstructured sources to tap their value and discover new competitive advantages.

IBM’s view of what makes big data revolutionary comes down to the convergence of the three “V’s”: volume, velocity, and variety. Volume is the easiest to understand, although IBM speakers at the Symposium described scenarios where so much data was streaming through in real time that storing it was impossible. Huge data volumes, plus the velocity with which they flow in, are opening up opportunities for technology alternatives, including Hadoop, MapReduce, and event stream processing. Variety, the third “V,” adds in the unstructured and complex data sources growing up on the Web, particularly in social media. Some organizations, of course, do store all this data; Eric Baldeschwieler, VP of Hadoop Development at Yahoo!, described the company’s use of the Hadoop Distributed File System (HDFS) to store petabytes of data on nodes across its vast array of clusters. “Hadoop is behind everything we do,” he said.

It was not surprising news, but Baldeschwieler and IBM experts gave a full-throated defense of Apache Hadoop and the importance of having open source software at the foundation of big data programs. IBM did not mention EMC explicitly, but it was clear that the company was responding to EMC’s May 9 announcement of the new Greenplum HD Data Computing Appliance, which offers its own distribution of Apache Hadoop. IBM execs warned of the dangers of “forking,” which is what happened when vendors created their own versions of the UNIX operating system and users had to deal with competing standards. Baldeschwieler and IBM execs did acknowledge, however, that Apache Hadoop is far from a finished product, and in any case is not the solution to all problems.

I came away from the Symposium excited by the future of big data analytics but also aware that there’s a long way to go. “Big data” is not about a single technology, such as Hadoop or MapReduce (for more on Hadoop, see my colleague Philip Russom’s interview with the CEO of Cloudera). These technologies are more a complement to data warehousing than a replacement for it. Yahoo!’s Baldeschwieler made the point that Yahoo! also has data warehouses. As each industry’s requirements become clearer, vendors such as IBM will assemble packages that bring together the strengths of their existing solutions with new technologies. Then, organizations will have a better understanding of how to compare the vendors’ offerings. We’re not quite there yet.

Posted by David Stodder on May 17, 2011
