TDWI Blog

Philip Russom, Ph.D., is senior director of TDWI Research for data management and is a well-known figure in data warehousing, integration, and quality, having published over 550 research reports, magazine articles, opinion columns, and speeches over a 20-year period. Before joining TDWI in 2005, Russom was an industry analyst covering data management at Forrester Research and Giga Information Group. He also ran his own business as an independent industry analyst and consultant, was a contributing editor with leading IT magazines, and was a product manager at database vendors. His Ph.D. is from Yale. You can reach him by email (prussom@tdwi.org), on Twitter (twitter.com/prussom), and on LinkedIn (linkedin.com/in/philiprussom).


Big Data Analytics: An Overview in 20 Tweets

By Philip Russom, TDWI

To raise awareness of the new tool features, user techniques, and team structures of Big Data Analytics, I recently issued a series of twenty tweets via Twitter over a two-week period. The tweets also helped promote a TDWI Webinar on Big Data Analytics. Most of these tweets triggered replies or retweets. So I seem to have reached the business intelligence (BI) and data warehouse (DW) audience I was looking for – or at least touched a nerve!

To help you better understand Big Data Analytics and why you should care about it, I’d like to share some of the thoughts from these tweets with you. I think you’ll find them interesting because they provide an overview of Big Data Analytics in a form that’s compact, yet amazingly comprehensive.

Every tweet I wrote was a short sound bite or stat bite drawn from TDWI’s recent report on Big Data Analytics, which I researched and wrote. Many of the tweets cite a statistic from the report, while others restate definitions from it.

I left in the arcane acronyms, abbreviations, and incomplete sentences typical of tweets, because I think that all of you already know them or can figure them out. Even so, I deleted a few tiny URLs and repetitive phrases. I issued the tweets in groups, on related topics; so I’ve added some headings to this blog to show that organization. Otherwise, these are raw tweets.

Defining Big Data, Advanced Analytics, and Big Data Analytics
1. #BigData #Analytics = where advanced analytics operate on big data sets. So, it’s about 2 things. Learn more in Webinar http://bit.ly/qp4wp6
2. Advanced #Analytics = data mining, statistics, extreme SQL, data viz, artificial intell, language processing.
3. Advanced #Analytics = database techs like MapReduce, in-database & in-memory analytics, column stores.
4. Advanced #Analytics = discovering unknown biz facts. Instead of advanced, should call it discovery analytics
5. #BigData = not just multi-terabyte datasets. Also about diverse data types & real-time or streaming data.
6. Bleeding edge of #BigData = data streaming from sensors, robotics, monitor devices, Web logs.

Benefits and Barriers for Big Data Analytics
7. #TDWI SURVEY SEZ: #BigData #Analytics benefits customer relations, BI, most pre-existing analytic apps.
8. #TDWI SURVEY SEZ: Bad skills, sponsors, & database software are leading barriers to #BigData #Analytics.

Organizational Issues and Big Data Analytics
9. #TDWI SURVEY SEZ: 30% consider #BigData a data mgt problem. 70% think it a biz opp when analyzed. Attend #TDWI Webinar http://bit.ly/qp4wp6
10. #TDWI SURVEY SEZ: #BigData #Analytics is owned by BI/DW team (41%), dep’ts (21%), IT/CIO (12%).
11. #TDWI SURVEY SEZ: Business analyst is most common job title for designer of #BigData #Analytics.

The State of Big Data Analytics
12. #TDWI SURVEY SEZ: 74% of orgs have some form of analytics today. But only 34% do #BigData #Analytics.
13. #TDWI SURVEY SEZ: 37% of orgs have 10TB+ of #BigData just for #Analytics. More on #TDWI Webinar http://bit.ly/qp4wp6
14. #TDWI SURVEY SEZ: 20% of orgs expect to have 500TB+ of #BigData just for #Analytics by 2013.
15. #TDWI SURVEY SEZ: 64% of orgs today manage #BigData for #Analytics in EDW, 38% outside EDW.
16. #TDWI SURVEY SEZ: 24% claim to have Hadoop today. #TDWI suspects most are experimental downloads. But still impressive
17. #TDWI SURVEY SEZ: #BigData is struc 92%, semi-struc 54%, hier 54%, events 45%, unstruc 35%, social 34%, Web 31%...

Future Trends in Big Data Analytics
18. #TDWI SURVEY SEZ: 33% will replace #Analytics platform within 3 yrs. Another 11% after that. 9% already replaced.
19. #TDWI SURVEY SEZ: Why replace #Analytics platform? Poor scale, loading, query speed, real time, SOA, self service, viz.
20. #TDWI SURVEY SEZ: #BigData #Analytics techs set to grow most: advanced analytics, data viz, in-memory DBs, unstruc data

FOR FURTHER STUDY:
Don’t miss my next TDWI Webinar on Hadoop. I’ll lead a panel of vendor representatives in a discussion of Hadoop and its value for BI, DW, and analytics. Register online, so you can join us December 14, 2011 at noon ET.

For a more detailed discussion of Big Data Analytics – in a traditional publication! – see the TDWI Best Practices Report, titled Big Data Analytics, which is available in a PDF file via a free download.

You can also register for and replay my TDWI Webinar, where I present the findings of the Big Data Analytics report.

Philip Russom is the research director for data management at TDWI. You can reach him at prussom@tdwi.org or follow him as @prussom on Twitter.

Posted by Philip Russom, Ph.D. on December 7, 2011


Master Data Management: Rules for the Next Generation

Blog by Philip Russom
Research Director for Data Management, TDWI

I’m currently researching a TDWI Best Practices Report that will redefine master data management (MDM) by describing what its next generation should look like. As part of the research, I’ve been interviewing users on the phone about their MDM programs.

The news so far is a mix of good and bad. I hate saying it, but half of the organizations I’ve talked with are mired in early lifecycle stages of their MDM programs, unable to get over certain humps and mature into the next generation. On the flip side, the other half is well into the next generation; so I know it can be done.

Allow me to list the desirable capabilities of MDM’s next generation and briefly say why these need to replace similar early-phase capabilities. The following list (with a great deal more detail) will probably appear in my Next Generation MDM report that TDWI will publish April 2, 2012. After all, the list defines MDM’s next generation. And my goal is to establish a set of rules (or requirements) that can guide users into the next generation.

Multi-domain MDM. Many MDM solutions address only the customer data domain, and they need to move on to other domains, like products, financials, and locations. Single-data-domain MDM is a barrier to having common, consensus-based entity definitions and standard reference data that would allow you to correlate information across multiple domains. (See my blog The State of Multi-Data-Domain MDM.)

Multi-department, multi-application MDM. MDM for a single application (typically ERP, CRM, or BI) is a safe and effective start. But the point of MDM is to share common definitions across multiple, diverse applications and the departments that depend on them. It’s important to overcome organizational boundaries if MDM is to move from being a local fix to an enterprise infrastructure.

Bidirectional MDM. "Roach Motel MDM," as I call it, is when you extract reference data and store it in a database from which it never emerges (as with many BI/DW systems). One-way MDM is bad whenever you need to improve reference data in a central place, then publish it out to a wide variety of operational applications. (See my article Roach Motel MDM.)
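
To make the roach-motel contrast concrete, here’s a minimal Python sketch of bidirectional MDM, assuming a hypothetical in-memory hub. The names MasterDataHub, ingest, and improve_and_publish are illustrative only, not any vendor’s API. Reference data flows in from source systems, gets improved centrally, then is pushed back out to every subscribing operational application.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# A subscriber stands in for an operational application (a hypothetical
# CRM or ERP adapter) that receives the improved master record.
Subscriber = Callable[[str, dict], None]


@dataclass
class MasterDataHub:
    records: Dict[str, dict] = field(default_factory=dict)
    subscribers: List[Subscriber] = field(default_factory=list)

    def subscribe(self, callback: Subscriber) -> None:
        self.subscribers.append(callback)

    def ingest(self, key: str, record: dict) -> None:
        """Inbound direction: collect reference data from a source system."""
        self.records.setdefault(key, {}).update(record)

    def improve_and_publish(self, key: str, changes: dict) -> None:
        """Outbound direction: fix the central record, then push a copy to every subscriber."""
        self.records[key].update(changes)
        for callback in self.subscribers:
            callback(key, dict(self.records[key]))


# Usage: two operational apps keep local copies that the hub keeps in sync.
crm_copy: Dict[str, dict] = {}
erp_copy: Dict[str, dict] = {}
hub = MasterDataHub()
hub.subscribe(lambda key, rec: crm_copy.__setitem__(key, rec))
hub.subscribe(lambda key, rec: erp_copy.__setitem__(key, rec))
hub.ingest("C001", {"name": "acme corp", "country": "US"})
hub.improve_and_publish("C001", {"name": "Acme Corporation"})
print(crm_copy["C001"])  # {'name': 'Acme Corporation', 'country': 'US'}
```

A roach-motel design would stop after ingest; the publish step is what makes the sketch bidirectional. A production hub would more likely publish asynchronously (say, through a message queue) than through in-process callbacks.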

Real-time MDM. The strongest trend in data management today (and BI/DW, too) is toward real-time operation as a complement to batch. Real-time is critical to identity resolution and the immediate application of recent changes to reference data.

Consolidating multiple, competing MDM solutions. How can you have a single view of the customer, if you have multiple customer-domain MDM solutions? How can you correlate reference data across domains, if the domains are treated in separate MDM solutions? For many organizations, next-gen MDM begins with a consolidation of multiple MDM solutions.

Beyond enterprise data. Despite the obsession with customer data that most MDM solutions suffer, almost none of them today incorporate data about customers from Web sites or social media. If you’re truly serious about MDM as an enabler for CRM, next-gen MDM (and CRM, too) must reach into every customer channel.

Richer modeling. Reference data in the customer domain works fine with flat modeling, involving a simple (but very wide) record per customer. However, other domains make little sense without a richer, hierarchical model, as with a chart of accounts in finance or a bill of materials in manufacturing. Metrics and KPIs, so common in BI today, rarely have proper master data in multidimensional models. (See my article MDM for Performance Management.)
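
Here’s a minimal Python sketch of why hierarchy matters, assuming a small, hypothetical chart of accounts stored as parent-child pairs. Balances posted to leaf accounts roll up to every ancestor, which a flat, one-record-per-entity model can’t express on its own.

```python
from collections import defaultdict
from typing import Dict, Optional

# Hypothetical chart of accounts: account -> parent account (None marks the root).
PARENT: Dict[str, Optional[str]] = {
    "Assets": None,
    "Current Assets": "Assets",
    "Cash": "Current Assets",
    "Receivables": "Current Assets",
    "Fixed Assets": "Assets",
    "Equipment": "Fixed Assets",
}


def roll_up(leaf_balances: Dict[str, float],
            parent: Dict[str, Optional[str]]) -> Dict[str, float]:
    """Add each leaf balance to the leaf itself and to every ancestor above it.

    Assumes every account appears in `parent` and the hierarchy has no cycles.
    """
    totals: Dict[str, float] = defaultdict(float)
    for account, amount in leaf_balances.items():
        node: Optional[str] = account
        while node is not None:
            totals[node] += amount
            node = parent[node]
    return dict(totals)


totals = roll_up({"Cash": 100.0, "Receivables": 50.0, "Equipment": 200.0}, PARENT)
print(totals["Current Assets"], totals["Assets"])  # 150.0 350.0
```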

Coordination with other disciplines. To achieve next-gen goals, many organizations need to stop practicing MDM in a vacuum. Instead of being merely a technical fix, MDM also needs to be aligned with business goals for data. And MDM should be coordinated with related data management disciplines, especially data integration and data quality. A solid data governance program can be an effective medium for such coordination. (See my blog MDM Can Learn from Data Quality.)

MDM workflow. Development and collaborative efforts in MDM today are mostly ad hoc actions with little or no process. For an MDM program to scale up and grow, it needs workflow functionality that automates the proposal, review, and approval process for newly created or improved reference and master data. Also, a few MDM programs need the kind of workflow enabled by tools for business process management. Vendor tools and dedicated applications for MDM are starting to support such workflows.
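
As a minimal sketch, the proposal, review, and approval process can be modeled as a small state machine in Python. The states, the ChangeRequest class, and the rule that authors can’t approve their own changes are all assumptions for illustration; MDM and BPM tools model this in their own ways.

```python
from enum import Enum
from typing import Dict, List, Set, Tuple


class State(Enum):
    PROPOSED = "proposed"
    IN_REVIEW = "in_review"
    APPROVED = "approved"
    REJECTED = "rejected"


# Allowed moves; approved and rejected are terminal states.
TRANSITIONS: Dict[State, Set[State]] = {
    State.PROPOSED: {State.IN_REVIEW},
    State.IN_REVIEW: {State.APPROVED, State.REJECTED},
    State.APPROVED: set(),
    State.REJECTED: set(),
}


class ChangeRequest:
    """A proposed change to a master or reference data record (hypothetical model)."""

    def __init__(self, record_key: str, proposed_values: dict, author: str) -> None:
        self.record_key = record_key
        self.proposed_values = proposed_values
        self.author = author
        self.state = State.PROPOSED
        self.history: List[Tuple[State, str]] = [(State.PROPOSED, author)]

    def move_to(self, new_state: State, actor: str) -> None:
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"cannot move from {self.state.value} to {new_state.value}")
        # Assumed separation-of-duties rule: authors can't decide on their own proposals.
        if new_state in (State.APPROVED, State.REJECTED) and actor == self.author:
            raise ValueError("authors cannot approve or reject their own change")
        self.state = new_state
        self.history.append((new_state, actor))


request = ChangeRequest("PROD-42", {"unit_of_measure": "EA"}, author="steward_a")
request.move_to(State.IN_REVIEW, "steward_a")
request.move_to(State.APPROVED, "data_owner_b")
print(request.state, [actor for _, actor in request.history])
# State.APPROVED ['steward_a', 'steward_a', 'data_owner_b']
```

Keeping the transition table as data, rather than burying it in if-statements, makes the approval policy itself something a governance board could review.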

So, what do you think? Do you know of other generational changes that MDM is facing? Let me know.

================================

ANNOUNCEMENTS
Please take the TDWI MDM Survey for my upcoming report about Next-Generation MDM.

David Loshin and I will moderate the TDWI Solution Summit on Master Data, Quality, and Governance, coming up March 4-6, 2012 in Savannah, Georgia. You should attend!

Posted by Philip Russom, Ph.D. on November 17, 2011


Big Data Analytics: The News from Informatica

Blog by Philip Russom
Research Director for Data Management, TDWI

Early this morning, Informatica Corporation announced Informatica HParser, a new product for parsing data in Apache Hadoop environments. Instead of repeating the details of the announcement – which you can read on www.informatica.com, etc. – I’d rather use the announcement as a springboard for my own thoughts about the bigger trends and issues in Big Data Analytics and Hadoop that the announcement fits into. The catch is that there are so many myths and misconceptions (i.e., “mythconceptions”) about Hadoop right now, that I can’t bust them all in a short piece like this blog. So I’ll just present the two leading mythconceptions as background, plus a brief rant for color.

First Mythconception. Hadoop is not one monolithic thing, so we need to stop talking about it that way. It’s actually an open source software library administered by the Apache Software Foundation. (Some Hadoop products are also available via vendor distributions, but that’s another story.) The Apache Hadoop library includes several products and technologies, including (in BI priority order) the Hadoop Distributed File System (HDFS), MapReduce, Hive, HBase, Pig, ZooKeeper, Flume, Sqoop, Oozie, Hue, and so on. It’s up to you to figure out which combination of Apache Hadoop products to implement for a given application. For applications in business intelligence (BI) and Big Data Analytics, HDFS and MapReduce (perhaps with HBase and Hive) constitute a useful technology stack.

Second Mythconception. Theoretically, HDFS can manage the storage and access of any data type, as long as you can put the data in a file and copy that file into HDFS. As outrageously simplistic as that sounds, it’s largely true, and it’s exactly what brings many users to Apache HDFS in the first place. Yet, HDFS’s admirable tolerance for diverse data doesn’t mean that an Apache Hadoop environment operates equally well with all file and data types. According to users I’ve interviewed, if you expect to get speed, scalability, and development simplicity, you need to work with Hadoop’s preference for record-based data. That’s not as limiting as it sounds, because many types of Big Data handled by HDFS are inherently record-based, as in logs from Web servers and sensors or table dumps of call detail records, customer records, transactions, etc. Furthermore, many sources of traditional enterprise data can be converted to records and copied to HDFS for Big Data Analytics and other applications.

And that brings us to Informatica Corporation’s announcement today of the new Informatica HParser. In a Hadoop environment, it’s MapReduce that actually executes the programmatic logic of an application. In the context of Big Data Analytics, the logic is (today) usually hand-coded data transformations or analytic logic. HParser provides an integrated development environment (IDE) for creating data transformation logic, plus ties into MapReduce to ensure that the logic executes in a fully distributed and parallel fashion. Given Apache Hadoop’s preference for record-based data, use cases cited by Informatica focus on how HParser can convert unstructured data into records and tables, plus flatten overly structured or “complex” data (as in the hierarchies of XML and JSON) into records that are more palatable to HDFS and Apache MapReduce. Record structures aside, Informatica HParser also supports a long list of data standards and document types. And Informatica PowerExchange for Hadoop provides additional functionality.
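
HParser’s own IDE and interfaces aren’t shown here. As a generic illustration of the flattening idea only, here’s a Python sketch that turns one nested JSON document (a hypothetical order with a customer object and an array of line items) into flat records, one per line item, with the parent fields repeated on each record.

```python
import json
from typing import Any, Dict, List


def flatten(obj: Any, prefix: str = "") -> Dict[str, Any]:
    """Flatten nested objects into one level, joining keys with dots.

    Lists are left as-is here; to_records() handles the one array we unnest.
    """
    if not isinstance(obj, dict):
        return {prefix: obj}
    flat: Dict[str, Any] = {}
    for key, value in obj.items():
        name = f"{prefix}.{key}" if prefix else key
        flat.update(flatten(value, name))
    return flat


def to_records(doc: Dict[str, Any], array_key: str) -> List[Dict[str, Any]]:
    """Emit one flat record per element of doc[array_key], repeating the parent fields."""
    parent_fields = flatten({k: v for k, v in doc.items() if k != array_key})
    return [{**parent_fields, **flatten(item, array_key)} for item in doc[array_key]]


# Hypothetical nested order document, as it might arrive from a JSON feed.
order = json.loads(
    '{"order_id": 7, "customer": {"id": "C001", "region": "EU"},'
    ' "lines": [{"sku": "A1", "qty": 2}, {"sku": "B9", "qty": 1}]}'
)
for record in to_records(order, "lines"):
    print(record)
# {'order_id': 7, 'customer.id': 'C001', 'customer.region': 'EU', 'lines.sku': 'A1', 'lines.qty': 2}
# {'order_id': 7, 'customer.id': 'C001', 'customer.region': 'EU', 'lines.sku': 'B9', 'lines.qty': 1}
```

Each output record could then be written as one delimited line in a file, which is the record-based shape that the blog says HDFS and MapReduce handle best.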

A brief rant. If you’ve been reading my writings on data integration for the last ten years, you know that I consider hand-coded data integration to be non-productive. Hand coding is time-consuming, not very re-usable, hard to update, and inherently feature-poor compared to vendor platforms. Now, we’re faced with Apache MapReduce, which – out of the box – demands huge amounts of hand coding, because it’s a processing engine that manages and provides parallelization for hand-coded routines (whether for analytics, DI, or otherwise). Informatica HParser shows promise for reducing the non-productive hand-coding that open-source environments like Hadoop, MapReduce, and Hive assume.
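
To show what "hand-coded routines" means in practice, here’s a single-machine Python simulation of the MapReduce pattern. The developer writes mapper and reducer by hand; the framework’s job, simulated here by run_job, is to group mapper output by key and call the reducer once per group, which Hadoop does in parallel across a cluster. The log format is a hypothetical one invented for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def mapper(line: str) -> Iterator[Tuple[str, int]]:
    """Hand-coded map step: emit (url, 1) for each successful request.

    Assumes a hypothetical log format: "<timestamp> <client_ip> <url> <status>".
    """
    parts = line.split()
    if len(parts) == 4 and parts[3] == "200":
        yield parts[2], 1


def reducer(key: str, values: Iterable[int]) -> Tuple[str, int]:
    """Hand-coded reduce step: total the counts for one URL."""
    return key, sum(values)


def run_job(lines: Iterable[str]) -> Dict[str, int]:
    """Stand-in for the framework: map every line, shuffle (group by key), then reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            groups[key].append(value)
    return dict(reducer(key, values) for key, values in sorted(groups.items()))


logs = [
    "2011-11-02T08:00 10.0.0.1 /home 200",
    "2011-11-02T08:01 10.0.0.2 /cart 200",
    "2011-11-02T08:02 10.0.0.1 /home 200",
    "2011-11-02T08:03 10.0.0.3 /home 404",
]
print(run_job(logs))  # {'/cart': 1, '/home': 2}
```

Even this trivial job needs custom parsing, filtering, and aggregation code, which is the kind of hand coding the rant argues vendor tools should reduce.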

Conclusion. I feel that the men and women who’ve contributed to open source Hadoop have made an impressive and innovative contribution. And the Apache Software Foundation does a great job enabling the open source community. Thanks to these contributions, Hadoop is successfully used in production, but mostly in large, Internet-based businesses, like Amazon, comScore, eBay, Google, and LinkedIn. However, for the Hadoop family – and the Big Data Analytics it enables – to become truly useful in a wide range of mainstream organizations across multiple industries, I think that the Hadoop family needs a number of new extensions, improvements, and options for interoperability.

This is why we’re now seeing software vendors coming out with various types of support for Apache Hadoop products and technologies. Informatica’s HParser and Informatica PowerExchange for Hadoop are prime examples, and other DI vendors will soon follow suit with similar interfaces and extensions for Hadoop. Some vendors are building administrative tools, which HDFS sorely lacks. And BI and analytic tool vendors are scrambling to sit atop HDFS and MapReduce. Personally, I hope to see more support for Hadoop, and soon, because without it, mainstream user organizations can’t get full value from Hadoop. Hence, they may not adopt it.

So, what do you think? Let me know!

===============================
Do you suffer mythconceptions about Hadoop? If so, TDWI can help you bust them:
• TDWI will soon publish my new Checklist Report on Hadoop, available as a free download on tdwi.org, starting Dec. 13, 2011.
• On Dec. 14, 2011, I’ll broadcast a TDWI Webinar based on that report. Please register online for the Hadoop Webinar.

Posted by Philip Russom, Ph.D. on November 2, 2011


Big Data Analytics: The News from Teradata

Blog by Philip Russom
Research Director for Data Management, TDWI

Just moments ago, Teradata Corporation issued three announcements describing new capabilities, products, and releases. Instead of repeating the details of Teradata’s new stuff -- which you can read on www.teradata.com, etc. -- I’d rather be self-indulgent and use each announcement as a springboard for my own thoughts about the bigger trends in Big Data Analytics to which they relate.

Announcement Number One: Teradata Columnar

A few years ago, I was at the Teradata Partners Conference. Instead of attending speaking sessions, I was in a series of meetings for industry analysts and industry influencers. When the topic of columnar databases came up -- and it was my turn to pontificate -- I said something like: “Columnar storage engines will soon be available as just another feature of database management systems from larger, more established vendors.” The room fell quiet, and a cricket chirped in the background. Then, two experts mocked me, while Teradata people were noticeably mum. ;)

Does that make me a prescient visionary? No, not at all. I’ve just been paying attention for the last three decades, as one technology after the next is developed and proved by a small startup, then bought or built by one or more of the leading DBMS vendors. We’ve seen this trend play out with features for everything from security to parallel processing to OLAP to federation to in-memory databases. We’re now seeing the same trend with columnar data stores and other technologies for Big Data Analytics.

Newish vendors like ParAccel and Vertica -- and Sybase long before them -- have proved the usefulness and commercial potential of a columnar approach. Open source DBMSs MySQL and Infobright made similar contributions. In full compliance with the trend I’m describing, IBM and Oracle have released columnar storage engines they built, and now it’s Teradata’s turn. Teradata Columnar is a new capability of Teradata Database 14. What’s new here is that Teradata has integrated both columnar AND row-based tables, thereby making hybrid applications more feasible. All the above is goodness, regardless of vendor, because columnar data stores have compelling advantages for query speed, data compression, blah, blah, blah, and the usual miraculous benefits.
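
Here’s a toy Python sketch of two of the columnar advantages mentioned above, using a hypothetical five-row orders table. Storing each column contiguously lets a query read only the columns it needs, and sorted, low-cardinality columns compress well with run-length encoding. Real columnar engines use far more sophisticated storage and compression than this.

```python
from itertools import groupby
from typing import Any, Dict, List, Tuple

# Hypothetical orders table in row layout: one dict per row.
rows = [
    {"order_id": 1, "region": "EU", "amount": 120},
    {"order_id": 2, "region": "EU", "amount": 80},
    {"order_id": 3, "region": "EU", "amount": 45},
    {"order_id": 4, "region": "US", "amount": 200},
    {"order_id": 5, "region": "US", "amount": 35},
]


def to_columns(table: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Pivot row layout into column layout (assumes a non-empty table with uniform keys)."""
    return {name: [row[name] for row in table] for name in table[0]}


def run_length_encode(values: List[Any]) -> List[Tuple[Any, int]]:
    """Collapse runs of repeated adjacent values into (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]


columns = to_columns(rows)
# A query like SUM(amount) touches only the amount column, not whole rows.
print(sum(columns["amount"]))                # 480
# A sorted, low-cardinality column compresses well: 5 values become 2 runs.
print(run_length_encode(columns["region"]))  # [('EU', 3), ('US', 2)]
```

A hybrid design like the one described above would keep some tables (or workloads) in row form and others in column form, choosing per access pattern.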

This recurring trend raises the question: What’s the next innovation on the path to DBMS assimilation? It’s obvious to me that Hadoop and MapReduce are already well down that path. And that brings us to the next Teradata announcement.

Announcement Number Two: Teradata Aster MapReduce Platform

On the upside, MapReduce is the secret sauce that brings advanced analytic capability to a big data repository, whether it’s Hadoop’s file system or a relational database management system (RDBMS). On the downside, MapReduce from most sources is mired in hand-coding and devoid of SQL (to which we’re hand-cuffed in BI). Hence, MapReduce shows great promise for the world of BI, but only if it can evolve to suit the technical requirements of BI and DW professionals.

Evolving MapReduce is what the small vendor Aster Data Systems has always been about, and the evolution continues now that Teradata has acquired Aster. First, Aster showed that MapReduce could be effective with an RDBMS – at least, with its own nCluster database, now called Aster Database 5.0. Aster then showed that MapReduce and SQL can be reconciled, and it received a patent for its innovation in this realm.

Let’s shift gears and look at data warehouse appliances. Despite the term “data warehouse” in the name, these are really “big data analytics appliances.” I say this based on the fact that at least 90% of DW appliance owners use them for multi-terabyte analytics, not data warehousing. Aster is now showing that a MapReduce-based RDBMS can be suited to an appliance, as in the new Aster MapReduce Appliance based on Teradata hardware.

I’ll say more about the evolution of MapReduce in a TDWI Webinar on October 27. Please register online and attend.

Announcement Number Three: Teradata Database 14

Most of the new functionality of Teradata Database 14 seems focused on making the system even more manageable and better performing, especially in the context of multiple, diverse, concurrent data warehouse workloads.

The multiple workload problem is a thorny one. From the DW professional’s viewpoint, it’s not easy to optimize a data warehouse for several workloads, so most EDWs are optimized for a short list of workloads. Since the primary deliverables of the average DW are reports (whether standard or dashboards) and OLAP, most EDW designers consciously decide to optimize for these. But that makes it difficult to add new workloads to a centralized enterprise data warehouse, so new workloads are often distributed to marts, operational data stores, and data staging areas outside the warehouse proper. Examples of “new workloads” include those for real time, detailed source data, non-structured data, and discovery or exploratory analytics (not OLAP).

How DW professionals and vendors are responding to the challenge of multiple workloads constitutes a trend. That’s because the responses affect data warehouse architecture, logical modeling, optimization, performance, platform selection, tool selection, selection of analytic methods, management strategies for big data, and so on.

Note that the multiple workload challenge is both a user design issue and a vendor platform capability issue. Yet, I think the former can win out over the latter. A good design on a weak platform can succeed, though you’ll probably end up with a heavily distributed DW architecture. Conversely, a bad design on a strong platform can fail, especially if you expect the platform to be the design. Technology and design issues aside, I must also point out that the placement of a DW workload can be influenced by organizational issues, like sponsorship, funding, and compliance.

So, what do you think? Let me know!

===============================
Want to learn more about Big Data Analytics? Attend the TDWI Forum on Big Data Analytics for Business Insight. There's more information online.

Posted by Philip Russom, Ph.D. on September 22, 2011


Master Data Management Can Learn from Data Quality

Blog by Philip Russom
Research Director for Data Management, TDWI

For about a month now, I’ve been interviewing users on the phone, in search of speakers for upcoming TDWI events. I need speakers who can share their organization’s best practices and strategies for data management. As you can imagine, I’ve heard a lot of great tips in these interviews, many of them concerning master data management (MDM).

A tip I’ve heard from people in multiple organizations is that MDM solutions achieve a higher level of success when they adopt some of the techniques and best practices of data quality (DQ). Let me give you some examples of DQ practices applied to MDM.

DQ techniques. For years, I’ve watched data integration solutions incorporate functions that originated with data quality tools, especially data profiling and data monitoring. In a similar trend, I’m now seeing MDM solutions incorporating DQ functions for data standardization, deduplication, augmentation, identification, and verification. After all, master and reference data benefit from these functions, just as any data domain would.
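
As a minimal sketch of two of those DQ functions applied to master data, here’s Python that standardizes company names and phone numbers, then deduplicates customer records on the standardized values. The abbreviation table, the 10-digit phone assumption, and the first-record-wins survivorship rule are all hypothetical choices for illustration; commercial DQ tools use much richer matching.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical standardization table for common company-name abbreviations.
NAME_TOKENS = {"corp": "corporation", "inc": "incorporated", "co": "company"}


def standardize_name(name: str) -> str:
    """Lowercase, drop punctuation, and expand known abbreviations."""
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    return " ".join(NAME_TOKENS.get(token, token) for token in tokens)


def standardize_phone(phone: str) -> str:
    """Keep digits only; assumes 10-digit North American numbers."""
    return re.sub(r"\D", "", phone)[-10:]


def deduplicate(records: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Merge records whose standardized name and phone match.

    Survivorship rule (an assumption): the first record wins, and later
    duplicates only contribute fields the survivor is missing.
    """
    survivors: Dict[Tuple[str, str], Dict[str, str]] = {}
    for rec in records:
        key = (standardize_name(rec["name"]), standardize_phone(rec["phone"]))
        if key in survivors:
            for column, value in rec.items():
                survivors[key].setdefault(column, value)
        else:
            survivors[key] = dict(rec)
    return list(survivors.values())


customers = [
    {"name": "Acme Corp.", "phone": "(555) 010-2000"},
    {"name": "ACME Corporation", "phone": "555-010-2000", "email": "info@acme.example"},
    {"name": "Globex Inc", "phone": "555.010.3000"},
]
for rec in deduplicate(customers):
    print(rec)
# {'name': 'Acme Corp.', 'phone': '(555) 010-2000', 'email': 'info@acme.example'}
# {'name': 'Globex Inc', 'phone': '555.010.3000'}
```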

Data stewardship. DQ success usually depends on the processes of data stewardship. A data steward plays a key role in linking data quality work and standards to specific business goals and business applications. The average data steward can identify and prioritize DQ work that will yield a noticeable return for the business. I’m now seeing a similar stewardship approach to prioritizing MDM work.

Collaborative data management. Note that a steward’s priority list is accurate only when developed in conjunction with business managers who know the impact of data’s quality on the business. Likewise, data stewardship can be a process for IT-to-business alignment and collaboration in the context of MDM, not just DQ.

Data governance (DG). I’ve seen a number of organizations take a successful data stewardship program (originally designed to support DQ) and evolve it into a data governance program. You see, a good data stewardship program will establish a process for proposing and authorizing changes to data and applications for the sake of improving data’s quality. A DG board or committee needs a similar process for the data standards and data usage policies it has to create and enforce. In fact, the first policies produced by a DG program usually govern data via quality rules. And a typical “next step” that a DG program takes is to apply said process to data standards and usage policies for MDM.
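
As a sketch of what governing data via quality rules can look like once a DG board has agreed on them, here’s Python that expresses a few hypothetical rules declaratively and reports which ones a record violates. The rule names, fields, and checks are invented for illustration.

```python
import re
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass(frozen=True)
class QualityRule:
    name: str                     # identifier a DG board might approve and publish
    column: str                   # the field the rule governs
    check: Callable[[Any], bool]  # returns True when the value passes


RULES = [
    QualityRule("customer_id_present", "customer_id", lambda v: bool(v)),
    # Format check only: two uppercase letters, not membership in ISO 3166.
    QualityRule("country_code_format", "country",
                lambda v: isinstance(v, str) and re.fullmatch(r"[A-Z]{2}", v) is not None),
    QualityRule("email_has_at_sign", "email", lambda v: isinstance(v, str) and "@" in v),
]


def violations(record: Dict[str, Any], rules: List[QualityRule]) -> List[str]:
    """Return the names of the rules this record fails; missing fields are checked as None."""
    return [rule.name for rule in rules if not rule.check(record.get(rule.column))]


print(violations({"customer_id": "C001", "country": "usa", "email": "pat@example.com"}, RULES))
# ['country_code_format']
print(violations({"country": "US"}, RULES))
# ['customer_id_present', 'email_has_at_sign']
```

Because the rules are data rather than scattered code, the same list could be reviewed by stewards, versioned, and later extended to MDM standards, which mirrors the "next step" the paragraph describes.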

Change management. DQ and MDM share very similar goals, in that each strives to improve data, whether the data domain is master data, customer data, product data, financial data, etc. Achieving improvement almost always requires changes to data, applications, and how end users use applications. Therefore, a change management process is key to effecting improvements. DQ has long-standing change management processes via stewardship, plus new options for change management via data governance. MDM’s likelihood of effecting positive change is increased when it taps the data-oriented change management processes that evolved from DQ and stewardship.

Conclusion. Frankly, I’m not surprised that MDM solutions are absorbing DQ techniques and best practices. I’ve seen a similar absorption by DI solutions, going on for about ten years now. And I already mentioned how some data governance programs are essentially data stewardship programs, expanded into a data-standards-oriented form of data governance. So, it’s clear to me that a variety of data management disciplines can learn from DQ techniques and stewardship practices. And the discipline going through that cycle right now is MDM. You should follow this trend, if you’re not already.

So, what do you think, folks? Let me know. Thanks!

Posted by Philip Russom, Ph.D. on September 8, 2011


The State of Multi-Data-Domain Master Data Management (MDM)

Blog by Philip Russom
Research Director for Data Management, TDWI

Allow me a moment to parachute into the middle of an issue that’s come up a lot this calendar year, namely multi-data-domain master data management (MDM). I assume you are familiar with MDM; if not, spend a few minutes on Wikipedia.

The issue is that most user organizations deploy single-domain MDM solutions. The most popular data domain is customer data, but other common domains for MDM are (in priority order) financials, products, partners, employees, and locations.

Here’s the problem with single-data-domain MDM. It’s a barrier to having common, consensus-based entity definitions and standard reference data that would allow you to correlate information across multiple domains. For example, single-domain MDM is great for creating a single view of customers. But it needs to federate or somehow integrate with MDM for the product-data domain, if you want to extend that view to include (with a high level of accuracy and consistency) products and services that each customer has acquired or considered. Or you might include financial or location data. Some day, you’ll include data from social media. All this is easier and more accurate with multi-data-domain MDM.
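
Here’s a minimal Python sketch of why the customer-plus-product view depends on mastering both domains. All IDs, names, and crosswalk tables are hypothetical. Two source systems know the same customer by different local IDs, and product IDs also vary by system; only after both sides resolve to master IDs can the holdings be joined into one view.

```python
from typing import Dict, List, Tuple

# Hypothetical master records for two domains, each keyed by a master ID.
CUSTOMERS: Dict[str, str] = {"CUST-1": "Acme Corporation"}
PRODUCTS: Dict[str, str] = {"PROD-7": "Business checking", "PROD-9": "Line of credit"}

# Crosswalks from source-system IDs to master IDs, one per domain.
CUSTOMER_XREF: Dict[str, str] = {"crm:881": "CUST-1", "billing:A-17": "CUST-1"}
PRODUCT_XREF: Dict[str, str] = {"core:CHK-01": "PROD-7", "loans:LOC-5": "PROD-9"}

# Holdings as they appear in source systems, using local IDs for both domains.
SOURCE_HOLDINGS: List[Tuple[str, str]] = [
    ("billing:A-17", "core:CHK-01"),
    ("crm:881", "loans:LOC-5"),
]


def customer_view(master_id: str) -> Dict[str, object]:
    """Build a cross-domain view: resolve both sides to master IDs, then join."""
    held = {
        PRODUCT_XREF[product_id]
        for customer_id, product_id in SOURCE_HOLDINGS
        if CUSTOMER_XREF.get(customer_id) == master_id
    }
    return {
        "customer": CUSTOMERS[master_id],
        "products": sorted(PRODUCTS[p] for p in held),
    }


print(customer_view("CUST-1"))
# {'customer': 'Acme Corporation', 'products': ['Business checking', 'Line of credit']}
```

With customer-only MDM, the product IDs would stay system-specific, and the two holdings couldn’t be described consistently in one view.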

The examples probably sound analytic to you, but they’re equally applicable to operations. And multi-data-domain MDM can improve lots of data management functions, like analytics, identity resolution, customer intimacy, data quality, data integration, deduplication, and sharing data across disparate departments and their IT systems.

I wish it weren’t true, but I still see most MDM solutions as focused on the customer data domain -- and that’s all. If MDM addresses other domains -- typically financial or product data -- that’s done in a separate solution, with little or no integration with MDM for customer data. Some user organizations have multiple customer-focused MDM solutions, say one each for marketing analytics, direct marketing, sales pipeline, customer service, and so on. So much for a single view of the customer! These organizations have their hands full consolidating customer-data-domain MDM solutions, and that delays the next step, which is multi-data-domain MDM.

Despite these dire situations, I’ve also encountered user organizations that have successfully extended MDM to span multiple data domains. And some of these spoke at TDWI’s Solution Summit on Master Data in March 2011. For example, Cathy Burrows from Royal Bank of Canada explained how they consolidated multiple MDM solutions to create a single, central, and governed MDM solution that provides a rich, accurate, and even intimate view of each customer. They’re now enriching customer views with reference data about the products these customers have.

As another example, Mark Love of the Veterans Health Administration (VHA) talked about how the VHA started with a form of MDM for patient identity, then branched out into many other domains. To keep the domains straight and to leverage hierarchical relations among domains, the VHA created a “master set of domains.”

I got to thinking about all this because, just yesterday, I was talking about multi-data-domain MDM with Ravi Shankar of Informatica. “Most of our recent MDM deals are multi-domain,” he said. Ravi talked through a list of Informatica customers who have multi-data-domain MDM in production today. I can’t tell you the customer names, but they’re in banking, high-tech manufacturing, food services, and government agencies. All began with one domain, then extended to others. Also, all deployed MDM in combination with their data integration and/or data quality solutions, which shows how MDM is interrelated with other data management disciplines. The list Ravi shared with me gives me confidence that more and more user organizations are succeeding with multi-data-domain MDM – and that’s a good thing.

But the future of multi-data-domain MDM isn’t totally rosy. At TDWI’s Solution Summit on Master Data in March 2011, we also heard from Evan Levy of Baseline Consulting (recently acquired by DataFlux). He said: “Multi-data-domain MDM is technically feasible today. But it makes no sense in terms of sponsorship, funding, or satisfying departmental and application-specific requirements.”

I agree wholeheartedly with Evan’s point about sponsorship, funding, and departmental requirements, because a number of users have explained to me over the years that sales and marketing need to own customer-data-domain MDM, even if it’s only applied within their customer-base segmentation, direct marketing, and sales contact applications. Likewise, the supply chain managers want to fund and control product and partner reference data. The financial guys have their own requirements for financial data, and HR has MDM requirements for employee data. All too often, these departments aren’t too keen on sharing.

But I don’t fully agree that multi-data-domain MDM makes no sense. I think there ARE situations where it makes perfect sense, and I noted those earlier in this blog. In my experience, a common tipping point comes when technical and business people have reached maturity with customer-data MDM and realize they can’t get to the next level without consistent and integrated MDM for other domains.

Another way to put it is that the single view of the customer gets broader as it matures, thus demanding information from other domains. Yet another way to think of it is that multi-data-domain MDM often comes in a later life cycle stage, after single-data-domain MDM has proved the concept of MDM, in general. And much of the success of multi-data-domain MDM -- in my opinion -- is not about technology. Success depends on having a corporate culture that demands data sharing in support of cross-functional coordination.

So, folks, what do you think about the state of multi-data-domain MDM? Let me know. Thanks!

(Note that TDWI will hold its Solution Summit on Master Data, Quality, and Governance for the fourth year, coming up March 4-6, 2012 in Savannah, Georgia. Mark your calendar!)

Posted by Philip Russom, Ph.D. on August 24, 2011