Self-Service Master Data Management

Once data is under management in its best-fit leveragable platform in an organization, it is as prepared as it can be to serve its many callings. It is in position to be used for purposes operationally and analytically and across the spectrum of need. Ideas emerge from business areas no longer encumbered with the burden of managing data, which can be 60% – 70% of the effort to bring the idea to reality. Walls of distrust in data come down and the organization can truly excel with an important barrier to success removed.
An important goal of the information management function in an organization is to get all data under management by this definition, and to keep it under management as systems come and go over time.
Master Data Management (MDM) is one of these key leveragable platforms. It is the elegant place for data with widespread use in the organization. It becomes the system of record for customer, product, store, material, reference and all other non-transactional data. MDM data can be accessed directly from the hub or, more commonly, mapped and distributed widely throughout the organization. This use of MDM data does not even account for the significant MDM benefit of efficiently creating and curating master data to begin with.
MDM benefits are many, including hierarchy management, data quality, data governance/workflow, data curation, and data distribution. One overlooked benefit is just having a database where trusted data can be accessed. Like any data for access, the visualization aspect of this is important. With MDM data having a strong associative quality to it, the graph representation works quite well.
Graph traversals are a natural way to analyze network patterns. Graphs can handle high degrees of separation with ease and facilitate visualization and exploration of networks and hierarchies. Graph databases themselves are no substitute for MDM, as they provide only one of the many necessary functions that an MDM tool does. However, when graph technology is embedded within MDM, as IBM is doing in InfoSphere MDM, the combination is very powerful.
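To make the "degrees of separation" idea concrete, here is a minimal sketch of a breadth-first traversal over master data relationships. The entity names and the RELATIONSHIPS dictionary are hypothetical illustrations, not anything taken from InfoSphere MDM or another product.

```python
from collections import deque

# Hypothetical master data relationships: each entity lists the entities it links to.
RELATIONSHIPS = {
    "customer:acme": ["customer:acme-emea", "customer:acme-apac"],
    "customer:acme-emea": ["store:berlin", "store:paris"],
    "customer:acme-apac": ["store:tokyo"],
    "store:berlin": [],
    "store:paris": [],
    "store:tokyo": [],
}


def degrees_of_separation(graph, start):
    """Breadth-first traversal: map every reachable entity to its hop count from start."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, []):
            if neighbor not in distance:  # first visit is the shortest path in BFS
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return distance


hops = degrees_of_separation(RELATIONSHIPS, "customer:acme")
```

In this toy hierarchy the stores sit two hops from the parent customer. A graph engine embedded in an MDM hub answers the same kind of question at far larger scale.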
Graph technology is one of the many ways to facilitate self-service to MDM. Long a goal of business intelligence, self-service has significant applicability to MDM as well. Self-service is opportunity oriented. Users may want to validate a hypothesis, experiment, innovate, etc. Long development cycles or laborious process between a user and the data can be frustrating.
Historically, the burden for all MDM functions has fallen squarely on a centralized development function. It’s overloaded and, as with the self-service business intelligence movement, needs disintermediation. IBM is fundamentally changing this dynamic with the next release of InfoSphere MDM. Its self-service data import, matching, and lightweight analytics allow the business user to find, share and get insight from both MDM and other data.
Then there’s Big Match. Big Match can analyze structured and unstructured customer data together to gain deeper customer insights. It can enable fast, efficient linking of data from multiple sources to grow and curate customer information. The majority of the information in your organization that is not under management is unstructured data. Unstructured data has always been a valuable asset to organizations, but it can be difficult to manage. Emails, documents, medical records, contracts, design specifications, legal agreements, advertisements, delivery instructions, and other text-based sources of information do not fit neatly into tabular relational databases. Most BI tools on MDM data offer the ability to drill down and roll up data in reports and dashboards, which is good. But what about the ability to “walk sideways” across data sources to discover how different parts of the business interrelate?
Using unstructured data for customer profiling allows organizations to unify diverse data from inside and outside the enterprise, even the “ugly” stuff: dirty data that is incompatible with highly structured, fact-dimension data and that would have been too costly to combine using traditional integration and ETL methods.
Finally, unstructured data management enables text analytics, so that organizations can gain insight into customer sentiment, competitive trends, current news trends, and other critical business information. In text analytics, everything is fair game for consideration, including customer complaints, product reviews from the web, call center transcripts, medical records, and comment/note fields in an operational system. Combining unstructured data with artificial intelligence and natural language processing can extract new attributes and facts for entities such as people, location, and sentiment from text, which can then be used to enrich the analytic experience.
All of these uses and capabilities are enhanced if they can be provided using a self-service interface that users can easily leverage to enrich data from within their apps and sources. This opens up a whole new world for discovery.
With graph technology, distribution of the publishing function and the integration of all data including unstructured data, MDM can truly have important data under management, empower the business user, be the cornerstone to digital transformation and truly be self-service.

Master Data Management Joins the Machine Learning Party

In a normal master data management (MDM) project, a current state business process flow is built, followed by a future state business process flow that incorporates master data management. The current state is usually ugly as it has been built piecemeal over time and represents something so onerous that the company is finally willing to do something about it and inject master data management into the process. Many obvious improvements to process come out of this exercise and the future state is usually quite streamlined, which is one of the benefits of MDM.
I submit that these future state processes are seldom as optimized as they could be.
Consider the following snippet, supposedly part of an optimized future state.

This leaves in the process four people to manually look at the product, do their (unspecified) thing and (hopefully) pass it along, but possibly send it backwards to an upstream participant based on nothing evident in particular.
The challenge for MDM is to optimize the flow. I suggest that many of the “approval jails” in business process workflow are ripe for reengineering. What criteria are used? They are probably based on data that will now be in MDM. If training data for machine learning (ML) is available, we can not only recreate past decisions to automate future ones, but also look at the outcomes of those past decisions, work out the decisions that should have been made, and make those automatically, speeding up the flow and improving quality by an order of magnitude.
This concept of thinking ahead and automating decisions extends to other kinds of steps in a business flow that involve data entry, including survivorship determination. As with acceptance & rejection, data entry is also highly predictable, whether it is a selection from a drop-down or free-form entry. Again, with training data and backtesting, probable contributions at that step can be manifested and either automatically entered or provided as default for approval. The latter approach can be used while growing a comfort level.
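The "auto-enter or offer as a default" idea can be sketched in a few lines. What follows is a simple frequency baseline standing in for a trained model, not a real ML pipeline. The field values, the 0.9 threshold, and the function names are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def train(history):
    """Count how often each value was entered for each context (e.g. a product category)."""
    counts = defaultdict(Counter)
    for context, value in history:
        counts[context][value] += 1
    return counts


def suggest(counts, context, auto_threshold=0.9):
    """Return (value, confidence, action) for a new record in the given context."""
    seen = counts.get(context)
    if not seen:
        return None, 0.0, "manual"  # no history: leave the step to a person
    value, hits = seen.most_common(1)[0]
    confidence = hits / sum(seen.values())
    # High confidence: enter automatically. Otherwise: pre-fill and ask for approval.
    action = "auto_fill" if confidence >= auto_threshold else "default_for_approval"
    return value, confidence, action


# Hypothetical past entries for a "unit of measure" field, keyed by product category.
history = (
    [("beverage", "each")] * 19
    + [("beverage", "case")]
    + [("produce", "kg")] * 3
    + [("produce", "each")] * 2
)
model = train(history)
```

Starting with a high threshold keeps most suggestions in the "default for approval" lane. Lowering it over time is one way to grow the comfort level described above.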
Manual, human-scale processes are ripe for the picking, and it’s really a dereliction of duty to “do” MDM without significantly streamlining processes, much of which is done by eliminating the manual. As data volumes mount, it is often the only way to keep process time from increasing. At the least, prioritizing stewardship activities or routing activities to specific stewards based on an ML interpretation of past results (quality, quantity) is required. This approach is essential to having timely, data-infused processes.
As a modular and scalable trusted analytics foundational element, the IBM Unified Governance & Integration platform incorporates advanced machine learning capabilities into MDM processes, simplifying the user experience and adding cognitive capabilities.
Machine learning can also discover master data by looking at actual usage patterns. ML can source, suggest or utilize external data that would aid in the goal of business processes. Another important part of MDM is data quality (DQ). ML’s ability to recommend and/or apply DQ to data, in or out of MDM, is coming on strong. Name-identity reconciliation is a specific example but generally ML can look downstream of processes to see the chaos created by data lacking full DQ and start applying the rules to the data upstream.
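As a rough illustration of name-identity reconciliation, the sketch below scores candidate names against master records using Python's standard difflib. It is a plain string-similarity heuristic rather than a learned model, and the normalization rules and the 0.85 threshold are assumptions chosen for the example.

```python
from difflib import SequenceMatcher


def normalize(name):
    """Lowercase, replace punctuation with spaces, and sort tokens so word order is ignored."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in name.lower())
    return " ".join(sorted(cleaned.split()))


def similarity(a, b):
    """Similarity ratio in [0, 1] between two normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def find_matches(candidate, master_names, threshold=0.85):
    """Return (score, master_name) pairs at or above the threshold, best match first."""
    scored = [(similarity(candidate, m), m) for m in master_names]
    return sorted((pair for pair in scored if pair[0] >= threshold), reverse=True)


matches = find_matches("Jon Smith", ["John Smith", "Acme Corp"])
```

A production matcher would weigh more attributes than the name (address, date of birth, identifiers) and would learn its thresholds from steward decisions instead of hard-coding them.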
IBM InfoSphere Master Data Management utilizes machine learning to speed the data discovery, mapping, quality and import processes.
In the last post, I postulated that blockchain would impact MDM tremendously. In this post, it’s machine learning affecting MDM. (Don’t get me started on graph technology). Welcome to the new center of the data universe. MDM is about to undergo a revolution. Products will look much different in 5 years. Make sure your vendor is committed to the MDM journey with machine learning.

When Worlds Collide: Blockchain and Master Data Management

Master Data Management (MDM) is an approach to the management of golden records that has been around for over a decade but has hit a growth spurt lately as some organizations exceed their pain thresholds in the management of common data. Blockchain has a slightly shorter history, arriving with bitcoin, but is also seeing its revolution these days as data gets distributed far and wide and trust takes center stage in business relationships.
Volumes could be written about each on its own, and given that most organizations still have a way to go with each discipline, that might be appropriate. However, good ideas wait for no one and today’s idea is MDM on Blockchain.
Thinking back over our MDM implementations over the years, it is easy to see the data distribution network becoming wider. As a matter of fact, master data distribution is now usually the most time-intensive and unwieldy part of an MDM implementation. The blockchain removes overhead, costs and unreliability from authenticated peer-to-peer network partner transactions involving data exchange. It can support one of the big challenges of MDM with governed, bi-directional synchronization of master data between the blockchain and enterprise MDM.
Another core MDM challenge is arriving at the “single version of the truth”. It’s elusive even with MDM because everyone must tacitly agree to the process used to instantiate the data in the first place. While many MDM practitioners go to great lengths to utilize the data rules from a data governance process, it is still a process subject to criticism. The consensus that blockchain can achieve is a governance proxy for that elusive “single version of the truth” by achieving group consensus for trust as well as full lineage of data.
Blockchain enables the major components and tackles the major challenges in MDM.
Blockchain provides a distributed database, as opposed to a centralized hub, that can store certified data in perpetuity. By storing timestamped and linked blocks, the blockchain is unalterable and permanent. Though not yet suited to low-latency transactions, transactions involving master data, such as financial settlements, are ideal for blockchain and can be sped up by an order of magnitude since blockchain removes the friction in a normal process.
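The "timestamped and linked blocks" property can be sketched as a simple hash chain. This is an illustrative toy, not the Hyperledger Fabric API or any real blockchain: it has no network and no consensus, and the record fields are hypothetical. It shows only why changing an earlier block becomes detectable.

```python
import hashlib
import json
import time

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first block


def block_hash(block):
    """SHA-256 over the block's contents, including the previous block's hash."""
    fields = {k: block[k] for k in ("index", "timestamp", "data", "prev_hash")}
    payload = json.dumps(fields, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_block(chain, data):
    """Add a timestamped block that is linked to the current tail of the chain."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    block = {"index": len(chain), "timestamp": time.time(), "data": data, "prev_hash": prev_hash}
    block["hash"] = block_hash(block)
    chain.append(block)
    return block


def is_valid(chain):
    """Check that every block's hash matches its contents and links to its predecessor."""
    for i, block in enumerate(chain):
        expected_prev = chain[i - 1]["hash"] if i else GENESIS_HASH
        if block["hash"] != block_hash(block) or block["prev_hash"] != expected_prev:
            return False
    return True


ledger = []
append_block(ledger, {"customer_id": "C-1", "name": "Acme Corp"})
append_block(ledger, {"customer_id": "C-1", "name": "Acme Corporation"})
```

Editing the data in the first block after the fact makes is_valid(ledger) return False, because the stored hash no longer matches the contents. In a real distributed ledger, the other participants would reject such a chain.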
Blockchain uses pre-defined rules that act as gatekeepers of data quality and govern the way in which data is utilized. Blockchains can be deployed publicly (like bitcoin) or internally (like an implementation of Hyperledger). There could be a blockchain per subject area (like customer or product) in the implementation. MDM will begin by utilizing these internal blockchain networks, also known as Distributed Ledger Technology, though utilization of public blockchains is inevitable.
A shared master data ledger beyond company boundaries can, for example, contain common and agreed master data including public company information and common contract clauses with only counterparties able to see the content and destination of the communication.
Hyperledger is quickly becoming the standard for open source blockchain. Hyperledger is hosted by The Linux Foundation. IBM, with the Hyperledger Fabric, is establishing the framework for blockchain in the enterprise. Supporting master data management with a programmable interface for confidential transactions over a permissioned network is becoming a key inflection point for blockchain and Hyperledger.
Data management is about the right data at the right time, and master data is fundamental to great data management, which is why centralized approaches like the discipline of master data management have taken center stage. MDM can utilize the blockchain for distribution and governance, and blockchain can clearly utilize the great master data produced by MDM. Blockchain data needs data governance like any data. This data actually needs it more given its importance on the network.
MDM and blockchain are going to be intertwined from now on. The combination enables the key components of establishing and distributing the single version of the truth. Blockchain enables trusted, governed data. It integrates this data across broad networks. It prevents duplication and provides data lineage.
It will start in MDM in niches that demand these traits such as financial, insurance and government data. You can get to know the customer better with native fuzzy search and matching in the blockchain. You can track provenance, ownership, relationship and lineage of assets, do trade/channel finance and post-trade reconciliation/settlement.
Blockchain is now a disruption vector for MDM. MDM vendors need to be at least blockchain-aware today, creating the ability for blockchain integration in the near future, as IBM InfoSphere Master Data Management is doing this year. Others will lose ground.