Self-Service Master Data Management

Once data is under management in its best-fit, leverageable platform, it is as prepared as it can be to serve its many callings. It is positioned for use operationally, analytically, and across the spectrum of need. Ideas emerge from business areas no longer encumbered with the burden of managing data, which can account for 60% to 70% of the effort of bringing an idea to reality. Walls of distrust in data come down, and the organization can truly excel with an important barrier to success removed.
An important goal of the information management function in an organization is to get all data under management by this definition, and to keep it under management as systems come and go over time.
Master Data Management (MDM) is one of these key leverageable platforms. It is the natural home for data with widespread use in the organization. It becomes the system of record for customer, product, store, material, reference, and all other non-transactional data. MDM data can be accessed directly from the hub or, more commonly, mapped and distributed widely throughout the organization. This use of MDM data does not even account for the significant MDM benefit of efficiently creating and curating master data to begin with.
MDM benefits are many, including hierarchy management, data quality, data governance/workflow, data curation, and data distribution. One overlooked benefit is just having a database where trusted data can be accessed. Like any data for access, the visualization aspect of this is important. With MDM data having a strong associative quality to it, the graph representation works quite well.
Graph traversals are a natural way to analyze network patterns. Graphs can handle high degrees of separation with ease and facilitate visualization and exploration of networks and hierarchies. Graph databases themselves are no substitute for MDM, as they provide only one of the many necessary functions that an MDM tool does. However, when graph technology is embedded within MDM, as IBM is doing in InfoSphere MDM, it is very powerful.
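To make the traversal point concrete, here is a minimal sketch: a toy master-data relationship graph and a breadth-first walk that finds everything within a given number of hops. The entities and edges are invented for illustration, not drawn from any real MDM hub.

```python
from collections import deque

# Toy master-data relationship graph: entity -> related entities.
# Entity names are illustrative only.
graph = {
    "customer:42": ["account:7", "address:3"],
    "account:7": ["product:SKU-9", "customer:42"],
    "address:3": ["customer:42", "customer:55"],
    "product:SKU-9": ["account:7"],
    "customer:55": ["address:3"],
}

def within_hops(start, max_hops):
    """Breadth-first traversal: all entities reachable in <= max_hops."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue  # don't expand past the hop limit
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen[neighbor] = seen[node] + 1
                queue.append(neighbor)
    return seen

print(within_hops("customer:42", 2))
```

Real MDM hubs and graph databases add typed edges, indexing, and security on top, but the traversal at the heart of "walking" master-data relationships is this simple.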
Graph technology is one of the many ways to facilitate self-service to MDM. Long a goal of business intelligence, self-service has significant applicability to MDM as well. Self-service is opportunity oriented. Users may want to validate a hypothesis, experiment, innovate, etc. Long development cycles or laborious processes between a user and the data can be frustrating.
Historically, the burden for all MDM functions has fallen squarely on a centralized development function. It’s overloaded and, as with the self-service business intelligence movement, needs disintermediation. IBM is fundamentally changing this dynamic with the next release of InfoSphere MDM. Its self-service data import, matching, and lightweight analytics allow the business user to find, share, and get insight from both MDM and other data.
Then there’s Big Match. Big Match can analyze structured and unstructured customer data together to gain deeper customer insights. It can enable fast, efficient linking of data from multiple sources to grow and curate customer information. The majority of the information in your organization that is not under management is unstructured data. Unstructured data has always been a valuable asset to organizations, but it can be difficult to manage. Emails, documents, medical records, contracts, design specifications, legal agreements, advertisements, delivery instructions, and other text-based sources of information do not fit neatly into tabular relational databases. Most BI tools on MDM data offer the ability to drill down and roll up data in reports and dashboards, which is good. But what about the ability to “walk sideways” across data sources to discover how different parts of the business interrelate?
Using unstructured data for customer profiling allows organizations to unify diverse data from inside and outside the enterprise—even the “ugly” stuff; that is, dirty data that is incompatible with highly structured, fact-dimension data that would have been too costly to combine using traditional integration and ETL methods.
Finally, unstructured data management enables text analytics, so that organizations can gain insight into customer sentiment, competitive trends, current news trends, and other critical business information. In text analytics, everything is fair game for consideration, including customer complaints, product reviews from the web, call center transcripts, medical records, and comment/note fields in an operational system. Combining unstructured data with artificial intelligence and natural language processing can extract new attributes and facts for entities such as people, location, and sentiment from text, which can then be used to enrich the analytic experience.
All of these uses and capabilities are enhanced if they can be provided using a self-service interface that users can easily leverage to enrich data from within their apps and sources. This opens up a whole new world for discovery.
With graph technology, distribution of the publishing function, and the integration of all data, including unstructured data, MDM can bring important data under management, empower the business user, serve as a cornerstone of digital transformation, and truly be self-service.

Master Data Management Joins the Machine Learning Party

In a normal master data management (MDM) project, a current state business process flow is built, followed by a future state business process flow that incorporates master data management. The current state is usually ugly as it has been built piecemeal over time and represents something so onerous that the company is finally willing to do something about it and inject master data management into the process. Many obvious improvements to process come out of this exercise and the future state is usually quite streamlined, which is one of the benefits of MDM.
I present today that these future state processes are seldom as optimized as they could be.
Consider the following snippet, supposedly part of an optimized future state.

This leaves in the process four people to manually look at the product, do their (unspecified) thing and (hopefully) pass it along, but possibly send it backwards to an upstream participant based on nothing evident in particular.
The challenge for MDM is to optimize the flow. I suggest that many of the “approval jails” in business process workflow are ripe for reengineering. What criteria are used? They are probably based on data that will now be in MDM. If training data for machine learning (ML) is available, we can not only recreate past decisions to automate future ones; we can also examine the outcomes of those decisions, learn the decisions that should have been made, and make those instead, speeding up the flow and improving quality by an order of magnitude.
This concept of thinking ahead and automating decisions extends to other kinds of steps in a business flow that involve data entry, including survivorship determination. As with acceptance & rejection, data entry is also highly predictable, whether it is a selection from a drop-down or free-form entry. Again, with training data and backtesting, probable contributions at that step can be manifested and either automatically entered or provided as default for approval. The latter approach can be used while growing a comfort level.
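A sketch of the backtesting idea, with no assumptions about any particular MDM product: historical human decisions become training data, a learned rule automates the confident cases, and everything else routes to a steward. A real system would use a proper ML classifier; the single learned threshold below is a deliberately trivial stand-in, and all the data is invented.

```python
# Toy backtest of automating an approval step from historical decisions.
history = [
    # (fraction of master-data attributes filled in, past human decision)
    (0.95, "approve"), (0.90, "approve"), (0.85, "approve"),
    (0.60, "reject"), (0.55, "reject"), (0.80, "approve"), (0.50, "reject"),
]

def learn_threshold(records):
    """Lowest completeness score a human ever approved."""
    return min(score for score, decision in records if decision == "approve")

def auto_decide(score, threshold):
    # Above the learned threshold: decide automatically.
    # Below: keep the human step, but only where it adds value.
    return "approve" if score >= threshold else "route-to-steward"

t = learn_threshold(history)
print(t, auto_decide(0.92, t), auto_decide(0.40, t))
```

The "provided as default for approval" approach in the text maps to showing `auto_decide`'s answer to the steward instead of acting on it, which is how a comfort level is grown before full automation.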
Manual, human-scale processes are ripe for the picking, and it’s really a dereliction of duty to “do” MDM without significantly streamlining processes, much of which is done by eliminating the manual. As data volumes mount, it is often the only way to keep process time from growing. At the least, prioritizing stewardship activities, or routing activities to specific stewards based on an ML interpretation of past results (quality, quantity), is required. This approach is essential to having timely, data-infused processes.
As a modular and scalable trusted analytics foundational element, the IBM Unified Governance & Integration platform incorporates advanced machine learning capabilities into MDM processes, simplifying the user experience and adding cognitive capabilities.
Machine learning can also discover master data by looking at actual usage patterns. ML can source, suggest or utilize external data that would aid in the goal of business processes. Another important part of MDM is data quality (DQ). ML’s ability to recommend and/or apply DQ to data, in or out of MDM, is coming on strong. Name-identity reconciliation is a specific example but generally ML can look downstream of processes to see the chaos created by data lacking full DQ and start applying the rules to the data upstream.
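Name-identity reconciliation can be sketched with nothing more than a string-similarity score. Production matchers are far more sophisticated (phonetic codes, nicknames, learned weights), but the shape of the problem shows in a few lines; the names and the 0.8 cutoff here are invented for illustration.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Similarity ratio in [0, 1] between two normalized names."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def reconcile(candidate, master_names, cutoff=0.8):
    """Return the best-matching master record above the cutoff, if any."""
    best = max(master_names, key=lambda m: similarity(candidate, m))
    return best if similarity(candidate, best) >= cutoff else None

masters = ["Jonathan Howard", "Kristi Zuhlke", "Ely Kahn"]
print(reconcile("Jonathon Howard", masters))   # near-duplicate spelling
print(reconcile("Totally Unrelated", masters))
```

The same scoring idea, pointed downstream at data that escaped DQ rules, is how ML can find the "chaos" the text describes and push the fix upstream.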
IBM InfoSphere Master Data Management utilizes machine learning to speed the data discovery, mapping, quality and import processes.
In the last post, I postulated that blockchain would impact MDM tremendously. In this post, it’s machine learning affecting MDM. (Don’t get me started on graph technology). Welcome to the new center of the data universe. MDM is about to undergo a revolution. Products will look much different in 5 years. Make sure your vendor is committed to the MDM journey with machine learning.

When Worlds Collide: Blockchain and Master Data Management

Master Data Management (MDM) is an approach to the management of golden records that has been around for over a decade, only to find a growth spurt lately as some organizations exceed pain thresholds in the management of common data. Blockchain has a slightly shorter history, coming aboard with bitcoin, but it too is seeing a revolution these days as data gets distributed far and wide and trust takes center stage in business relationships.
Volumes could be written about each on its own, and given that most organizations still have a way to go with each discipline, that might be appropriate. However, good ideas wait for no one and today’s idea is MDM on Blockchain.
Thinking back over our MDM implementations over the years, it is easy to see the data distribution network becoming wider. As a matter of fact, master data distribution is usually the most time-intensive and unwieldy part of an MDM implementation these days. The blockchain removes overhead, costs, and unreliability from authenticated peer-to-peer network partner transactions involving data exchange. It can support one of the big challenges of MDM with governed, bi-directional synchronization of master data between the blockchain and enterprise MDM.
Another core MDM challenge is arriving at the “single version of the truth”. It’s elusive even with MDM because everyone must tacitly agree to the process used to instantiate the data in the first place. While many MDM practitioners go to great lengths to utilize the data rules from a data governance process, it is still a process subject to criticism. The consensus that blockchain can achieve is a governance proxy for that elusive “single version of the truth” by achieving group consensus for trust as well as full lineage of data.
Blockchain enables the major components and tackles the major challenges in MDM.
Blockchain provides a distributed database, as opposed to a centralized hub, that can store certified data in perpetuity. By storing timestamped and linked blocks, the blockchain is unalterable and permanent. Though not yet suited to low-latency transactions, transactions involving master data, such as financial settlements, are ideal for blockchain and can be sped up by an order of magnitude since blockchain removes the friction of a normal process.
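The "timestamped and linked blocks" property is easy to see in miniature: each block carries the hash of its predecessor, so altering any historical entry breaks every hash after it. The toy chain below shows only that linking; it has none of a real blockchain's consensus, networking, or proof-of-work, and the master-data events are invented.

```python
import hashlib
import json
import time

def _digest(block):
    """SHA-256 over the block's payload, timestamp, and back-link."""
    payload = {k: block[k] for k in ("data", "time", "prev")}
    return hashlib.sha256(
        json.dumps(payload, sort_keys=True).encode()
    ).hexdigest()

def make_block(data, prev_hash):
    """A block binds its data and timestamp to the previous block's hash."""
    block = {"data": data, "time": time.time(), "prev": prev_hash}
    block["hash"] = _digest(block)
    return block

def verify(chain):
    """Recompute every hash and link; any tampering breaks the chain."""
    for i, block in enumerate(chain):
        if block["hash"] != _digest(block):
            return False
        if i > 0 and block["prev"] != chain[i - 1]["hash"]:
            return False
    return True

chain = [make_block("customer:42 created", "0" * 64)]
chain.append(make_block("customer:42 address updated", chain[-1]["hash"]))
print(verify(chain))
chain[0]["data"] = "customer:42 deleted"  # tamper with history
print(verify(chain))
```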
Blockchain uses pre-defined rules that act as gatekeepers of data quality and govern the way in which data is utilized. Blockchains can be deployed publicly (like bitcoin) or internally (like an implementation of Hyperledger). There could be a blockchain per subject area (like customer or product) in the implementation. MDM will begin by utilizing these internal blockchain networks, also known as Distributed Ledger Technology, though utilization of public blockchains is inevitable.
A shared master data ledger beyond company boundaries can, for example, contain common and agreed master data including public company information and common contract clauses with only counterparties able to see the content and destination of the communication.
Hyperledger is quickly becoming the standard for open source blockchain. Hyperledger is hosted by The Linux Foundation. IBM, with the Hyperledger Fabric, is establishing the framework for blockchain in the enterprise. Supporting master data management with a programmable interface for confidential transactions over a permissioned network is becoming a key inflection point for blockchain and Hyperledger.
Data management is about the right data at the right time, and master data is fundamental to great data management, which is why centralized approaches like the discipline of master data management have taken center stage. MDM can utilize the blockchain for distribution and governance, and blockchain can clearly utilize the great master data produced by MDM. Blockchain data needs governance like any data; in fact, it needs it more, given its importance on the network.
MDM and blockchain are going to be intertwined from now on. Together they enable the key components of establishing and distributing the single version of the truth. Blockchain enables trusted, governed data. It integrates this data across broad networks. It prevents duplication and provides data lineage.
It will start in MDM in niches that demand these traits such as financial, insurance and government data. You can get to know the customer better with native fuzzy search and matching in the blockchain. You can track provenance, ownership, relationship and lineage of assets, do trade/channel finance and post-trade reconciliation/settlement.
Blockchain is now a disruption vector for MDM. MDM vendors need to be at least blockchain-aware today, creating the ability for blockchain integration in the near future, such as what IBM InfoSphere Master Data Management is doing this year. Others will lose ground.

Commvault updates its data management platform

Commvault, a data management company, is rolling out updates to many of its enterprise products today, with the aim of simplifying the process of actually using your company’s data.
Among the updates are new open APIs that promise not to lock in customer data, better data searching tools, and a new backup mechanism that saves only blocks of data that have recently changed. The company has also improved its disaster recovery tools with a dashboard that can be used to handle “provisioning, management, retirement, and cross-platform migration [or] recoveries,” according to its senior product manager of data protection and recovery, Jonathan Howard.
This is the eleventh version of what Commvault calls its “solutions portfolio.” Existing customers will be able to upgrade to the new tools — or “seamlessly consume” them, in the company’s parlance — without hassle. (And if there’s anything people hate when it comes to buzzword-laden cloud platforms that allow them to “activate” and manage their data, it’s gonna be any hassle.) They will also be available to new customers, as is the way of new product releases.
Here’s what Howard had to say about the new product releases: “Our focus on open access, flexibility, and automation provide[s] customers the ability [to] activate their data in ways that were never before possible. This allows customers to reduce or eliminate backup windows, instantly access or recover workloads while providing automation to eliminate manual and complex operations.”
Commvault began as a development group in Bell Labs in 1988, and was a “strategic business unit” of AT&T Network Systems until it went solo in 1996.

NSA-linked Sqrrl eyes cyber security and lands $7M in funding

Sqrrl, the big data startup whose founders used to work for the NSA, plans to announce Thursday that it is shifting its focus to cyber security with a new release of its enterprise service. The startup is also taking in a $7 million Series B investment round, bringing its total funding to $14.2 million, said Ely Kahn, a Sqrrl co-founder and vice president of business development.

The heart of Sqrrl’s technology is the NSA-developed and open-sourced Apache Accumulo NoSQL database, which the company, like other open-source-reliant companies such as Docker or Hortonworks, sells premium services around.

While the Accumulo technology, based on Hadoop, provided a way for companies to store and analyze all their data similar to how they could with other big data vendors like Splunk, Kahn said his team found that their biggest customers were using the technology for cybersecurity purposes. Just a hunch, but I bet the whole “ties to the NSA” thing probably leads to people wanting to give it a go for their security challenges.

Sqrrl’s technology spools together many different types of data sets, from intrusion detection logs to human resources information, and puts that in a single platform that can be used for discovering bad actors that may be loitering in a company’s infrastructure.

Because the Accumulo NoSQL database can function as a graph database (graph databases are a class of NoSQL databases, said Kahn) the Sqrrl team can dump all that data into the system and then receive a picture of the network that contains all the users, devices and servers and how they are connected together.

Sqrrl dashboard

“We are able to take all these disparate data sets and defuse them into this linked-data model,” said Kahn.

Graph databases seem to be getting a lot of action these days (DataStax just bought out a graph-database company called Aurelius), and people often use the technology as a way to map out their infrastructure and learn about vulnerabilities.

Given this traction in using graph databases for security purposes, it makes sense that Sqrrl would want to ride this wave, and its Sqrrl Enterprise 2.0 product line now contains security-specific features, including visualization tools like bar charts and pie charts, and a dashboard for users to create reports based on the data.

“It’s a big data analytics platform with a focus on cybersecurity,” said Kahn. “It has a database foundation, but it now has advanced visualization capabilities that supports the incident-detection lifecycle.”

This might sound similar to Argyle Data, which built fraud-detection software on top of the Accumulo database, but Kahn said that startup is more focused on using its technology to prevent telephone scams and the like, and that solving problems related to fraud requires different types of data sets than the ones Sqrrl analyzes to detect anomalies.

Rally Ventures drove the latest funding round along with previous investors Atlas Venture and Matrix Partners.

For more on how innovative companies are using big data to solve complex problems, be sure to check out Structure Data 2015 on March 18-19 in New York City.

DataStax’s first acquisition is a graph-database company

DataStax, the rising NoSQL database vendor that hawks a commercial version of the open-source Apache Cassandra distributed database, plans to announce on Tuesday that it has acquired graph-database specialist Aurelius, which maintains the open-source graph database Titan.

All of Aurelius’s eight-person engineering staff will be joining DataStax, said Martin Van Ryswyk, DataStax’s executive vice president of engineering. This makes for DataStax’s first acquisition since being founded in 2010. The company did not disclose the purchase price, but Van Ryswyk said that a “big chunk” of DataStax’s recent $106 million funding round was used to help finance the purchase.

Although DataStax has been making a name for itself amid the NoSQL market, where it competes with companies like MongoDB and Couchbase, it’s apparent that the company is branching out a little bit by purchasing a graph-database shop.

Cassandra is a powerful and scalable database used for online or transactional purposes (Netflix and Spotify are users), but it lacks some of the features that make graph databases attractive for some organizations, explained DataStax co-founder and chief customer officer Matt Pfeil. These features include the ability to map out relationships between data points, which is helpful for social networks like Pinterest or Facebook, which use graph architecture to learn about user interests and activities.

Financial institutions are also interested in graph databases as a way to detect fraud and malicious behavior in their infrastructure, Pfeil said.

As DataStax “started to move up the stack,” the company noticed that its customers were using graph database technology, and DataStax felt it could come up with a product that could give customers what they wanted, said Pfeil.

DataStax Enterprise

Customers don’t just want one database technology, they want a “multi-dimensional approach” that includes Cassandra, search capabilities, analytics and graph technology, and they are willing to plunk down cash for commercial support, explained Van Ryswyk.

Because some open-source developers were already figuring out ways for the Cassandra and Titan databases to be used together, it made sense for DataStax and the Aurelius team to work together on making the enterprise versions of the technology compatible with each other, Van Ryswyk said.

Together, DataStax and the newly acquired Aurelius team will develop a commercial graph product called DataStax Enterprise (DSE) Graph, which they will try to get “to the level of scalability that people expect of Cassandra,” said Van Ryswyk. As of now, there is no release date for the technology, but Pfeil said work on the new product is already under way.

If you’re interested in learning more about what’s going on with big data in the enterprise and what other innovative companies are doing, you’ll want to check out this year’s Structure Data conference from March 18-19 in New York City.

Netflix is revamping its data architecture for streaming movies

Netflix is revamping the computing architecture that processes data for its streaming video service, according to a Netflix blog post that came out on Tuesday.

The Netflix engineering team wanted an architecture that can handle three key areas the video-streaming giant believes greatly affect the user experience: knowing what titles a person has watched; knowing where in a given title a person stopped watching; and knowing what else is being watched on someone’s account, which is helpful for family members who may be sharing one account.

Netflix’s current architecture handles all of these tasks through a distributed stateful system the company built (meaning the system keeps track of all user interaction and video watching and can react to any of those changes on the fly), but Netflix says it “ended up with a complex solution that was less robust than mature open source technologies” and wants something more scalable.

Netflix’s current architecture looks like this:

Netflix architecture figure

There’s a viewing service that’s split up into a stateful tier that stores the data for active views in memory; Cassandra is used as the primary data store with the Memcached key-value store built on top for data caching. There’s also a stateless tier that acts as “a fallback mechanism when a stateful node was unreachable.”

This basically means that when an outage occurs, the data stored in the stateless tier can transfer over to the end user, even though that data may not be exactly as up-to-date or as relevant as the data held in the stateful tier.

In regard to caching, the Netflix team apparently finds Memcached helpful for the time being, but is looking for a different technology “that natively supports first class data types and operations like append.”
From the blog post: “Memcached offers superb throughput and latency characteristics, but isn’t well suited for our use case. To update the data in memcached, we read the latest data, append a new view entry (if none exists for that movie) or modify an existing entry (moving it to the front of the time-ordered list), and then write the updated data back to memcached. We use an eventually consistent approach to handling multiple writers, accepting that an inconsistent write may happen but will get corrected soon after due to a short cache entry TTL and a periodic cache refresh.”
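The read-append-write cycle Netflix describes can be mocked with a plain dict standing in for Memcached. The real client, key names, and TTL values are not given in the post; everything below is illustrative.

```python
import time

cache = {}  # stands in for memcached: key -> (expires_at, value)
TTL = 60    # short TTL so inconsistent writes self-correct, per the post

def cache_get(key):
    entry = cache.get(key)
    if entry and entry[0] > time.time():
        return entry[1]
    return None  # missing or expired

def cache_set(key, value):
    cache[key] = (time.time() + TTL, value)

def record_view(user, movie):
    """Read the latest list, move/insert the movie at the front, write back."""
    views = cache_get(user) or []
    views = [movie] + [m for m in views if m != movie]
    cache_set(user, views)

record_view("user:1", "movie:A")
record_view("user:1", "movie:B")
record_view("user:1", "movie:A")  # existing entry moves to the front
print(cache_get("user:1"))
```

The pain point in the quote is visible here: `record_view` is a non-atomic read-modify-write, so two concurrent writers can clobber each other, which is exactly why Netflix wants a store that "natively supports first class data types and operations like append."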

Things got a bit more complex from an architecture perspective when Netflix “moved from a single AWS region to running in multiple AWS regions” as the team had “to build a custom mechanism to communicate the state between stateful tiers in different regions,” which obviously means having to keep track of a lot more moving parts.

For Netflix’s upcoming architecture overhaul, the company is looking at a design that accommodates these three principles: availability over consistency; microservices; and polyglot persistence, which means having multiple data storage technologies to be used for different, specific purposes.

The new system will look something like this:

Netflix future architecture

There’s not a lot of information as to what exactly will be the technologies that comprise this new architecture, but Netflix said it will be following up in a future post with more details. Judging by the picture of the eye in the diagram, it looks like Cassandra will still be one of them.

MongoDB confirms an $80M funding round

NoSQL startup MongoDB is aiming to raise $100 million and has already taken in $79.9 million, according to an SEC document that the company filed this week and has confirmed to Gigaom.

The new cash influx comes after a $150 million funding round the startup landed in October 2013 when the company was then valued at $1.2 billion.

MongoDB is a hot commodity in the NoSQL database space, where it competes with Couchbase and DataStax, among others. In their last investment rounds, Couchbase and DataStax have raised $60 million and $106 million, respectively.

MongoDB has also been figuring out how to make money as a company that’s built around open source software. In October, MongoDB unveiled its MongoDB Management Service, designed to help users scale and manage their databases; the startup is banking that the new service will generate a lot of revenue. It also added paid support (or what it calls “production support”) for users of the free version in August, and brought in a new CEO with IPO experience the same month.

The startup recently bought out WiredTiger, whose storage engine technology should be available as an option for a forthcoming MongoDB release. Financial terms of the acquisition were not disclosed.

With Hadoop vendor Hortonworks recently going public with a market cap of a little over a billion dollars, it’s clear the big data space is on fire and investors aren’t scared off by open source software. MongoDB has indicated that it eyes an IPO in its future, but this new funding round will give it leeway to find an optimal timeframe.

In October, MongoDB’s vice chairman and former CEO Max Schireson came on the Structure Show to chat about databases as well as managing a family while trying to lead a fast-rising startup.


Can’t find your market research data? This startup can help

That Diet Coke in your hand didn’t invent itself. It’s the result of years of focus groups, surveys and data crunching—more commonly known as market research. Companies that produce consumer packaged goods (CPGs) spend millions each year on market research and it’s both a blessing and a curse. It’s a boon to planning new products and tweaking existing ones but when it comes to finding individual pieces of information, that’s where the cursing comes in. A startup out of Chicago called KnowledgeHound thinks it has the answer.

The majority of product failures can be attributed to a brand not knowing or understanding its audience. When you’re in the business of selling to consumers, getting inside their heads — and then using that information effectively — can make or break you. At most companies, that information lies undiscovered in spreadsheets, survey responses, and charts that were never parsed. As KnowledgeHound CEO Kristi Zuhlke put it, companies spend all that cash on studies, then “stash the results on a hard drive, and promptly experience ‘corporate amnesia’.”

Zuhlke experienced this firsthand in her time at Procter & Gamble working in consumer insights. “As soon as we’d get the info in, a month later we’d forget we had it. There wasn’t a central, easily navigated place to access.” Building on her experiences in the consumer packaged goods industry, Zuhlke founded KnowledgeHound in 2012 and went from paper to product in four months. Her first customer? Procter & Gamble.

KnowledgeHound calls itself “the Google of market research”: its technology enables Fortune 500 companies to use, re-use, and recycle the consumer and market knowledge on which they spend an average of $60,000 per study. Instead of living on individual hard drives, which makes it unsearchable to the company at large, the data and research from each study is imported into the KnowledgeHound database, then augmented with a custom search engine and visualization tools. Results are modeled on a right brain/left brain approach when presented to the user, with research summaries representing the right and data points the left.


“Where we’re really different is the left side of the brain, the data points,” Zuhlke said. The search engine looks through questions as they were asked in the study, takes the raw data, and transforms it into graphs and charts in milliseconds.


KnowledgeHound’s product puts it in competition with a variety of markets—data visualization tools, business intelligence software, and enterprise search. Entering into just one of these would be a tall order. The bulk of competitors provide general document searching that can be used for content, but few focus exclusively on market research and its emphasis on data. One company that does jump out as a strong competitor is InfoTools out of New Zealand, which focuses so singularly on market research tools that it created awards around them (DIVAs: Data Insight Visualization Awards).

Otherwise, the choice of tools to parse a company’s market research is all over the map. There is a thriving DIY community around visualization that focuses more on the research agencies that produce the studies. This puts the onus on the producer of the data to present findings in the way they think the client will need; a short-sighted solution that likely comes up short once it makes its way to the client.  And occasionally, you’ll find a CPG company with a corporate librarian to help employees find the data they’re seeking. But those are rare.

Angel-funded but also generating revenue, KnowledgeHound aims to hire 10-12 more employees in 2015 to help add to its customer base – Fortune 500, privately held, billion-dollar companies – in addition to moving into the medical research space and launching new technologies in the next 6-9 months.

Market research around CPGs is not going to be the next big thing in Silicon Valley. It’s wonky, niche-y, and involves a lot of dry data. But its target market is one of the largest industries in North America, valued at approximately $2 trillion. KnowledgeHound seems poised to take the lead, if it can attract the talent and resources needed to keep billion-dollar brand giants happy.

MemSQL open sources tool that helps move data into your database

Database startup MemSQL said today that it has open sourced a new data transfer tool called MemSQL Loader that helps users haul vast quantities of data from sources like Amazon S3 and the Hadoop Distributed File System (HDFS) into either a MemSQL or MySQL database.

While moving data from one source to another may seem relatively straightforward, there are a lot of nuts and bolts in the process; if one thing goes awry, the whole endeavor can fail. For example, if you’re trying to move over thousands of files and one fails to transfer for some reason, you may have to start the process over again and hope all goes well, according to the MemSQL announcement.

MemSQL Loader is essentially an automation tool that lets users set up multiple transfers and queues that can restart “at a specific file in case of any import issues,” the release stated.

From the MemSQL blog post explaining the tool:
“MemSQL Loader lets you load files from Amazon S3, the Hadoop Distributed File System (HDFS), and the local filesystem. You can specify all of the files you want to load with one command, and MemSQL Loader will take care of deduplicating files, parallelizing the workload, retrying files if they fail to load, and more.”
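The dedup-parallelize-retry behavior in that quote is a common loader pattern, sketched below with the standard library. This is not MemSQL Loader’s actual implementation; `load_one` is a stand-in for the real file import, and the retry count and file paths are invented.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_RETRIES = 3

def load_files(paths, load_one):
    """Deduplicate paths, load them in parallel, retry transient failures."""
    unique = list(dict.fromkeys(paths))  # dedup while preserving order

    def load_with_retry(path):
        for attempt in range(1, MAX_RETRIES + 1):
            try:
                load_one(path)
                return path, "ok"
            except Exception:
                if attempt == MAX_RETRIES:
                    return path, "failed"

    with ThreadPoolExecutor(max_workers=4) as pool:
        return dict(pool.map(load_with_retry, unique))

# Flaky stand-in for an import that fails once, then succeeds.
attempts = {}
def flaky_load(path):
    attempts[path] = attempts.get(path, 0) + 1
    if attempts[path] == 1:
        raise IOError("transient failure")

print(load_files(["s3://a", "s3://b", "s3://a"], flaky_load))
```

Restarting "at a specific file" as the release describes would amount to persisting this per-file status map and resuming only the non-`ok` entries.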

MemSQL in action

The new tool is available as open source under the MIT License and can be downloaded from GitHub.

MemSQL has been on a roll launching new tools and features since its 2012 inception. In September, Gigaom’s Derrick Harris reported that MemSQL now supports cross-data-center replication, which is good for disaster recovery in case a database takes a hit; cross-data-center replication also helps distribute the load across two data centers, which could cut down on latency and boost performance.