Data integration becomes a much more important issue with the use of both public and private cloud-based platforms. Information that was once stored locally may now be spread across several public cloud providers, and that information needs to be shared with most, if not all, enterprise systems that exist locally or within public clouds.
These days, most cloud computing-related data integration focuses on the relatively simple process of replicating data from a public cloud to an on-premise system, or from cloud to cloud. This typically uses data integration technology to synchronize enterprise data between traditional on-premise systems and those that have relocated to the cloud. Examples include sales data that resides in a SaaS-delivered CRM and delivery systems that reside within the enterprise data center; the data integration software works to make sure that the sales data, and other information, matches in both systems.
What’s most interesting about the path of data integration and the rapidly growing cloud computing space? The traditional approach to data integration, meaning simple data replication and mediation, will become an outdated concept over the next 10 years or so. Data integration technology will make rapid moves in new directions, in light of the needs and the value of cloud computing, which will provide the enterprise with new capabilities, as well as largely disrupt the existing data integration marketplace. Also keep in mind that some data integration technologies will reach their 20th birthday in the next few years.
What you think you know and understand about data integration could quickly go out the window as technology providers try to match the capabilities of newer cloud-based platforms with newer data integration strategies, approaches, and technologies. Reconsidering what you understand about data integration and the cloud reveals an emerging path to success for existing and new data integration technology providers.
Data integration evolves
The evolution of data integration largely began in the mid-1990s with the Enterprise Application Integration movement. This architectural pattern was a response to the number of enterprise systems, such as SAP and PeopleSoft, that began to appear in data centers, which created the need to synchronize that information with other systems within the enterprise.
The older patterns of data and application integration were fairly simple to understand. Information was extracted from the source system(s), typically changed in structure and content, and then placed in the target system(s). This usually occurred around events such as adding a new customer in the accounting system, or updating the current status of inventory.
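This extract, transform, load pattern can be sketched in a few lines. A minimal illustration follows; the record shapes, field names, and target structure are all hypothetical, not any particular vendor's product.

```python
# A minimal sketch of the classic integration pattern: extract from a
# source, transform structure and content, and load into a target.
# All field names and record shapes here are illustrative assumptions.

def extract(source_records):
    """Pull raw records from the source system."""
    return list(source_records)

def transform(record):
    """Reshape a source record into the structure the target expects."""
    return {
        "customer_id": record["id"],
        # Content change: normalize name casing for the target system.
        "customer_name": record["name"].title(),
        "status": "active" if record.get("active") else "inactive",
    }

def load(target, records):
    """Place transformed records into the target system (a dict here)."""
    for rec in records:
        target[rec["customer_id"]] = rec

source = [{"id": 1, "name": "ACME CORP", "active": True}]
target = {}
load(target, [transform(r) for r in extract(source)])
print(target[1]["customer_name"])  # Acme Corp
```

The same shape, triggered by events such as "new customer added," is what the early EAI products automated at scale.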
Players in the traditional data and application integration market, which still exists today, include Tibco, IBM, Software AG, Pervasive/Actian, and Informatica. Most have modernized their integration offerings, including cloud-based sources and targets, as well as adding some as-a-service (cloud-delivered) capabilities. This is not to say that the more mature companies provide better approaches to cloud-based data integration; they have just been around longer.
Somewhat newer players include WSO2, Red Hat, Jitterbit, Boomi/Dell, Composite/Cisco, Cast Iron/IBM, Liaison Technologies, Scribe, and many others. These technologies came in the form of a second wave of data integration technologies designed to address the changing needs of enterprise data integration. This wave includes some newer approaches such as data virtualization or abstraction (such as Red Hat, WSO2, Informatica, and Composite/Cisco), and systems that operate exclusively on-demand (such as Boomi).
As seen in Figure 1, the focus now (2010 – 2014) is on leveraging existing integration technology, traditional and not, that provides some, if not all, of the following capabilities: data replication, semantic mediation, data cleansing, and mass data migration. These capabilities provide enterprises with the ability to move data from cloud to cloud, cloud to enterprise, or intra-enterprise, as needed, to support the core business processes. They have evolved in the last several years around the trends in hybrid and multi-cloud architectures, as well as the rise of much larger data sets (big data).
Figure 1: As cloud technology matures, data integration will take on new forms, new roles, and will deliver new value.
Common features include the ability to account for the differences in the ways that the cloud or non-cloud systems store data, and the ability to change both the structure and the content on the fly, thus making the target system think it’s receiving native data. Mass data migration deals with ETL (extract, transform, load) capabilities and includes the ability to migrate large amounts of data at specified times of the day or week, also changing the content and structure to meet the needs of the target system, such as a data warehouse in the cloud or on-premise.
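The "making the target system think it's receiving native data" idea is the essence of semantic mediation. Here is a small sketch; the field mapping and the unit conversion are illustrative assumptions, not a real product's schema.

```python
# A sketch of semantic mediation: a mapping layer reshapes records in
# flight so the target system sees what looks like native data.
# Field names and the pounds-to-kilograms conversion are assumptions.

FIELD_MAP = {"qty_on_hand": "inventory_count", "sku": "item_id"}

def mediate(record, field_map=FIELD_MAP):
    out = {}
    # Structural mediation: rename source fields to target fields.
    for src_field, tgt_field in field_map.items():
        out[tgt_field] = record[src_field]
    # Content mediation: the target expects weight in kilograms.
    out["weight_kg"] = round(record["weight_lb"] * 0.453592, 2)
    return out

cloud_record = {"sku": "A-100", "qty_on_hand": 42, "weight_lb": 10.0}
print(mediate(cloud_record))
```

Real mediation engines apply the same two moves, structural renaming and content transformation, driven by declarative mappings rather than hand-written code.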
The use of data cleansing technology means that the data integration solution can deal with data issues, which include removing or correcting corrupt or inaccurate data sets. This is done along with the other data integration operations, typically while information moves from system to system, such as from an existing on-premise inventory control system to a cloud-based database.
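In code, cleansing usually amounts to dropping records that cannot be repaired and normalizing those that can, as data moves between systems. A minimal sketch, with hypothetical record fields:

```python
# A sketch of in-flight data cleansing: drop corrupt records that
# cannot be repaired, and correct the rest as they pass through.
# The "email" field and normalization rules are illustrative.

def cleanse(records):
    """Drop records missing a key field and normalize the rest."""
    cleaned = []
    for rec in records:
        if not rec.get("email"):        # corrupt or incomplete: drop it
            continue
        rec = dict(rec)                 # don't mutate the source record
        rec["email"] = rec["email"].strip().lower()  # correct in place
        cleaned.append(rec)
    return cleaned

raw = [{"id": 1, "email": " Sales@Example.COM "},
       {"id": 2, "email": None}]
print(cleanse(raw))  # [{'id': 1, 'email': 'sales@example.com'}]
```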
Things need to change
Referring again to Figure 1, as cloud computing becomes a larger part of enterprise platforms, the world of data integration will need to evolve as well. This includes adding or expanding some data integration capabilities, such as:
- Intelligent Data Service Discovery
- Data Virtualization
- Data Orchestration
- Data Identity
Intelligent data service discovery refers to data integration technology that can automatically find and define data services that become the primary mechanism to consume and produce data from existing cloud and non-cloud systems. This means that we can discover and re-discover which data services exist within the enterprise, and, more importantly, which data services exist in public clouds, noting where they are, what they do, and how to access them. Enterprises will leverage this catalogue as a means to understand all available data assets, and thus leverage the most meaningful data assets to support core business processes, owned or rented.
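A data-service catalogue of this kind could be sketched as a simple registry recording where each discovered service lives, what it does, and how to reach it. The service names, endpoint, and fields below are hypothetical.

```python
# A sketch of a data-service catalogue: discovered services are
# recorded with their location, capability, and access point, so
# they can be found and reused. All names and endpoints are made up.

from dataclasses import dataclass

@dataclass
class DataService:
    name: str
    location: str      # "on-premise" or a public cloud region
    capability: str    # what the service does
    endpoint: str      # how to access it

class Catalogue:
    def __init__(self):
        self._services = {}

    def register(self, svc):
        """Record a discovered data service."""
        self._services[svc.name] = svc

    def find(self, capability):
        """Return services whose capability matches the query."""
        return [s for s in self._services.values()
                if capability in s.capability]

cat = Catalogue()
cat.register(DataService("sales-history", "public-cloud/us-east",
                         "read sales transactions",
                         "https://example.com/api/sales"))
print([s.name for s in cat.find("sales")])  # ['sales-history']
```

The "intelligent" part of the prediction is that registration would happen automatically, by discovery, rather than by hand as in this sketch.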
Data virtualization, which is really not new, will grow in popularity as enterprises attempt to redefine existing databases using new virtual structures, and externalize these databases as well-defined data services. A new virtual database structure can be placed over existing database(s), and thus redefine how the database is made visible and consumed by other systems. There is no need to rebuild the back-end databases to meet the new needs of cloud-based systems, which removes risk and cost.
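A SQL view is the simplest concrete form of this idea: a new virtual structure defined over an existing table, so consumers see a redefined shape while the back-end schema stays untouched. The table and column names below are illustrative.

```python
# A sketch of data virtualization with a SQL view: the legacy table
# is never restructured; a virtual layer redefines how it is seen.
# Table and column names are illustrative assumptions.

import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE legacy_cust (cid INTEGER, fname TEXT, lname TEXT)")
db.execute("INSERT INTO legacy_cust VALUES (1, 'Ada', 'Lovelace')")

# The virtual layer: expose a cleaner, service-friendly structure.
db.execute("""
    CREATE VIEW customer_service AS
    SELECT cid AS customer_id,
           fname || ' ' || lname AS full_name
    FROM legacy_cust
""")

row = db.execute("SELECT full_name FROM customer_service").fetchone()
print(row[0])  # Ada Lovelace
```

Data virtualization products generalize this pattern across many heterogeneous back ends, not just one database.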
Data orchestration refers to the ability to define how data sets interact to form and re-form solutions. Much like service orchestration, this means defining composite data points, perhaps combining sales and customers, to form new data services that can be leveraged inside or outside of the enterprise. Those who leverage the data will have a greater degree of control over what the data means for each application view, keeping the physical structure and data content intact.
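The sales-plus-customers example can be sketched as a composite view that orchestrates two underlying data sets without altering either one. The data shapes here are illustrative.

```python
# A sketch of data orchestration: a composite data service combines
# customers and sales into a new view, leaving both source data sets
# physically intact. All records are illustrative.

customers = {101: {"name": "Acme Corp", "region": "EU"}}
sales = [{"customer_id": 101, "amount": 250.0},
         {"customer_id": 101, "amount": 100.0}]

def customer_sales_view(cust_id):
    """A composite data point: customer name plus total sales."""
    total = sum(s["amount"] for s in sales
                if s["customer_id"] == cust_id)
    return {"customer": customers[cust_id]["name"],
            "total_sales": total}

print(customer_sales_view(101))
# {'customer': 'Acme Corp', 'total_sales': 350.0}
```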
Data identity refers to the ability to link data, both structure and data instances, to humans or machines. This controls who or what can consume the data and see its contents, which makes it easier to live up to ever-changing and expanding regulations, and even internal data security policies. The data containers control access to the data, based on identity rules set within the data itself. This becomes a common mechanism that spans enterprises and public cloud providers.
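The core mechanism, access rules traveling with the data container itself, can be sketched as follows. The identities and rules are hypothetical.

```python
# A sketch of data identity: access rules live inside the data
# container, and every consumer (human or machine) is checked against
# them before the contents become visible. Identities are made up.

class IdentityBoundRecord:
    def __init__(self, contents, allowed_identities):
        self._contents = contents
        self._allowed = set(allowed_identities)

    def read(self, identity):
        """Return the contents only to an authorized identity."""
        if identity not in self._allowed:
            raise PermissionError(f"{identity} may not view this data")
        return self._contents

rec = IdentityBoundRecord({"account": "xxx-1234"},
                          allowed_identities={"billing-service"})
print(rec.read("billing-service"))  # {'account': 'xxx-1234'}
# rec.read("analytics-bot") would raise PermissionError
```

In the prediction above, these rules would be enforced by a mechanism shared across enterprises and cloud providers, not by each application.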
Consider the rise of shared enterprise business services that will likely emerge, given the number of business systems that will exist on public cloud-based platforms. The ease with which these services can be shared will also drive the rise of data reuse: the ability to repurpose any data set, including structure and content, for use in new and existing systems, without having to create a new database or data service instance. An example would be integrating a historical sales database, perhaps made available by another enterprise, to understand patterns of fraud in a newly built system. You don’t need to understand anything about the reused data set; it self-defines within your new use case.
Much like data identity, credential-based identity and centralized trust are the next generation, allowing data to define access credentials within itself. This takes data identity to the next level by providing a centralized location that can validate both the data (structure and content) and the humans and machines that would like to view or manipulate it. This mechanism means that we’ll understand where all data exists, and match those authorized with the authorized data, down to the database, object, and instance. Again, this assumes a universal standard.
Of course, predictions are not a perfect science. However, looking at the clear and emerging patterns of cloud-based migration and development, as well as the trajectory of data management and data integration, these are pretty easy calls.
Data integration providers will need to keep up with the evolving requirements, and I’m sure that the data integration technology space will look very different in just five years’ time. Those who prepare for the changes will survive and thrive. Those who do not will find themselves yet another victim of the changes that cloud computing brought to the world.