The logical data warehouse works by marrying two distinct technologies into a new way of integrating data. The first is data federation, which connects two or more disparate databases and makes them all appear as if they were a single database. The second is analytical database management, which provides semantic, business-friendly naming and modeling of data elements and allows flexible options for ingesting and modeling data.
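As a minimal sketch of the federation idea, the following uses two separate SQLite database files (standing in for two disparate systems, here hypothetically a sales store and a CRM store) and a single connection that joins across both as if they were one database. The table and column names are illustrative, not from any particular product.

```python
import os
import sqlite3
import tempfile

# Two physically separate databases, e.g. an orders system and a CRM.
tmp = tempfile.mkdtemp()
sales_path = os.path.join(tmp, "sales.db")
crm_path = os.path.join(tmp, "crm.db")

with sqlite3.connect(sales_path) as db:
    db.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
    db.executemany("INSERT INTO orders VALUES (?, ?)",
                   [(1, 120.0), (2, 75.5)])

with sqlite3.connect(crm_path) as db:
    db.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
    db.executemany("INSERT INTO customers VALUES (?, ?)",
                   [(1, "Acme"), (2, "Globex")])

# The federation layer: one connection spanning both physical stores,
# so a query can join them as if they were a single database.
conn = sqlite3.connect(sales_path)
conn.execute("ATTACH DATABASE ? AS crm", (crm_path,))
rows = conn.execute(
    """SELECT c.name, SUM(o.amount)
       FROM orders AS o
       JOIN crm.customers AS c ON c.id = o.customer_id
       GROUP BY c.name
       ORDER BY c.name"""
).fetchall()
print(rows)  # [('Acme', 120.0), ('Globex', 75.5)]
```

Real federation engines do this across heterogeneous systems (relational, NoSQL, files) and push work down to each source, but the single-virtual-view principle is the same.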
A modern data integration strategy employs what’s known as “best-fit engineering,” whereby each part of the data management infrastructure uses the most appropriate technology for its role, including storage choices driven by business requirements and service-level agreements (SLAs). Unlike a data lake, this architecture takes a distributed approach, aligning where information is stored with how it is used and leveraging multiple data technologies that are fit for specific purposes. A hybrid approach can also significantly reduce cost and time to delivery when changes or additions to the warehouse are required.
A newer approach is the data lake strategy. Data lakes are storage repositories that hold vast amounts of raw data in its native format until it is needed. In many cases data lakes are Hadoop-based systems, and they represent the next stage in both power and flexibility. A compelling benefit of the approach is that there is no need to structure (transform) the data before loading it, as a traditional “schema on write” system requires; instead, structure is assigned to the data at the time it is queried (“schema on read”). However, while data lakes can hold large amounts of unstructured data cost-effectively, they fall short for interactive analysis when fast query response is required or when access to real-time data is needed.
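The schema-on-read idea can be sketched in a few lines. In this hypothetical example, raw JSON events land in the "lake" exactly as produced, including records with missing fields that a schema-on-write system might reject; structure is imposed only when a query runs. The field names are invented for illustration.

```python
import json

# Raw events stored untouched, native format, no upfront schema.
raw_events = [
    '{"user": "ann", "action": "click", "ms": 120}',
    '{"user": "bob", "action": "view"}',  # missing "ms": stored anyway
    '{"user": "ann", "action": "click", "ms": 95}',
]

def query_clicks(lines):
    """Schema on read: parse and project fields at query time."""
    for line in lines:
        event = json.loads(line)
        if event.get("action") == "click":
            # The "schema" (which fields exist, defaults for absent ones)
            # is decided here, by the query, not at load time.
            yield (event["user"], event.get("ms", 0))

clicks = list(query_clicks(raw_events))
print(clicks)  # [('ann', 120), ('ann', 95)]
```

The trade-off noted above follows directly: because parsing and structuring happen on every read, interactive workloads that need fast response pay that cost repeatedly.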
Online Analytical Processing (OLAP) cubes are multi-dimensional sets of data that essentially serve as a staging space in which to analyze information. These analytic databases hold data not in tables but in OLAP cubes, a mechanism for storing and querying data in an organized, multi-dimensional structure specifically optimized for analysis.
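To make the multi-dimensional structure concrete, here is a toy cube, a sketch only, with two assumed dimensions (region and quarter) and one measure (sales). Every combination of dimension values is pre-aggregated, including an "ALL" roll-up along each dimension, which is what lets a cube answer slice-and-dice questions without rescanning the raw facts.

```python
from collections import defaultdict
from itertools import product

# Hypothetical fact rows: (region, quarter, sales).
facts = [
    ("EMEA", "Q1", 100), ("EMEA", "Q2", 150),
    ("APAC", "Q1", 80),  ("APAC", "Q2", 120),
]

# Build the cube: aggregate each fact into every cell it belongs to,
# including the "ALL" roll-up member for each dimension.
cube = defaultdict(float)
for region, quarter, sales in facts:
    for r, q in product((region, "ALL"), (quarter, "ALL")):
        cube[(r, q)] += sales

# Slices come back as pre-computed lookups, not scans:
print(cube[("EMEA", "ALL")])  # 250.0  -> all quarters for EMEA
print(cube[("ALL", "Q1")])    # 180.0  -> all regions in Q1
print(cube[("ALL", "ALL")])   # 450.0  -> grand total
```

Production OLAP engines add hierarchies, sparse storage, and incremental refresh, but the core optimization is this pre-aggregation across dimensions.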
Big data is here, and it’s transforming the very nature of commerce, enabling new capabilities and accelerating the generation of business insights. While the concept of big data isn’t new, its potential is only now being realized as powerful tools to organize, manage, and analyze immense volumes of enterprise-generated and third-party data finally become available for mainstream use.
However, for many organizations, it’s not so easy to unlock the value in this data. While data volume (the amount of data) and velocity (the speed at which data is generated) are in part what make it so valuable, they also present significant challenges. Still more daunting is the broad variation in the types and sources of data (variety), ranging from highly structured files to semi-structured text to unstructured video and audio feeds.
The proliferation of disparate data sources distinguishes today’s data landscape. Easily accessible, well-structured data was once the norm. That status quo has been disrupted by the phenomenal growth in the variety and volume of multi-structured data originating from machine and IoT, external, application-oriented, cloud-based, and on-premises sources. Emerging in the wake of this digital disruption is a data-centricity shared by businesses ranging in scale from fast-growing SMBs to global ecommerce enterprises.