While the majority of data analysts were busy exploring the progression from relational databases to cubes, analytic databases, and data lakes, another camp was looking into data federation as a way to integrate data for analysis.
A New Approach
Data federation allows analysts to run queries that join multiple disparate databases on demand, without copying or moving data from the original operational sources into a central analytical repository. In terms of how quickly data can be analyzed, this approach is a significant improvement on all of its predecessors. While the idea is sound and its value is self-evident, data federation alone does not scale to large volumes of data or to large numbers of simultaneous users. In addition, because every federated query runs against the source systems and travels over the network, performance depends heavily on the speed and stability of both, and the extra load can slow down the production operations those sources exist to serve as well as the analysis itself. So, while data federation is quick and flexible, on its own it is neither scalable nor particularly dependable. Still, it was an important step in the right direction.
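As a small illustration of the idea (not how any particular federation product works), SQLite's `ATTACH DATABASE` lets one connection query several separate database files in place. In this sketch the two files stand in for independent operational systems; the table names, columns, and the helper functions `make_source` and `federated_revenue_by_region` are hypothetical. The join runs across both sources without first copying rows into a central repository.

```python
import os
import sqlite3
import tempfile


def make_source(path, ddl, insert_sql, rows):
    """Create a standalone SQLite file that stands in for one operational system."""
    conn = sqlite3.connect(path)
    try:
        conn.execute(ddl)
        conn.executemany(insert_sql, rows)
        conn.commit()
    finally:
        conn.close()


def federated_revenue_by_region(sales_path, crm_path):
    """Join two separate databases in place via ATTACH; no rows are copied first."""
    conn = sqlite3.connect(":memory:")  # the query session stores no data of its own
    try:
        conn.execute("ATTACH DATABASE ? AS sales", (sales_path,))
        conn.execute("ATTACH DATABASE ? AS crm", (crm_path,))
        return conn.execute(
            """
            SELECT c.region, SUM(o.amount) AS revenue
            FROM sales.orders AS o
            JOIN crm.customers AS c ON c.id = o.customer_id
            GROUP BY c.region
            ORDER BY c.region
            """
        ).fetchall()
    finally:
        conn.close()


with tempfile.TemporaryDirectory() as tmp:
    sales_db = os.path.join(tmp, "sales.db")  # hypothetical order-entry system
    crm_db = os.path.join(tmp, "crm.db")      # hypothetical CRM system
    make_source(
        sales_db,
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)",
        "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
        [(1, 120.0), (2, 80.0), (1, 30.0)],
    )
    make_source(
        crm_db,
        "CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT)",
        "INSERT INTO customers (id, region) VALUES (?, ?)",
        [(1, "EMEA"), (2, "APAC")],
    )
    result = federated_revenue_by_region(sales_db, crm_db)
    print(result)  # [('APAC', 80.0), ('EMEA', 150.0)]
```

A real federation engine would push this work out to remote servers over the network, which is exactly where the scalability and source-load concerns above come from; local SQLite files hide those costs.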
The next stage of evolution was to combine data federation with caching repositories to address these issues. This hybrid approach used big data solutions to complement data warehousing. The result was a combination of repositories, virtualization, and distributed processes for data management that delivered the best capabilities of several technologies but still fell short of the expectations for a robust, agile, performant data warehouse. Caching has two drawbacks: cache loads must be scheduled around the performance constraints of the source systems, and the cache is loaded into a single repository that may not be optimized for every data set or data type.
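The scheduling trade-off can be sketched in a few lines. The `CachedFederatedQuery` class below is a hypothetical illustration, not a product API: it serves results from a local copy and only re-runs the federated query against the sources once a refresh interval has elapsed. That spares the source systems, but anything that changes between scheduled loads stays invisible to analysts until the next reload.

```python
import time


class CachedFederatedQuery:
    """Hypothetical cache layer in front of a federated query.

    Results come from a local copy until the refresh interval elapses,
    so the source systems are hit only on scheduled reloads.
    """

    def __init__(self, run_query, refresh_seconds, clock=time.monotonic):
        self._run_query = run_query          # callable that queries the source systems
        self._refresh_seconds = refresh_seconds
        self._clock = clock                  # injectable so the demo can fake time
        self._cached = None
        self._loaded_at = None
        self.source_hits = 0                 # how many times the sources were queried

    def get(self):
        now = self._clock()
        stale = (
            self._loaded_at is None
            or now - self._loaded_at >= self._refresh_seconds
        )
        if stale:
            self._cached = self._run_query()
            self._loaded_at = now
            self.source_hits += 1
        return self._cached


# Demo with a fake clock and a fake "source" value (both hypothetical).
fake_now = [0.0]
source_value = ["v1"]
cache = CachedFederatedQuery(
    lambda: source_value[0], refresh_seconds=3600, clock=lambda: fake_now[0]
)

first = cache.get()      # initial load hits the sources
source_value[0] = "v2"   # data changes in the source system
second = cache.get()     # served from the cache, so still the old value
fake_now[0] = 3600.0     # the scheduled refresh time arrives
third = cache.get()      # reload picks up the change
```

Here `first` and `second` are both `"v1"` and `third` is `"v2"`, with only two trips to the sources. Shortening `refresh_seconds` reduces staleness but brings the load on the sources back, which is the tension described above.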
Still, as data warehousing moves toward modern architectures, virtual data technology is essential, spanning everything from simple federation to virtualization, along with virtual views, indices, and semantics. Developing virtual or logical data views is faster than physically relocating all the data and can be done with ease through point-and-click operations. In addition, virtual views can be altered without transforming and reloading data, as earlier data warehouse integration approaches required, so changes can be presented live immediately rather than waiting for the data to populate through an overnight process. It is the virtualization of data integration that enables extreme agility in analytical development and significantly reduces build times and costs, all of which leads us to the next breakthrough in data warehousing.
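To make the idea of a virtual view concrete, this sketch uses a SQLite `TEMP` view as a stand-in for a logical view in a virtualization layer (an analogy, not any vendor's feature). Two attached in-memory databases play the role of source systems; the view name `revenue_by_region` and the helper `define_view` are hypothetical. Redefining the view changes only its definition, and a row inserted into a source shows up in the next query without any reload step.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Two separate in-memory databases stand in for source systems (demo assumption).
conn.execute("ATTACH DATABASE ':memory:' AS sales")
conn.execute("ATTACH DATABASE ':memory:' AS crm")
conn.execute("CREATE TABLE sales.orders (customer_id INTEGER, amount REAL)")
conn.execute("CREATE TABLE crm.customers (id INTEGER PRIMARY KEY, region TEXT)")
conn.executemany("INSERT INTO sales.orders VALUES (?, ?)", [(1, 120.0), (2, 80.0)])
conn.executemany("INSERT INTO crm.customers VALUES (?, ?)", [(1, "EMEA"), (2, "APAC")])
conn.commit()


def define_view(conn, select_sql):
    """(Re)define the logical view; only its definition changes, no data moves."""
    conn.execute("DROP VIEW IF EXISTS revenue_by_region")
    # SQLite lets only TEMP views reference tables in other attached databases.
    conn.execute(f"CREATE TEMP VIEW revenue_by_region AS {select_sql}")


# Version 1 of the view: revenue per region.
define_view(conn, """
    SELECT c.region, SUM(o.amount) AS revenue
    FROM sales.orders AS o JOIN crm.customers AS c ON c.id = o.customer_id
    GROUP BY c.region
""")
v1 = conn.execute("SELECT region, revenue FROM revenue_by_region ORDER BY region").fetchall()

# Version 2: add an order count by redefining the view, with nothing reloaded.
define_view(conn, """
    SELECT c.region, SUM(o.amount) AS revenue, COUNT(*) AS orders
    FROM sales.orders AS o JOIN crm.customers AS c ON c.id = o.customer_id
    GROUP BY c.region
""")
conn.execute("INSERT INTO sales.orders VALUES (2, 20.0)")  # new row lands in a source
v2 = conn.execute(
    "SELECT region, revenue, orders FROM revenue_by_region ORDER BY region"
).fetchall()
print(v1)  # [('APAC', 80.0), ('EMEA', 120.0)]
print(v2)  # [('APAC', 100.0, 2), ('EMEA', 120.0, 1)]
```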