Data virtualization has become a vital component in the data landscape, offering agility and quick adaptation in today’s fast-paced business environment. However, when deployed in isolation (as a sole data integration solution), it faces several challenges:
To mitigate these issues, caching is commonly implemented in data virtualization solutions. It serves to:
Use cases best suited for data virtualization with caching are:
Despite its benefits, even data virtualization with caching has limitations:
The TTL (time-to-live) logic in caching systems is driven purely by data staleness: the cache refreshes whenever an entry becomes outdated. These refreshes can therefore coincide with the peak operational periods of the source systems, inadvertently increasing their load.
This issue underscores the need for refresh mechanisms more aligned with the operational rhythms and demands of the source systems, beyond the scope of traditional caching methods. Recognizing this limitation, some vendors offer solutions that allow users to manually schedule cache refreshes, integrating elements of replication logic into their caching strategy. However, this hybrid approach is not always seamlessly integrated into the core data virtualization solutions, potentially leading to increased complexity in management and operation.
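To make the staleness-driven behavior concrete, here is a minimal sketch of a pure TTL cache in Python. All names (`TTLCache`, `fetch_fn`) are illustrative, not part of any vendor's API; `fetch_fn` stands in for a query against the source system, which in this design is hit at whatever moment an entry happens to expire.

```python
import time

class TTLCache:
    """Minimal TTL cache: entries expire after ttl_seconds.

    On a miss (absent or stale), the value is re-fetched from the
    source system at that exact moment -- which may coincide with
    the source's peak load.
    """

    def __init__(self, fetch_fn, ttl_seconds):
        self.fetch_fn = fetch_fn          # placeholder for a source-system query
        self.ttl = ttl_seconds
        self._store = {}                  # key -> (value, fetched_at)

    def get(self, key):
        entry = self._store.get(key)
        if entry is not None:
            value, fetched_at = entry
            if time.monotonic() - fetched_at < self.ttl:
                return value              # fresh: serve from cache
        # stale or missing: hit the source system now
        value = self.fetch_fn(key)
        self._store[key] = (value, time.monotonic())
        return value

# Demo: the second lookup within the TTL is served from the cache,
# so the source is queried only once.
source_hits = []
cache = TTLCache(lambda key: source_hits.append(key) or key.upper(),
                 ttl_seconds=60)
cache.get("a")
cache.get("a")
```

The hybrid approach described above would replace the "refresh on expiry" branch with a refresh job scheduled for off-peak hours, which is replication logic rather than caching logic.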
Caching, while beneficial in certain scenarios, falls short in effectively managing large datasets. A cache is by design typically smaller than the original data, which limits its effectiveness in scenarios that require comprehensive analytical operations across entire datasets:
Data virtualization, even when supplemented with caching, struggles with complex data integration tasks. Examples include importing flat files from FTP, matching and cleansing customer address sets from different source systems, and tracking changes in employee responsibilities. These operational business requests require robust data storage and transformation capabilities.
The limitations of data virtualization with caching highlight the need for a more comprehensive solution. To meet the full spectrum of business needs, integrating data virtualization and data replication is essential. This approach includes methodologies such as ETL, ELT, and CDC.
A mix of data integration approaches has remained crucial, spanning from physical delivery to virtualized delivery, and from bulk/batch movements to event-driven granular data propagation. When data is being constantly produced in massive quantities and is always in motion and constantly changing (for example, IoT platforms and data lakes), attempts to collect all this data are neither practical nor viable. This is driving an increase in demand for connection to data, not just the collection of it.
– Gartner, Magic Quadrant for Data Integration Tools 2023 –
Advanced data materialization and replication enable precise control over update schedules, allowing them to be tailored according to the specific needs of the source systems. This adaptability is crucial in managing the load on these systems efficiently. In certain cases, an even more effective strategy is to replicate data directly from the sources using Change Data Capture (CDC) techniques. CDC allows for capturing changes in real-time or near-real-time, thereby reducing the load on source systems and ensuring that the data in the virtualization layer is as up-to-date as possible.
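The core of any CDC consumer is applying a stream of change events to a replica. The following is a simplified sketch, assuming events have already been read from a database log or change table (here they are just tuples in a list; the event shape is an assumption for illustration, not a specific product's format):

```python
def apply_changes(replica, changes):
    """Apply a batch of CDC events to an in-memory replica.

    Each event is (operation, key, row); 'row' is None for deletes.
    In a real pipeline the events would come from a transaction log
    or change table rather than an in-memory list.
    """
    for op, key, row in changes:
        if op in ("insert", "update"):
            replica[key] = row            # upsert the changed row
        elif op == "delete":
            replica.pop(key, None)        # tolerate already-absent keys
    return replica

# Demo: replay three captured changes against a small replica.
replica = {1: {"name": "Ada"}}
events = [
    ("update", 1, {"name": "Ada L."}),
    ("insert", 2, {"name": "Grace"}),
    ("delete", 1, None),
]
apply_changes(replica, events)
```

Because only the deltas travel, the source system is read incrementally instead of being re-scanned on every refresh.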
Replication, used in conjunction with data virtualization, significantly enhances performance and scalability. With replicated data, performance-intensive queries in particular can run far more efficiently in data virtualization systems. This replication can take various forms, such as caching, materialization in a database, or direct replication, depending on the specific use case and performance requirements. Additionally, creating indexes, distribution keys, and transparent aggregations further optimizes performance, making the system more scalable and capable of handling large volumes of data.
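Materialization plus indexing can be sketched in a few lines of SQL. The snippet below uses an in-memory SQLite database purely as a stand-in for the target store; the table and column names are invented for the example and have nothing to do with any particular platform's syntax:

```python
import sqlite3

# In-memory stand-in for the replication target.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "EU", 10.0), (2, "US", 25.0), (3, "EU", 7.5)],
)

# Materialize an aggregate that would otherwise be recomputed
# against the sources on every query...
conn.execute("""
    CREATE TABLE orders_by_region AS
    SELECT region, SUM(amount) AS total, COUNT(*) AS n
    FROM orders
    GROUP BY region
""")
# ...and index the materialization so lookups against it stay fast.
conn.execute("CREATE INDEX idx_region ON orders_by_region (region)")

total = conn.execute(
    "SELECT total FROM orders_by_region WHERE region = 'EU'"
).fetchone()[0]
```

A query hitting `orders_by_region` now touches a small, indexed table instead of federating out to the source systems.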
The integration of data virtualization with replication greatly facilitates complex data management tasks such as data transformation, historization, management of slowly changing dimensions, and data cleansing. These functionalities are often achieved through ETL (Extract, Transform, Load) or ELT (Extract, Load, Transform) processes. Moreover, addressing Master Data Management (MDM) challenges, such as data cleansing and field normalization, becomes feasible with centralized data storage. The use of Procedural SQL within the data integration platform allows for direct harmonization of master data, further enhancing the quality and consistency of the data managed within the system.
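As a flavor of what the "transform" step in such a pipeline does, here is a small MDM-style cleansing sketch. The field names and normalization rules are illustrative assumptions (trim whitespace, unify casing, map country names to ISO codes), not a complete cleansing ruleset:

```python
def normalize_customer(raw):
    """Harmonize one customer record from a source system.

    Illustrative cleansing steps: collapse whitespace in names,
    lowercase emails, and map free-text country values to codes.
    """
    country_map = {"germany": "DE", "deutschland": "DE", "usa": "US"}
    country = raw.get("country", "").strip().lower()
    return {
        "name": " ".join(raw.get("name", "").split()).title(),
        "email": raw.get("email", "").strip().lower(),
        "country": country_map.get(country, country.upper()),
    }

# Demo: a messy record from one source system, harmonized.
clean = normalize_customer({
    "name": "  ada   LOVELACE ",
    "email": " Ada@X.COM",
    "country": "Deutschland ",
})
```

Run once over centrally stored data, rules like these give every consumer of the virtualization layer the same harmonized view of the master data.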
Overall, the discussion of data virtualization’s shortcomings, particularly in the context of analytical use cases, might create an impression that it is inherently flawed and should be avoided. However, this is not the case. The key lies in understanding the pros and cons of the various integration styles, and ideally, this understanding should be a guiding factor in building your data architecture and in your journey of selecting a solution.
That’s why the Data Virtuality Platform integrates different data integration styles. This integration ensures that organizations don’t need to worry about individual shortcomings, allowing them to work most efficiently and capitalize on the strengths of these diverse styles. Such flexibility is crucial in today’s world, where business demands are constantly evolving at a rapid pace.
Interested in exploring the full potential of data virtualization for your business? Try the Data Virtuality Platform for 30 days, absolutely free, or book a demo for a more personalized walkthrough of our solution.