Performance Optimization enabled in the Data Virtuality Platform

One of the most common concerns enterprises raise when considering data virtualization is performance. Because data virtualization queries the underlying sources through live, real-time connections instead of reading from a prepared copy, this concern is justified.

To address this, the Data Virtuality Platform uses a three-level performance optimization mechanism, one level more than the two-level optimization commonly found in other data virtualization tools.

The three levels are:

  1. Distributed query optimization: Uses techniques such as pushdowns and optimized join algorithms.
  2. Caching: Works both in memory and on disk.
  3. Self-learning recommended optimization (Materialization): Recommends tables and views to materialize, that is, to store physically, so that queries against them run faster.

The following sections look at each level in more detail and explain how it improves performance.

Distributed query optimization

Every query entering the Data Virtuality engine is transformed by distributed query optimization to improve its performance. The main steps are:

  • Rewriting SQL: As a first step, the query is rewritten to simplify its expressions and criteria, so that the later stages start from the simplest equivalent SQL.
  • Logical plan optimization: The rewritten query is then turned into a logical plan. The Data Virtuality Server applies optimization rules that take into account the query’s structure and the size of the data, and it uses detailed cost information to refine its decisions, for example which operations to push down to the source systems.
  • Processing plan conversion: The optimized logical plan is then converted into an executable processing plan. In this plan, each node represents a basic processing operation, and together the nodes drive the query’s execution across the distributed sources.
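To make the three steps concrete, here is a toy Python sketch. It is not the Data Virtuality engine and does not use its API; the source names `sales_db` and `crm_api`, the `CAPABILITIES` table, and the node names `Access`, `Join`, and `Select` are hypothetical. It rewrites a list of predicates, decides which ones can be pushed down to a source that supports filtering, and emits a flat list of processing nodes.

```python
from dataclasses import dataclass

# Hypothetical capability table: which operations each source can run itself.
CAPABILITIES = {
    "sales_db": {"filter"},  # e.g. a relational database that can evaluate WHERE clauses
    "crm_api": set(),        # e.g. a REST API that only returns raw rows
}


@dataclass(frozen=True)
class Pred:
    source: str   # source system that owns the column
    column: str
    op: str       # this sketch only handles ">" and "="
    value: object


def rewrite(preds):
    """Rewriting: drop duplicate predicates and merge `c > a AND c > b` into `c > max(a, b)`."""
    strongest_gt = {}  # (source, column) -> tightest ">" predicate seen so far
    others = []
    for p in preds:
        if p.op == ">":
            key = (p.source, p.column)
            if key not in strongest_gt or p.value > strongest_gt[key].value:
                strongest_gt[key] = p
        elif p not in others:
            others.append(p)
    return others + list(strongest_gt.values())


def logical_plan(preds):
    """Logical optimization: push a predicate down if its source can filter, else keep it in the engine."""
    pushed = {src: [] for src in CAPABILITIES}
    residual = []
    for p in preds:
        if "filter" in CAPABILITIES[p.source]:
            pushed[p.source].append(p)
        else:
            residual.append(p)
    return pushed, residual


def processing_plan(pushed, residual, join_key):
    """Conversion: one access node per source, a join node, then any filter the sources could not run."""
    nodes = [
        f"Access({src}, pushdown={[f'{p.column}{p.op}{p.value}' for p in ps]})"
        for src, ps in pushed.items()
    ]
    nodes.append(f"Join(on={join_key})")
    if residual:
        nodes.append(f"Select({[f'{p.source}.{p.column}{p.op}{p.value}' for p in residual]})")
    return nodes


query_preds = [
    Pred("sales_db", "amount", ">", 100),
    Pred("sales_db", "amount", ">", 500),  # makes the "> 100" bound redundant
    Pred("crm_api", "region", "=", "EU"),
    Pred("crm_api", "region", "=", "EU"),  # verbatim duplicate
]
pushed, residual = logical_plan(rewrite(query_preds))
for node in processing_plan(pushed, residual, "customer_id"):
    print(node)
```

Run as is, the sketch prints `Access(sales_db, pushdown=['amount>500'])`, `Access(crm_api, pushdown=[])`, `Join(on=customer_id)` and `Select(['crm_api.region=EU'])`: the tighter amount bound is pushed to the database, while the region filter has to run in the engine because the hypothetical API source cannot filter. A real optimizer would also weigh costs, for example to choose a join algorithm, which this sketch leaves out.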

Caching

Data virtualization has scalability limits, especially with large datasets or many concurrent users, so Data Virtuality also uses caching to improve query performance. Caching greatly improves performance for small datasets, but its benefit diminishes quickly as datasets grow, and it offers only limited control over how data is loaded and stored.
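A two-tier result cache can be sketched as follows, assuming results are keyed by the exact query text and expire after a fixed time-to-live. The class `TwoTierResultCache`, its parameters, and the helper `cached_query` are hypothetical illustrations, not Data Virtuality’s caching API. Recent results stay in an in-memory LRU tier; when that tier is full, the least recently used results are spilled to files on disk.

```python
import hashlib
import os
import pickle
import tempfile
import time
from collections import OrderedDict


class TwoTierResultCache:
    """Hypothetical query-result cache: recent results in memory, older ones spilled to disk."""

    def __init__(self, max_memory_entries=100, ttl_seconds=300.0, directory=None):
        self.max_memory_entries = max_memory_entries
        self.ttl_seconds = ttl_seconds
        self.directory = directory or tempfile.mkdtemp(prefix="result_cache_")
        self.memory = OrderedDict()  # key -> (stored_at, rows), least recently used first

    @staticmethod
    def _key(sql):
        # The exact query text (minus surrounding whitespace) identifies a cached result.
        return hashlib.sha256(sql.strip().encode("utf-8")).hexdigest()

    def _path(self, key):
        return os.path.join(self.directory, key + ".pkl")

    def put(self, sql, rows):
        key = self._key(sql)
        self.memory[key] = (time.time(), rows)
        self.memory.move_to_end(key)
        # Spill least recently used entries to disk once the in-memory tier is full.
        while len(self.memory) > self.max_memory_entries:
            old_key, entry = self.memory.popitem(last=False)
            with open(self._path(old_key), "wb") as f:
                pickle.dump(entry, f)

    def get(self, sql):
        """Return cached rows, or None on a miss or an expired entry."""
        key = self._key(sql)
        entry = self.memory.get(key)
        if entry is not None:
            self.memory.move_to_end(key)  # mark as recently used
        else:
            path = self._path(key)
            if not os.path.exists(path):
                return None
            with open(path, "rb") as f:
                entry = pickle.load(f)
        stored_at, rows = entry
        if time.time() - stored_at > self.ttl_seconds:
            return None  # stale: the caller must query the live source again
        return rows


def cached_query(cache, sql, run_live_query):
    """Serve from the cache when possible; otherwise query the live source and cache the result."""
    rows = cache.get(sql)
    if rows is None:
        rows = run_live_query(sql)
        cache.put(sql, rows)
    return rows
```

Because the cache stores whole query results, it only pays off when the same query repeats and its result is small enough to keep around; large results quickly fill the memory tier and push everything else to disk, which is the limitation this section describes.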

Self-learning recommended optimization (Materialization)

The distinctive part of Data Virtuality Platform’s optimization engine is data materialization with self-learning capabilities. It learns from the query behavior of data consumers and addresses performance issues by autonomously creating and managing the physical data structures of either: 

  1. the external data sources or
  2. the internal virtual views in user-defined analytical storage
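One way to picture learning from query behavior is a heuristic over a query log, sketched below in Python. The log format, the `recommend_materializations` function, and the thresholds `min_calls` and `min_total_s` are assumptions for illustration; the platform’s actual recommendation logic is not described in this article and may rely on different signals.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class QueryLogEntry:
    view: str          # virtual view or source table the query read
    duration_s: float  # observed execution time in seconds


def recommend_materializations(log, min_calls=3, min_total_s=10.0):
    """Return views worth materializing, most expensive first.

    Hypothetical heuristic: a view qualifies when it was queried at least
    `min_calls` times and those queries took `min_total_s` seconds or more in total.
    """
    calls = defaultdict(int)
    total_s = defaultdict(float)
    for entry in log:
        calls[entry.view] += 1
        total_s[entry.view] += entry.duration_s
    candidates = [v for v in calls if calls[v] >= min_calls and total_s[v] >= min_total_s]
    # Materializing the most expensive views first saves the most query time.
    return sorted(candidates, key=lambda v: total_s[v], reverse=True)


log = (
    [QueryLogEntry("orders_joined", 20.0)] * 3      # frequent and slow
    + [QueryLogEntry("sales_by_region", 8.0)] * 3   # frequent and moderately slow
    + [QueryLogEntry("crm_contacts", 1.0)] * 5      # frequent but fast
    + [QueryLogEntry("one_off_report", 60.0)]       # slow but rare
)
print(recommend_materializations(log))
```

With this sample log, only `orders_joined` and `sales_by_region` are recommended: `crm_contacts` is queried often but is fast, and `one_off_report` is slow but ran only once.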

The self-learning optimization also suggests indexes for the materialized tables. Once data is physically stored in the analytical storage, the slow-performing parts of a query are transparently redirected to the materialized data, so existing reports do not need to be rewritten.
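The index suggestion and the transparent redirect can be sketched in a few lines, assuming the optimizer keeps a catalog that maps each virtual view to its materialized table. The `MATERIALIZED` mapping, the table names, and both functions are hypothetical; the point is that the routing decision happens inside the engine, so the consumer’s SQL keeps referring to the virtual view.

```python
from collections import Counter

# Hypothetical catalog kept by the optimizer: virtual view -> materialized table.
MATERIALIZED = {"views.sales_by_region": "mat.sales_by_region"}


def suggest_index_ddl(table, filter_columns_seen):
    """Suggest an index on the column that queries against `table` filter on most often."""
    column, _count = Counter(filter_columns_seen).most_common(1)[0]
    index_name = f"idx_{table.split('.')[-1]}_{column}"
    return f"CREATE INDEX {index_name} ON {table} ({column})"


def physical_source(view):
    """Pick where a read of `view` is executed.

    Consumers keep referencing the virtual view; only this routing decision
    changes once a materialized copy exists, so reports need no rewrite.
    """
    return MATERIALIZED.get(view, view)


print(suggest_index_ddl("mat.sales_by_region", ["region", "year", "region"]))
print(physical_source("views.sales_by_region"))  # redirected to the materialized copy
print(physical_source("views.crm_contacts"))     # no copy yet: stays virtual
```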

To keep the data in the analytical storage up to date, materialization tasks run periodically. Incremental materialization, which captures only new or changed data, is also available and reduces the amount of data that has to be materialized on each run.
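Incremental materialization is often implemented with a high-water mark: remember the latest change timestamp already copied and fetch only the rows changed after it. The sketch below assumes the source has such an `updated_at` column and uses Python’s built-in `sqlite3` module, with one in-memory database standing in for both the source and the analytical storage. The table names and the `incremental_refresh` function are hypothetical, and this is not necessarily how Data Virtuality implements it.

```python
import sqlite3


def incremental_refresh(conn, watermark):
    """Copy rows changed since `watermark` into the materialized table.

    Returns (new_watermark, rows_copied). Assumes (hypothetically) that the
    source has an `updated_at` value that grows whenever a row is inserted or modified.
    """
    changed = conn.execute(
        "SELECT id, amount, updated_at FROM source_orders"
        " WHERE updated_at > ? ORDER BY updated_at",
        (watermark,),
    ).fetchall()
    # Upsert by primary key: new ids are inserted, changed ids are overwritten.
    conn.executemany(
        "INSERT OR REPLACE INTO mat_orders (id, amount, updated_at) VALUES (?, ?, ?)",
        changed,
    )
    conn.commit()
    new_watermark = changed[-1][2] if changed else watermark
    return new_watermark, len(changed)


# One in-memory database stands in for both the source and the analytical storage.
conn = sqlite3.connect(":memory:")
for table in ("source_orders", "mat_orders"):
    conn.execute(f"CREATE TABLE {table} (id INTEGER PRIMARY KEY, amount REAL, updated_at INTEGER)")
conn.executemany("INSERT INTO source_orders VALUES (?, ?, ?)", [(1, 10.0, 1), (2, 20.0, 2)])

watermark, copied = incremental_refresh(conn, 0)          # initial load: 2 rows
conn.execute("UPDATE source_orders SET amount = 25.0, updated_at = 3 WHERE id = 2")
conn.execute("INSERT INTO source_orders VALUES (3, 30.0, 4)")
watermark, copied = incremental_refresh(conn, watermark)  # only ids 2 and 3 are copied
print(copied, conn.execute("SELECT id, amount FROM mat_orders ORDER BY id").fetchall())
```

The second refresh copies only the two rows that changed instead of the whole table. A timestamp watermark does not detect deleted rows, so handling deletes requires a separate mechanism or an occasional full refresh.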

The advanced data virtualization experience

Data virtualization is a dynamic technology, and performance optimization is crucial for enterprises to get its full value. The Data Virtuality Platform’s three-level approach addresses performance from several angles: distributed query optimization, caching, and materialization. Whether you’re dealing with large datasets, many users, complex queries, or slow databases or networks, these features are designed to keep queries efficient.

Experience the Data Virtuality Platform firsthand: start your free trial today.

For more details on performance optimization in the Data Virtuality Platform, see the documentation.
