One of the most common concerns enterprises raise about data virtualization is performance. Because the technology queries sources over live, real-time connections, the concern is understandable.
To address it, the Data Virtuality Platform uses a three-level performance optimization mechanism, a step beyond the two-level optimization commonly found in other data virtualization tools.
The three levels are:
- Distributed query optimization: This leverages advanced techniques like pushdowns and optimized join algorithms.
- Caching: This functions both in-memory and on-disk.
- Self-learning recommended optimization (Materialization): This recommends materializing tables/views to improve performance.
Below, we take a closer look at each layer and how it improves performance.
Distributed query optimization
Every query entering the Data Virtuality engine is transformed by the distributed query optimizer to improve its performance. The primary steps are:
- Rewriting SQL: As a first step, queries are rewritten to simplify expressions and criteria, so the base SQL is as lean as possible before planning.
- Logical plan optimization: The rewritten query is then converted into a logical plan. The Data Virtuality Server applies optimization rules that examine the query’s structure and the size of the data involved, and it uses detailed cost information to improve its decisions, enabling techniques such as pushdowns (see the sketch after this list).
- Processing plan conversion: Finally, the logical plan is converted into an executable processing plan, in which each node represents a basic processing operation and drives the query’s execution across the distributed sources.
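To make these steps more tangible, here is a minimal sketch of what rewriting and pushdown can look like. All schema, table, and column names (crm.customers, erp.orders, and so on) are illustrative assumptions, not part of any actual catalog, and the exact rewrites an engine chooses depend on the query and on each source’s capabilities.

```sql
-- A federated query as submitted by a data consumer. The source schemas
-- crm and erp, and their tables, are illustrative names.
SELECT c.name, SUM(o.amount) AS total_amount
FROM crm.customers c
JOIN erp.orders o ON o.customer_id = c.id
WHERE o.order_date >= DATE '2023-01-01'
  AND (c.country = 'DE' OR c.country = 'DE')  -- redundant criterion
GROUP BY c.name;

-- Step 1, SQL rewriting: the redundant OR branch collapses into a single
-- predicate:  c.country = 'DE'.

-- Step 2, logical plan optimization with pushdowns: rather than pulling
-- both tables in full, the engine can delegate work to each source and
-- join only the reduced intermediate results, for example:
--   to the CRM source:  SELECT id, name FROM customers WHERE country = 'DE'
--   to the ERP source:  SELECT customer_id, SUM(amount) AS total_amount
--                       FROM orders WHERE order_date >= DATE '2023-01-01'
--                       GROUP BY customer_id
-- How much can be pushed down depends on what each source supports.
```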
Caching
Data virtualization can run into scalability limits with very large datasets or a high number of concurrent users, so Data Virtuality uses caching, both in-memory and on-disk, to improve query performance. Caching clearly boosts performance for small datasets, but its benefits are short-lived: it often falls short for larger datasets and offers only limited control over how data is loaded and stored.
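As a rough illustration only: in SQL-based virtualization engines, result caching is often controlled through query hints. The hint below is a hypothetical example of that pattern, not the platform’s documented syntax, so the actual caching options should be taken from the Data Virtuality documentation.

```sql
-- Hypothetical cache hint: keep this result set cached for 10 minutes
-- (600,000 ms), so repeated requests are answered from the in-memory or
-- on-disk cache instead of re-querying the underlying sources.
SELECT /*+ cache(ttl:600000) */
       region,
       SUM(revenue) AS revenue
FROM views.sales_summary  -- illustrative virtual view
GROUP BY region;
```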
Self-learning recommended optimization (Materialization)
The distinctive part of the Data Virtuality Platform’s optimization engine is data materialization with self-learning capabilities. The engine learns from the query behavior of data consumers and addresses performance issues by autonomously creating and managing, in user-defined analytical storage, physical copies of either:
- the external data sources or
- the internal virtual views
In addition, the self-learning optimization recommends indexes for the materialized tables. Once the data is physically stored in the analytical storage, slow-performing parts of a query are transparently redirected to this optimized data, so reports never have to be rewritten.
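Conceptually, the structures the platform creates resemble the SQL below. The schema dv_storage and the table and index names are purely illustrative assumptions; the platform generates and manages these objects automatically, so this DDL is a sketch of the idea rather than something a user would write.

```sql
-- Illustrative: a materialized copy of a slow virtual view, placed in
-- the user-defined analytical storage.
CREATE TABLE dv_storage.sales_summary_mat AS
SELECT * FROM views.sales_summary;

-- Illustrative recommended index, derived from observed query behavior
-- (for example, frequent filtering on region).
CREATE INDEX idx_sales_summary_region
    ON dv_storage.sales_summary_mat (region);

-- Queries against views.sales_summary are then transparently redirected
-- to dv_storage.sales_summary_mat; existing reports stay unchanged.
```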
To keep the data in the analytical storage up to date, materialization jobs run periodically. Incremental materializations, which pick up only new or changed data, are also available, reducing the volume of data that has to be materialized on each run.
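The general pattern behind an incremental refresh is a change marker, commonly a timestamp column, so that each run copies only what changed since the previous one. The sketch below uses illustrative names (updated_at, refresh_watermarks); the platform’s own incremental jobs are configured rather than hand-written like this.

```sql
-- Illustrative incremental load: copy only the rows added since the last
-- run, tracked by a watermark timestamp. Changed rows would additionally
-- require an upsert/MERGE, depending on the analytical storage's dialect.
INSERT INTO dv_storage.sales_summary_mat
SELECT s.*
FROM views.sales_summary s
WHERE s.updated_at > (SELECT w.last_refresh
                      FROM dv_storage.refresh_watermarks w
                      WHERE w.table_name = 'sales_summary_mat');

-- Advance the watermark so the next run starts from this point.
UPDATE dv_storage.refresh_watermarks
SET last_refresh = CURRENT_TIMESTAMP
WHERE table_name = 'sales_summary_mat';
```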
The advanced data virtualization experience
Data virtualization is a dynamic technology, and performance optimization is crucial for enterprises that want to leverage its full potential. The Data Virtuality Platform’s three-tiered approach to performance optimization provides a comprehensive solution that addresses the performance challenge from multiple angles. Whether you are dealing with large datasets, many concurrent users, complex query structures, or slow databases and networks, the platform’s materialization capabilities and optimization features are designed to maximize efficiency.
Experience the power and innovation of the Data Virtuality Platform firsthand – start your free trial today.