A short guide on data replication types and use cases. Batch Replication

Published on October 28, 2022

We’re continuing our series on data replication types. In this blogpost, we’re talking about batch replication and its use cases.

What is batch replication?

Batch Replication Data Virtuality Platform

The purpose of Batch Replication is to append all data from the source view/table to the target table with the option of preventing duplicate records. Batch replication is easier to implement than incremental replication (but less flexible) and more powerful than complete replication. It also differs from both them in a numerous ways:

Old rows in the target table can be kept even if they were deleted in the source table
Updates can be performed on current rows, and inserted rows will be added to the target table
No timestamp columns are needed to get new rows because all rows will be retrieved from the source
You can decide whether or not to allow duplicates in case there are updates on the rows
You can also decide how to name the target table, and there will be no different stages to distinguish, just one table containing the data

Batch Replication without identity field

Use Case for Batch Replication

The option to delete duplicate records comes in handy if you work with an identity field and don’t want to track the changes. For instance, you have a list of customers with various information such as name, age, CustomerID, and date of birth. You want to update all information except for the age column because it is not important for you. With batch replication you can exactly do that.

If on the other hand you are working with financial transactions you might not want to delete any old information when you copy new information to the database so you can track the history of a particular account. This is again something that you can do through batch replication.

Batch Replication on the Data Virtuality Platform

On the Data Virtuality Platform, you can do batch replication in two ways.

Without the use of an Identity Field:

When you don’t use an identity field, all data from the source view/table is appended to whatever exists in the target table. This also happens each time the job runs.
This method is only useful if the data in the source table or view always has updates or new information (for example today’s Google Analytics data). Otherwise, there will be a lot of replicated rows in the target table and it would be hard to use this data in a meaningful manner.

With the use of an Identity Field

To avoid the duplicates mentioned above in the first case, a column with an unique identifier (Primary Key) needs to be determined. This will help you to keep a clear record of the changes, whilst also allowing for new rows to be added.
The process with the use of an Identity Field works differently now.
- On the first run, the whole table is retrieved from the source with the complete data.
- On subsequent runs, all data which doesn’t have a unique Identity Field is deleted from the target table and not from the source table because otherwise the data in the analytical storage will be stale.
- This ensures that all retrieved data can be added to the target table without creating duplicates.

How can Data Virtuality help you?

Our platform uniquely combines data virtualization and data replication. This gives data teams the flexibility to always choose the right method for their specific requirement. It provides high-performance real-time data integration through smarter query optimization and data historization as well as master data management through advanced ETL in a single tool.

Book a demo

Solutions

Overview

By Use Case

By Industry

Resources

Blog

Resource Center

Partners

Case Studies

Docs & Support

FAQ

Company

About Us

Careers

Data Virtuality Trust Center

Article contents

Share Article

A short guide on data replication types and use cases. Batch Replication

Article contents

What is batch replication?

Use Case for Batch Replication

Batch Replication on the Data Virtuality Platform

Without the use of an Identity Field:

With the use of an Identity Field

How can Data Virtuality help you?

More interesting articles

CData Software Acquires Data Virtuality to Modernize Data Virtualization for the Enterprise

Enhancing Data Management with Data Warehouse Automation in Data Virtualization

Explore the Data Virtuality SQL AI Assistant

Enhancing Data Virtualization with Replication – A Comprehensive Strategy for Data Architects

Leveraging Data Virtualization in Data Lakehouse Architecture

Generative AI in Data Integration