CData Acquires Data Virtuality, Modernizing Enterprise Data Virtualization

Article contents

Share Article

How To Query MongoDB Databases With Standard SQL Statements

Article contents

Combining a MongoDB collection with a regular relational database in order to assess it comes with a lot of challenges, especially when it comes to how much time you need to spend. In this post we will show you an easier way to query MongoDB databases using Data Virtuality.

 

Motivation

The world of databases in general, and of Big Data analysis in particular, largely make use of the paradigm introduced by NoSQL. One of its major achievements is to allow for a more flexible approach in terms of data structure. A widely known representative of this approach is MongoDB (Mongo = humongous). NoSQL, which is an abbreviation for ‘Not only SQL’, offers the possibility to have data sets (collections) that consist of entries which might have a few properties in common. But each also has some extra properties that are important to it and are stored together. Here is a quick example of what different entries from the same collection could look like:

 

Query MongoDB Databases With Standard SQL Statements

 

Unfortunately, this flexibility comes with two trade-offs when implementing and actually using MongoDB. First, there is the problem of incompatibility with ANSI-SQL and its derivatives. Clearly, if there aren’t any tables and not all entries from a collection have a specific property then you cannot use a typical SQL syntax:

Query MongoDB Databases With Standard SQL Statements

 

But a more serious drawback is the hardship you have to undergo if you want to combine a MongoDB collection with a regular relational database and evaluate it. This is, strictly speaking, possible but it requires a lot of time because special processes are needed that allow for transformations of the given NoSQL database so that you have a common data model, which suits your needs. So how can you query MongoDB Databases with standard SQL statements in an easier way

Get Started with Data Virtuality

Your organization, your deployment choice.

 

Data Virtuality – The Saving Grace

All the manual processing is not necessary anymore if you use the Data Virtuality Server. Through the coherent structure of tables and the sophisticated transformation that Data Virtuality performs automatically, MongoDB can easily be accessed and queried with just regular SQL statements, and the content can be combined with other data sources, such as relational databases. Apart from connecting to the NoSQL source and waiting for the structure to be detected by Data Virtuality, no more steps are necessary. Below are the steps you take to be able to query MongoDB databases with Standard SQL Statements.

 

Step 1 – Connecting to the MongoDB Data Source

Adding MongoDB is just as easy as could be. Simply go to the dialog to add a new data source and select “MongoDB”. Enter your credentials and the correct connection parameters, and leave the rest to Data Virtuality. For the sake of this tutorial, a simple MongoDB server was used with no additional credentials required. Usually, of course, you would need to provide some.

Query MongoDB Databases With Standard SQL Statements

Go to the dialogue in order to add a new data source.

Query MongoDB Databases With Standard SQL Statements

 

Select the MongoDB entry from the list of data sources.

Query MongoDB Databases With Standard SQL Statements

Enter all required information to establish the connection.

Step 2 – Wait and see

After you click ‘finish’, the Data Virtuality Server retrieves all the necessary information to build a virtual data structure in a relational style. The data schema is presented to you and can be used like any ordinary relational database. This process, however, might take some time because the properties that can occur in the document entries of the collection have to be found.

When you know more about the structure of the data sources, you can exploit this and tell the server how many documents have to be analyzed at most to get the schema. In addition, it is also possible to set the maximum depth which is to be searched through within the document entries.  When the server is done, you have virtual tables with a virtual schema that are ready to be used:

Query MongoDB Databases With Standard SQL Statements

Table representation of the MongoDB collection’s documents

Step 3 – Use

Lastly, you can immediately start with queries of all sorts to evaluate the data from your NoSQL data source with the usual SQL expressions that you are familiar with. For example, you can take a look at all the data in collection1:

Query MongoDB Databases With Standard SQL Statements

Or, you might only be interested in all the persons from the database that ever had a seminal paper published. In this example, we retrieve the inventor of the famous Turing Machine:

Query MongoDB Databases With Standard SQL Statements

Now, you can proceed to build SQL queries that combine the MongoDB collection with all your other relational databases, giving you even more powerful data analysis and enabling you to leverage business and research insights.

Get Started with Data Virtuality

Your organization, your deployment choice.

More interesting articles

Data Virtuality brings enterprise data virtualization capabilities to CData, delivering highly-performant access to live data at any scale.
Discover how integrating data warehouse automation with data virtualization can lead to better managed and optimized data workflows.
Discover how our ChatGPT powered SQL AI Assistant can help Data Virtuality users boost their performance when working with data.
While caching offers certain advantages, it's not a one-size-fits-all solution. To comprehensively meet business requirements, combining data virtualization with replication is key.
Explore the potential of Data Virtuality’s connector for Databricks, enhancing your data lakehouse experience with flexible integration.
Generative AI is an exciting new technology which is helping to democratise and accelerate data management tasks including data engineering.