The world of databases in general, and of Big Data analysis in particular, largely make use of the paradigm introduced by NoSQL. One of its major achievements is to allow for a more flexible approach in terms of data structure. A widely known representative of this approach is MongoDB (Mongo = humongous). NoSQL, which is an abbreviation for ‘Not only SQL’, offers the possibility to have data sets (collections) that consist of entries which might have a few properties in common. But each also has some extra properties that are important to it and are stored together. Here is a quick example of what different entries from the same collection could look like:
Unfortunately, this flexibility comes with two trade-offs when implementing and actually using MongoDB. First, there is the problem of incompatibility with ANSI-SQL and its derivatives. Clearly, if there aren’t any tables and not all entries from a collection have a specific property then you cannot use a typical SQL syntax:
But a more serious drawback is the hardship you have to undergo if you want to combine a MongoDB collection with a regular relational database and evaluate it. This is, strictly speaking, possible but it requires a lot of time because special processes are needed that allow for transformations of the given NoSQL database so that you have a common data model, which suits your needs.
Data Virtuality – The Saving Grace
All the manual processing is not necessary anymore if you use the Data Virtuality Server. Through the coherent structure of tables and the sophisticated transformation that Data Virtuality performs automatically, MongoDB can easily be accessed and queried with just regular SQL statements, and the content can be combined with other data sources, such as relational databases. Apart from connecting to the NoSQL source and waiting for the structure to be detected by Data Virtuality, no more steps are necessary.
Step 1 – Connecting to the MongoDB Data Source
Adding MongoDB is just as easy as could be. Simply go to the dialog to add a new data source and select “MongoDB”. Enter your credentials and the correct connection parameters, and leave the rest to Data Virtuality. For the sake of this tutorial, a simple MongoDB server was used with no additional credentials required. Usually, of course, you would need to provide some.
Go to the dialogue in order to add a new data source.
Select the MongoDB entry from the list of data sources.
Enter all required information to establish the connection.
Step 2 – Wait and see
After you click ‘finish’, the Data Virtuality Server retrieves all the necessary information to build a virtual data structure in a relational style. The data schema is presented to you and can be used like any ordinary relational database. This process, however, might take some time because the properties that can occur in the document entries of the collection have to be found.
When you know more about the structure of the data sources, you can exploit this and tell the server how many documents have to be analyzed at most to get the schema. In addition, it is also possible to set the maximum depth which is to be searched through within the document entries. When the server is done, you have virtual tables with a virtual schema that are ready to be used:
Table representation of the MongoDB collection’s documents
Step 3 – Use
Lastly, you can immediately start with queries of all sorts to evaluate the data from your NoSQL data source with the usual SQL expressions that you are familiar with. For example, you can take a look at all the data in collection1:
Or, you might only be interested in all the persons from the database that ever had a seminal paper published. In this example, we retrieve the inventor of the famous Turing Machine:
Now, you can proceed to build SQL queries that combine the MongoDB collection with all your other relational databases, giving you even more powerful data analysis and enabling you to leverage business and research insights.