The world around databases and, especially, Big Data analysis largely make use of the paradigm introduced with NoSQL. One of its major achievements is to allow for a more flexible approach in terms of data structure. A widely known representative of this approach is MongoDB (Mongo = humongous). NoSQL, which is an abbreviation for ‘Not only SQL’ offers the possibility to have data sets (collections) that consist of entries which might share only a minute set of properties but each one has some extra properties that are important to it and are stored together. Here is a quick example of how different entries from the same collection could look like:
Unfortunately, this flexibility comes with two trade-offs when implementing and actually using MongoDB. At first, there is the problem of incompatibility with ANSI-SQL and its derivatives. Clearly, if there is no tabled structure and not all entries from a collection have a specific property then you cannot use typical SQL syntax:
But a more serious drawback is the hardship you have to undergo if you want to combine a MongoDB collection with a regular relational Database and evaluate it. This is, strictly speaking, possible but requires a lot of time because special processes are needed that allow for transformation of the given NoSQL database so that you have a common data model which suits your needs.
datavirtuality the saving grace
All the manual processing is not necessary anymore when datavirtuality Server is in use. Through the coherent structure of tables and the elaborate transformation that datavirtuality performs automatically, MongoDB can easily be accessed and queried just with regular SQL statements and the content can be combined with other data sources like relational databases. Apart from connecting to the NoSQL source and waiting for the structure to being detected by datavirtuality, no more steps are necessary.
Step 1 – Connecting to the MongoDB data source
Adding the data source is just as easy as could be. Simply go to the dialogue to add a new data source and select the entry MongoDB. Enter you credentials and the correct connection parameters, leave the rest to datavirtuality. For the sake of this tutorial, a simple MongoDB server was used with no additional credentials required. Usually, of course, you might need to provide some.
Go to the dialogue in order to add a new data source.
Select the MongoDB entry from the list of data sources.
Enter all required information to establish the connection.
Step 2 – Wait and see
After you have clicked on ‘finish’, the datavirtuality Server will retrieve all the necessary information in order to build a virtual data structure in relational style. The data schema is presented to you and can be used like any ordinary relational database. This process, however, might take some time because the properties that can occur in the document entries of the collection have to be found. When you know more about the data sources structure, you can exploit this and tell the server how many documents have to be analyzed at most to get the schema. In addition, it is also possible to set the maximum depth which is to be searched through within the document entries. When the server is done, you have virtual tables with a virtual schema to use:
Table representation of the MongoDB collection's documents
Step 3 – Use
Lastly, you can immediately start with queries of all sorts to evaluate the data from you NoSQL data source with the usual SQL expressions that anyone know and is familiar with. For example, you can just have a look at all the data in the collection.
Or you might only be interested in all the persons from the database that ever had a seminal paper published throughout the history. In this sample case, we retrieve the inventor of the famous Turing Machine.
Now, you can proceed and build SQL queries that combine the MongoDB collection with all you other relational databases allowing you even more powerful data analysis with and, hence, reports giving leveraging business or research insights.