CData Acquires Data Virtuality, Modernizing Enterprise Data Virtualization

Article contents

Share Article

How to use the JSON Datatype in PostgreSQL: A Brief Manual

Article contents

 

Introduction

Want to know how to use PostgreSQL’ s great functionality to work with JSON objects? Read on! Originally introduced in version 9.2, this feature was greatly enhanced in version 9.3 and we will look at the operators that come with the JSON datatype. We assume that you have an installed version of PostgreSQL 9.3 or higher if you want to work through the examples shown in this post.

Novelties

The main aspect of the JSON support is the new datatype and its supplementary functions and operators. In this post, we focus on the operators since they form a strong pillar for typical work with JSON objects. Here is a concise list of the most important new features:

  • new datatype: json
  • new operators: ->, ->>, #> and #>>
  • new functions: json_array_length, json_extract_path, json_array_elements and many more

 

The full list of JSON specific functions can be found at:

https://www.postgresql.org/docs/9.3/static/functions-json.html

General Usage of the Operators

Let’s get a quick overview concerning the main usage and differences among the three operators. After that we will show you their usage in several examples.

The operators -> and ->> are easy to use and allow you to point to a specific child element within a JSON object. Both operators can be used either by using the name of the element that is of interest, or by using an index when the current object is an array of elements. It is also possible to provide a dynamic path within a JSON object that we want to specify. For this purpose, the operators #> and #>> have been added and we can say, in terms of the resulting datatype, #> equals -> and #>> equals ->>.

Before we get to the examples, here is a list of the major aspects of the operators:

Operator ->

  • Allows you to select an element based on its name.
  • Allows you to select an element within an array based on its index.
  • Can be used sequentially: ::json->’elementL’->’subelementM’->…->’subsubsubelementN’.
  • Return type is json and the result cannot be used with functions and operators that require a string-based datatype. But the result can be used with operators and functions that require a json datatype.

Operator ->>

  • Allows you to select an element based on its name.
  • Allows you to select an element within an array based on its index.
  • Cannot be used sequentially.
  • Return type is text and the result can be used with functions and operators that require a string-based datatype. For the same reason, sequential usage of the operator is not supported.

Operator #>

  • Allows you to select an element based on its path within the main JSON object. The path can consist of element names and array indexes, depending on what is needed.
  • Can be used sequentially: ::json#>'{elementname1,elementname2,index1,index2}’#>'{elementname3}’.
  • Return type is json and the result cannot be used with functions and operators that require a string-based datatype. But the result can be used with operators and functions that require a json datatype.

Operator #>>

  • Allows you to select an element based on its path within the main JSON object. The path can consist of element names and array indexes, depending on what is needed.
  • Cannot be used sequentially.
  • Return type is text and the result can be used with functions and operators that require a string-based datatype. For the same reason, sequential usage of the operator is not supported.
  • Basically, you can create a ‘chain’ of -> and #> operators, provided that you point to valid elements and indexes. Such a chain can end with each of the four operators. The last one determines whether the result can be used as input for other specific functions.
  • Note that if you want to use the result in combination with a function or operator that requires a text datatype, you have to use ->> or #>> as the last one in the chain. This might be important if you have a WHERE clause or a sub statement that refers to the result of the json operators. Let us proceed with some examples to see these operators in action.

Sometimes I see ::json and sometimes not, what is the difference?

This is an easy one. The operators ->, ->>, #> and #>> require the input datatype json. If PostgreSQL can implicitly convert the parameter to json then you can omit ::json. If you cannot guarantee that the casting is always performed in your statement, you pass ::json and the operators can be used.  When in doubt, use the ::json prefix before the first json operator. The following operators, if you are chaining them, neither require nor support this syntax.

 

Get Started with Data Virtuality

Your organization, your deployment choice.

Query Examples

Insert sample data into a new database

First of all, we need some data which we can use for the exploring purpose. We will use weather data provided by the API at OpenWeatherMap. It may not be the best data in terms of a sophisticated use case but it will be more than enough to get to grips with the operators because the data contains some basic complexity.

Our interest lies in the data four five of the better known English cities London (THE London), Sheffield (Snooker world championship), Liverpool (the Beatles and Mersey Beat Generation), Manchester (this is for the football fans) and Leicester (the delicious Red Leicester cheese). Below is the link to get an API key weather account, which is necessary to get access to weather API:

https://home.openweathermap.org/users/sign_up

If you want to use the same data as in this article, you can execute the following sql statements. Please create a database for testing purposes first. The post assumes that this has already been done. An alternative way is presented after the statements in one statement that creates the new table and fills it with data in one step.

CREATE TABLE public.json_table (myjson json);
INSERT INTO public.json_table VALUES('{"coord":{"lon":-0.13,"lat":51.51},"sys": {"type":3,"id":98614,"message":0.0218,"country":"GB","sunrise":1427693969,"sunset":1427740219},"weather": [{"id":803,"main":"Clouds","description":"broken clouds","icon":"04d"}],"main": {"temp":6.99,"humidity":57,"pressure":1014.9,"temp_min":5.56,"temp_max":10.3},"wind": {"speed":0.7,"gust":1.7,"deg":315},"rain":{"3h":0},"clouds": {"all":64},"dt":1427699997,"id":2643743,"name":"London"}');
INSERT INTO public.json_table VALUES('{"coord":{"lon":-1.47,"lat":53.38},"sys":{"type":3,"id":38764,"message":0.0121,"country":"GB","sunrise":1427694192,"sunset":1427740640},"weather":[{"id":802,"main":"Clouds","description":"scattered clouds","icon":"03d"}],"main":{"temp":4.64,"humidity":75,"pressure":988,"temp_min":4.44,"temp_max":5},"wind":{"speed":3.08,"gust":7.71,"deg":226},"rain":{"3h":0.03},"clouds":{"all":36},"dt":1427699954,"id":2638077,"name":"Sheffield"}');
INSERT INTO public.json_table VALUES('{"coord":{"lon":-2.98,"lat":53.41},"sys":{"type":3,"id":10567,"message":0.0519,"country":"GB","sunrise":1427694552,"sunset":1427741005},"weather":[{"id":800,"main":"Clear","description":"Sky is Clear","icon":"01d"}],"main":{"temp":8,"humidity":67,"pressure":1000,"temp_min":3.33,"temp_max":17.78},"wind":{"speed":4.63,"gust":7.2,"deg":282},"rain":{"3h":0},"clouds":{"all":0},"dt":1427700074,"id":2644210,"name":"Liverpool"}');
INSERT INTO public.json_table VALUES('{"coord":{"lon":-2.24,"lat":53.48},"sys":{"type":3,"id":28022,"message":0.0377,"country":"GB","sunrise":1427694371,"sunset":1427740831},"weather":[{"id":801,"main":"Clouds","description":"few clouds","icon":"02d"}],"main":{"temp":4.66,"pressure":1004,"temp_min":3.33,"temp_max":6.11,"humidity":89},"wind":{"speed":3.6,"gust":5.65,"deg":268},"rain":{"3h":0.0099999999999998},"clouds":{"all":24},"dt":1427700074,"id":2643123,"name":"Manchester"}');
INSERT INTO public.json_table VALUES('{"coord":{"lon":-1.13,"lat":52.64},"sys":{"type":3,"id":184635,"message":0.9267,"country":"GB","sunrise":1427694152,"sunset":1427740519},"weather":[{"id":802,"main":"Clouds","description":"scattered clouds","icon":"03d"}],"main":{"temp":4.92,"humidity":75,"pressure":1013,"temp_min":4.4,"temp_max":5.4},"wind":{"speed":2.9,"gust":7.5,"deg":337},"rain":{"3h":0.024999999999999},"clouds":{"all":36},"dt":1427700055,"id":2644668,"name":"Leicester"}');

Insert current data with SELECT INTO

Get your personal data from this URL after your created an API key weather account and copy the resulting json into an SQL Editor from pgAdmin. You can insert the full json and make use of the json_array_elements function.

SELECT value as myjson INTO public.json_table

FROM json_array_elements('{"cnt":5,"list":[…]}'::json->'list');

The statement takes the JSON object and extracts the list element from it. As it contains an array with each index holding information regarding one of the selected cities, we can use the predefined function json_array_elements that returns one row for each of the array elements. Due to SELECT INTO, all returned rows are inserted into the new table public.json_table.

Examples in the SELECT Clause Now the time has come to finally look at some examples. We hop right into it.

Example SELECT

SELECT myjson::json->'name' as CityName

FROM   public.json_table;

The query takes the object and searches for the element name. This element’s value is returned for each row in the table. Note that in the example the casting of the column to datatype json is performed.

Example SELECT with returned ‘sub’ json

SELECT myjson::json->'name' as CityName, myjson::json->'coord' as Coordinates

FROM   public.json_table;

We are getting slightly more complex by selecting two elements, both referenced via -> operator and their name, from the table. Note that the second column from the output is itself a JSON object which we will be targeting in the next example.

Example SELECT with the ‘sub’ json strings

SELECT myjson::json->'name' as CityName, myjson::json->'coord'->'lon' as Longitude,myjson::json->'coord'->'lat' as Latitude

FROM public.json_table;

The example above shows the ‘chaining’ of the -> operator. The first usage has the ::json prefix to ensure that the text is being handled as json type and the consecutive use of -> allows to reference an element that is within a JSON object which itself exists within a super object. It should be noted that the second use of -> has no preceding ::json.

Example SELECT which returns array

At next, we will reference a specific array element within an object. Looking at an extract from the entry for London, we have {…"weather":[{"id":803,"main":"Clouds","description":"broken clouds","icon":"04d"}],…}. Clearly, the element weather contains an array, even if only of length 1. If we want to get further into this array, we need to specify that we are interested in the array element at index 0 (indexes start with 0, not 1).

SELECT myjson::json->'name' as CityName, myjson::json->'weather'->0 as DetailedWeather

FROM   public.json_table;

Clearly, we only care about the core weather information and decide to select the elements main and description from the JSON object that is held in array element at index 0.

Example which addresses elements within this array

SELECT myjson::json->'name' as CityName, myjson::json->'weather'->0->'main' as WeatherShort, myjson::json->'weather'->0->'description' as WeatherLong

FROM public.json_table;

Above example also shows how the combination of element names or indexes can be used to navigate to every part of a JSON object. Speaking of, or rather writing about, paths, we can also use the #> operator to achieve the same goal. Only that the syntax is slightly different.

Example SELECT with specific element referenced by the #> operator

SELECT myjson::json#>'{name}' as CityName,myjson::json#>'{weather,0,main}' as WeatherShort, myjson::json#>'{weather,0,description}' as WeatherLong

FROM public.json_table;

No difference is in the result but some might prefer this way to point to distinct elements. In each path description (delimited by { and }), integer values are considered as index numbers and strings as element names; like -> and ->>.

In all these example the last occurrence of -> could have been swapped easily with ->> and no error would occur. To fully understand their difference, we will try to filter data in the next two examples.

WHERE clauses with JSON

At best we repeat a part of the statement from above regarding the -> operator: “Return type is json and the result cannot be used with functions and operators that require a string-based data type”. So we might be interested in all cities that have cloudy weather. We might try something like the following:

Wrong

SELECT myjson::json->'name' as CityName

FROM public.json_table

WHERE myjson::json->'weather'->0->'main' = 'Clouds';

An error is thrown by PostgreSQL and reads: Error message:

ERROR: Operator does not: json = unknown LINE 3: WHERE myjson::json->'weather'->0->'main' = 'Clouds';

This tells us that the result of myjson::json->’weather’->0->’main’ is of type json but the = operator does not support json. If we replace the last -> by ->> then the result is not an JSON object but a simple text that supports =. Now it should be clear when to use an operator that returns json and when to use one that returns text. The same applies to #> and #>>, if we want to re-use the returned result.

Correct

SELECT myjson::json->'name' as CityName

FROM public.json_table

WHERE myjson::json->'weather'->0->>'main' = 'Clouds';

Use ->> instead of -> when the returned values shall be re-used with ordinary operators and functions!

Another, slightly more complex WHERE

SELECT myjson::json->'name' as CityName, myjson::json->'main'->>'temp' as Temperature

FROM public.json_table

WHERE myjson::json->'weather'->0->>'main' = 'Clouds' AND CAST(myjson::json->'main'->>'temp' as float) > 5;

Our second example with WHERE clauses cares about all cities that have cloudy weather but more than 5°Centigrade. As the API link above contained the part ‘&units=metric’, the temperatures are provided in °C. Here is the result from the query:

We will have a final example showing the use of our json operators within a GROUP BY clause. GROUP BY with JSON operators

GROUP BY needs the ->> or the #>> operator in the last place of the chain order to being capable of grouping. The following example groups by type of weather and returns the number of cities, respectively.

SELECT myjson::json->'weather'->0->>'main' as WeatherType, COUNT(*) as AffectedCities

FROM public.json_table

GROUP BY myjson::json->'weather'->0->>'main'

The counterpart with #>> is

SELECT myjson::json#>>'{weather,0,main}' as WeatherType, COUNT(*) as AffectedCities

FROM public.json_table

GROUP BY myjson::json#>>'{weather,0,main}'

Unfortunately, all good things come to an end, and so does this blog post. We hope that you’re now well-equipped to use the basic operators to navigate through JSON objects in order to select or reference any particular element. PostgreSQL also provides a good deal of functions to increase the capabilities of working with and building JSON type objects even further. However, the most common usage scenarios when data is already available were covered in this article and the examples have shown the differences and usages of the operators. We encourage all our readers to try to make the most of the functionalities that come with PostgreSQL.

If you’re interested in learning more check out our blogpost on querying MongoDB Databases with standard SQL statements using Data Virtuality.

Interested in seeing how you can get started?

Book a demo and learn how you can enable Data Mesh with the Data Virtuality Platform:

 
 

More interesting articles

Data Virtuality brings enterprise data virtualization capabilities to CData, delivering highly-performant access to live data at any scale.
Discover how integrating data warehouse automation with data virtualization can lead to better managed and optimized data workflows.
Discover how our ChatGPT powered SQL AI Assistant can help Data Virtuality users boost their performance when working with data.
While caching offers certain advantages, it's not a one-size-fits-all solution. To comprehensively meet business requirements, combining data virtualization with replication is key.
Explore the potential of Data Virtuality’s connector for Databricks, enhancing your data lakehouse experience with flexible integration.
Generative AI is an exciting new technology which is helping to democratise and accelerate data management tasks including data engineering.