JSON in PostgreSQL: The Ultimate Guide

Mar 4, 2021 · 22 min read ·

Share on:

PostgreSQL has quite a lot of features for storing and working with JSON data.

In this guide, you'll learn:

What JSON is and why you might want to use it
Creating a table to store JSON data
How to add, read, update, and delete JSON data
Tips for performance, validating and working with JSON

Let's get into the guide.

If you want to download my PostgreSQL Cheat Sheet, enter your email below and you'll receive it shortly.

What is JSON and Why Should I Use It?

JSON stands for JavaScript Object Notation, and it's a way to format and store data.

Data can be represented in a JSON format in PostgreSQL so it can be read and understood by other applications or parts of an application.

It's similar to HTML or XML - it represents your data in a certain format that is readable by people but designed to be readable by applications.

Why Use JSON In Your Database?

So why would you use JSON data in your database?

If you need a structure that's flexible.

A normalised database structure, one with tables and columns and relationships, works well for most cases. Recent improvements in development practices also mean that altering a table is not as major as it was in the past, so adjusting your database once it's in production is possible.

However, if your requirements mean that your data structure needs to be flexible, then a JSON field may be good for your database.

One example may be where a user can add custom attributes. If it was done using a normalised database, this may involve altering tables, or creating an Entity Attribute Value design, or some other method.

If a JSON field was used for this, it would be much easier to add and maintain these custom attributes.

The JSON data can also be stored in your database and processed by an ORM (Object Relational Mapper) or your application code, so your database may not need to do any extra work.

What Does JSON Data Look Like?

Here's a simple example of JSON data:

1{
2  "id": "1",
3  "username": "jsmith",
4  "location": "United States"
5}

It uses a combination of different brackets, colons, and quotes to represent your data.

Let's take a look at some more examples.

Name/Value Pair

JSON data is written as name/value pairs. A name/value pair is two values enclosed in quotes.

This is an example of a name/value pair:

1"username": "jsmith"

The name is "username" and the value is "jsmith". They are separated by a colon ":".

This means for the attribute of username, the value is jsmith. Names in JSON need to be enclosed in double quotes.

Objects

JSON data can be enclosed in curly brackets which indicate it's an object.

1{"username": "jsmith"}

This is the same data as the earlier example, but it's now an object. This means it can be treated as a single unit by other areas of the application.

How does this help? It's good for when there are multiple attributes:

1{
2  "username": "jsmith",
3  "location": "United States"
4}

Additional attributes, or name/value pairs, can be added by using a comma to separate them.

You'll also notice in this example the curly brackets are on their own lines and the data is indented. This is optional: it's just done to make it more readable.

Arrays

JSON also supports arrays, which is a collection of records within an object. Arrays in JSON are included in square brackets and have a name:

 1{
 2  "username": "jsmith",
 3  "location": "United States",
 4  "posts": [
 5    {
 6      "id":"1",
 7      "title":"Welcome"
 8    },
 9    {
10      "id":"4",
11      "title":"What started it all"
12    }
13  ]
14}

In this example, this object has an attribute called "posts". The value of posts is an array, which we can see by the opening square bracket "[".

Inside the square bracket, we have a set of curly brackets, indicating an object, and inside those we have an id of 1 and a title of Welcome. We have another set of curly brackets indicating another object.

These two objects are posts and they are contained in an array.

And that covers the basics of what JSON is.

If JSON is new to you, don't worry, it gets easier as you work with it more.

If you're experienced with JSON, you'll find the rest of the guide more useful as we go into the details of working with JSON in PostgreSQL.

How to Create and Populate a JSON Field in PostgreSQL

So you've learned a bit about JSON data and why you might want to use it.

How do we create a field in PostgreSQL?

Two JSON Data Types

There are two data types in PostgreSQL for JSON: JSON and JSONB.

JSON is the "regular" JSON data type and was added in PostgreSQL 9.2. JSONB stands for JSON Binary and was added in PostgreSQL 9.4.

What's the difference?

There are a few differences, but it's mainly to do with how the data is stored. JSONB data is stored in a binary format and is easier to process.

PostgreSQL recommends using the JSONB data type in most situations.

Here's a table describing the differences.

Feature	JSON	JSONB
Storage	Stored exactly as entered	Stored in a decomposed binary format
Supports full text indexing	No	Yes
Preserve white space	Yes	No, it is removed
Preserve order of keys	Yes	No
Keep duplicate keys	Yes	No (last value is kept)

Because of the way the data is stored, JSONB is slightly slower to input (due to the conversion step) but a lot faster to process.

In this guide, we'll use the JSONB data type. Most of the features will work the same for the JSON data type, but we'll focus on JSONB.

Creating a JSON Field

We create a new JSONB field to store our JSON data.

Here's an example using a product table.

1CREATE TABLE product (
2  id INT,
3  product_name CHARACTER VARYING(200),
4  attributes JSONB
5);

We have created a table called product. It has an id and a product name. There's also an attributes column, which has the data type of JSONB.

Because it's a JSONB data type, the values inserted into it must be valid JSON. Other databases require a check constraint to be added, but this is not required in PostgreSQL.

Adding a JSON column is as easy as that.

Adding Data to a JSON Field

Now we've got our field for storing JSON data, how do we add data to it?

We simply insert a record into our table as though it's a text value. The value needs to be valid JSON, otherwise, we'll get an error.

We can add our first product like this:

1INSERT INTO product (id, product_name, attributes)
2VALUES (1, 'Chair','{"color":"brown", "material":"wood", "height":"60cm"}');

We can run this statement and the record is inserted.

If we try to insert an invalid JSON field, we'll get an error. Here's the INSERT statement:

1INSERT INTO product (id, product_name, attributes)
2VALUES (100, 'Bad Chair', '"color":"brown:height}');

This is what we'll see:

1ERROR:  invalid input syntax for type json
2LINE 12: VALUES (100, 'Bad Chair', '"color":"brown:height}');
3^
4DETAIL:  Expected end of input, but found ":".
5CONTEXT:  JSON data, line 1: "color":...

Using the method above, we needed to enter the data in exactly the right format.

Inserting Arrays

If you want to insert JSON data that contains arrays, you can enter it using text in a JSON format.

Here's how to insert an array by just specifying it in a JSON format.

1INSERT INTO product (id, product_name, attributes)
2VALUES (
3  2, 
4  'Desk',
5  '{"color":"black", "material":"metal", "drawers":[{"side":"left", "height":"30cm"}, {"side":"left", "height":"40cm"}]}'
6);

This will insert a new product that has an array of drawers. As you can probably see by this statement, reading it (and writing it) is a bit tricky.

You can insert simpler arrays using this method too.

1INSERT INTO product (id, product_name, attributes)
2VALUES (
3  3, 
4  'Side Table',
5  '{"color":"brown", "material":["metal", "wood"]}'
6);

The INSERT statements will work.

Using JSON Functions to Insert Data

We just saw how to insert data into a JSONB field by providing text values.

They can be error-prone and hard to type.

Fortunately, PostgreSQL offers a few functions to help us insert data into a JSON field.

The JSONB_BUILD_OBJECT function will take a series of values and create a JSON object in binary format, ready to be inserted into a JSONB field.

We can use the JSONB_BUILD_OBJECT to construct a new record:

1JSONB_BUILD_OBJECT('color', 'black', 'material', 'plastic')

This will create a value that looks like this:

1{"color": "black", "material": "plastic"}

We can use this in the INSERT statement. We don't need to worry about putting brackets, commas, colons, and quotes in the right place. As long as the string values are quoted, it should work.

1INSERT INTO product (id, product_name, attributes)
2VALUES (4, 'Small Table', JSONB_BUILD_OBJECT('color', 'black', 'material', 'plastic'));

The new record will be added to the table.

There are several other functions available to help you insert JSON values into a table:

to_json and to_jsonb
array_to_json
row_to_json
json_build_array and jsonb_build_array
json_object and jsonb_object

How to Read and Filter JSON Data in PostgreSQL

Once you've got some JSON data in a table, the next step is to read it.

How do we do that?

Selecting a JSON field is pretty simple. We can just select it as though it is a column.

1SELECT
2id,
3product_name,
4attributes
5FROM product;

id	product_name	attributes
1	Chair	{"color": "brown", "height": "60cm", "material": "wood"}
2	Desk	{"color": "black", "drawers": [{"side": "left", "height": "30cm"}, {"side": "left", "height": "40cm"}], "material": "metal"}
3	Side Table	{"color": "brown", "material": ["metal", "wood"]}
4	Small Table	{"color": "black", "material": "plastic"}

This shows us the data in the JSON column, and it looks just like a text value.

The good thing with this is that any application can easily read this field and work with it how they want (display it, filter it, and so on).

What if we wanted to do more in our database?

Selecting Individual Attributes

The JSON data is stored in something that looks like a text field. However, it's quite easy to get attributes and values out of this text field and display them.

We can extract a value from the JSON field and display it in a separate column. We can do this using a few different symbols or notations.

To view a particular key's value, we can specify the key as a "child" of the JSONB column.

We do this using the -> notation:

1json_column -> key

For example:

1SELECT
2id,
3product_name,
4attributes -> 'color' AS color_key
5FROM product;

We are selecting the color element from the attributes column and giving it an alias of color_key.

Here are the results:

id	product_name	color_key
1	Chair	"brown"
2	Desk	"black"
3	Side Table	"brown"
4	Small Table	"black"

Notice that each value has a quote around it.

Alternatively, we can select only the values, which displays the value without quotes. The symbol for this is ->>.

1json_column ->> key

Here's our updated query. We'll select both methods so you can see the difference

1SELECT
2id,
3product_name,
4attributes -> 'color' AS color_key,
5attributes ->> 'color' AS color_value
6FROM product;

Here are the results.

id	product_name	color_key	color_value
1	Chair	"brown"	brown
2	Desk	"black"	black
3	Side Table	"brown"	brown
4	Small Table	"black"	black

What if the key does not exist?

Not every record has a height key. Let's see what happens if we select this.

1SELECT
2id,
3product_name,
4attributes ->> 'height' AS height
5FROM product;

id	product_name	height
1	Chair	60cm
2	Desk	null
3	Side Table	null
4	Small Table	null

Null values are shown where the key does not exist.

Selecting Array Values

We can use a similar concept to select a value that is stored as an array. In our example, the "drawers" attribute is an array.

Our query would look like this:

1SELECT
2id,
3product_name,
4attributes ->> 'drawers' AS drawers
5FROM product;

The results are:

id	product_name	drawers
1	Chair	null
2	Desk	[{"side": "left", "height": "30cm"}, {"side": "left", "height": "40cm"}]
3	Side Table	null
4	Small Table	null

The full value of the drawers attribute is shown. It's an array with two objects inside, and each object has a side key and a height key.

What if we want to see an attribute that's inside another attribute? For example, the first of the "drawer" attributes?

We can do this by getting the element of drawers, then selecting the element at a specified number.

1attributes -> 'drawers' -> 1

This example will get the attributes column, then the drawers element, then the element at position 1 (which is the 2nd element as arrays start at 0).

Notice how the -> symbol is used, as this symbol retrieves the element and not just the text.

Here's a sample query.

1SELECT
2id,
3product_name,
4attributes -> 'drawers' -> 1 AS drawer_value
5FROM product;

The results are:

id	product_name	drawers
1	Chair	null
2	Desk	{"side": "left", "height": "40cm"}
3	Side Table	null
4	Small Table	null

We can use a different notation to get an array as either an object or a text. The notation #> will retrieve it as an element and #>> will retrieve it as text.

Let's see an example.

1SELECT
2id,
3product_name,
4attributes #> '{drawers, 1}' AS drawers_element,
5attributes #>> '{drawers, 1}' AS drawers_text
6FROM product;

After the notation, we specify curly brackets, and inside we specify the key name and the element number. In this example, we are looking for the drawers key and element 1, which is the second element.

Here are our results:

id	product_name	drawers_element	drawers_text
1	Chair	null	null
2	Desk	{"side": "left", "height": "40cm"}	{"side": "left", "height": "40cm"}
3	Side Table	null	null
4	Small Table	null	null

In this example, both columns are the same. But we can see how this notation is used and it may be useful for us in our JSON queries.

Filtering on JSON Data in PostgreSQL

Let's say we wanted to see our Chair product, which has a brown color, wood material, and a height of 60cm. But we want to filter on the JSON attributes for this example.

Let's try this query.

1SELECT
2id,
3product_name,
4attributes
5FROM product
6WHERE attributes = '{"color":"brown", "material":"wood", "height":"60cm"}';

We can run this query. Here's what we see:

id	product_name	attributes
1	Chair	{"color":"brown", "material":"wood", "height":"60cm"}

This works because the text value is stored in the attributes column, and as long as we provide the exact full string, it will work.

Often we need to filter on one of the keys or values. How can we do that?

What if we try using the LIKE keyword with a partial match?

1SELECT
2id,
3product_name,
4attributes
5FROM product
6WHERE attributes LIKE '%"color":"brown"%';

1ERROR:  operator does not exist: jsonb ~~ unknown
2LINE 27: WHERE p.attributes LIKE '%"color":"brown"%';
3^
4HINT:  No operator matches the given name and argument type(s). You might need to add explicit type casts.

We get an error in PostgreSQL as we can't use the LIKE attribute on a JSONB column.

There are several features in PostgreSQL that make it possible to filter on JSON data.

Using Notation to Filter Data

Let's say we want to find all products where the color is brown. The color is a part of the attributes JSON column in our table.

We can use the same notation we used for selecting a column for filtering it.

We can write the notation like this:

1attributes -> 'color'

We then add this to our SELECT statement

1SELECT
2id,
3product_name,
4attributes
5FROM product
6WHERE attributes -> 'color' = 'brown';

We've added this notation to our WHERE clause, and added a condition where the color is equal to "brown".

Here's what happens when we run the query.

1ERROR:  invalid input syntax for type json
2LINE 27: WHERE attributes -> 'color' = 'brown';
3^
4DETAIL:  Token "brown" is invalid.
5CONTEXT:  JSON data, line 1: brown

We get this error because the query is expecting a JSON object in the WHERE clause, and we provided the string of "brown". It's expecting a JSON object because we used ->.

We can use ->> to be able to provide a string.

1SELECT
2id,
3product_name,
4attributes
5FROM product
6WHERE attributes ->> 'color' = 'brown';

The results of this query are shown below.

id	product_name	attributes
1	Chair	{"color":"brown", "height":"60cm", "material":"wood"}
3	Side Table	{"color": "brown", "material": ["metal", "wood"]}

We can see that the results only show records where the color attribute is brown.

Splitting JSON Data into Rows

So far we've seen examples of using different functions to read and filter JSON data.

The result has been that all of the JSON data has been displayed in a single column.

PostgreSQL includes some functions that let you expand the JSON fields and display them in different rows.

The JSONB_EACH function will display each key and value pair as elements. It will split each key into separate rows.

1SELECT
2id,
3product_name,
4JSONB_EACH(attributes)
5FROM product;

Here are the results:

id	product_name	jsonb_each
1	Chair	(color,"""brown""")
1	Chair	(height,"""60cm""")
1	Chair	(material,"""wood""")
2	Desk	(color,"""black""")
2	Desk	(drawers,"[{""side"": ""left"", ""height"": ""30cm""}, {""side"": ""left"", ""height"": ""40cm""}]")
2	Desk	(material,"""metal""")
3	Side Table	(color,"""brown""")
3	Side Table	(material,"[""metal"", ""wood""]")
4	Small Table	(color,"""black""")
4	Small Table	(material,"""plastic""")

I can't think of a reason that this function would be useful. Perhaps if you want to filter on attributes and display all of the attribute values as well.

You can also use JSON_OBJECT_KEYS to get a list of all keys in the JSON field:

1SELECT
2id,
3product_name,
4JSONB_OBJECT_KEYS(attributes)
5FROM product;

The results are:

id	product_name	jsonb_object_keys
1	Chair	color
1	Chair	height
1	Chair	material
2	Desk	color
2	Desk	drawers
2	Desk	material
3	Side Table	color
3	Side Table	material
4	Small Table	color
4	Small Table	material

This can be useful to find all rows that have a specific key, or to find the most common keys:

1SELECT
2JSONB_OBJECT_KEYS(attributes) AS attr_key,
3COUNT(*)
4FROM product
5GROUP BY JSONB_OBJECT_KEYS(attributes)
6ORDER BY COUNT(*) DESC;

attr_key	count
color	4
material	4
drawers	1
height	1

Check If a Key Exists

PostgreSQL has other operators that let you find records that contain a certain JSON attribute.

One example is the ? operator. This will let you determine if a JSON value contains a specified key.

Let's see an example. This query finds records that have the key of "drawers".

1SELECT
2id,
3product_name,
4attributes
5FROM product
6WHERE attributes ? 'drawers' = true;

The results are:

id	product_name	attributes
2	Desk	{"color": "black", "drawers": [{"side": "left", "height": "30cm"}, {"side": "left", "height": "40cm"}], "material": "metal"}

The row with id 2 is shown as it's the only one with the attribute of drawers.

How to Update JSON Data in PostgreSQL

Reading JSON is one thing. What if you need to update JSON data?

There are several ways to do this. We'll look at each of them.

Insert a New Key by Concatenating Values

You can update a JSON field using an UPDATE statement. Using this UPDATE statement, you can add a new key and value to the field by concatenating it to the existing value.

Here's the syntax:

1UPDATE table
2SET json_field = json_field || new_json_data;

If we want to add a new key and value pair to one of our products, we can concatenate a JSON value to the existing value and run it in the UPDATE statement.

Here's our table before the update:

id	product_name	attributes
1	Chair	{"color": "brown", "height": "60cm", "material": "wood"}

Here's the update statement:

1UPDATE product
2SET attributes = attributes || '{"width":"100cm"}'
3WHERE id = 1;

Here's the table after the update:

id	product_name	attributes
1	Chair	{"color": "brown", "width": "100cm", "height": "60cm", "material": "wood"}

Notice that the new width key is added to the JSON value. It's also added in the middle, as the JSONB data type doesn't preserve the order of the keys. This is OK, our JSON field still works as expected.

Updating an Existing Value Using JSONB_SET

The JSONB_SET function allows you to update an existing key to a new value. This is helpful if you don't want to read and update the entire field just to change one value.

The syntax looks like this:

1JSONB_SET(json_column, key, new_value)

Let's say we want to update the height for our Chair product from 60cm to 75cm.

Here's the table before the update:

id	product_name	attributes
1	Chair	{"color": "brown", "height": "60cm", "material": "wood"}

Our Update statement would look like this:

1UPDATE product
2SET attributes = JSONB_SET(attributes, '{height}', '"75cm"')
3WHERE id = 1;

There are a few things to notice:

The first parameter of JSONB_SET is the attributes column.
The second parameter is the height key. This needs to be enclosed in curly brackets to be treated as a JSON key.
The third parameter is the new value of 75cm. This is enclosed in single quotes as it's a string, and then double quotes as it's a JSON value. Without the double quotes, you'll get an error.

Here's the table after the update:

id	product_name	attributes
1	Chair	{"color": "brown", "height": "75cm", "material": "wood"}

No new attributes were added. The Update statement simply updated the existing value.

How to Delete from a JSON Field in PostgreSQL

There are two DELETE operations you can do when working with JSON fields:

delete an attribute from a JSON field
delete a row from your table

Deleting a Row using JSON_VALUE

Deleting a row from your table is done in the same way as regular SQL. You can write an SQL statement to delete the row that matches your ID, or using the notation.

For example, to delete all rows where the color attribute is brown:

1DELETE FROM product
2WHERE attributes ->> 'color' = 'brown';

This will remove the matching records from the table.

Removing an Attribute from a JSON Field

The other way to delete JSON data is to remove an attribute from a JSON field.

This is different from updating, as you're removing the attribute entirely rather than just updating its value to something else.

We can remove an attribute from a JSON field using the - operator. We use the UPDATE statement and update the JSON field with the - operator and the key we want to remove.

For example, here's our Chair product.

id	product_name	attributes
1	Chair	{"color": "brown", "height": "60cm", "material": "wood"}

Let's say we want to remove the "height" attribute. We can do this by writing an update statement and removing it.

1UPDATE product
2SET attributes = attributes - 'height'
3WHERE id = 1;

After we run this statement, we can check our table again.

id	product_name	attributes
1	Chair	{"color": "brown", "material": "wood"}

The height attribute has been removed.

Improve the Performance of JSON Queries

The JSON support and features in PostgreSQL are pretty good, and each version includes more features.

So, given that you can add JSON columns to tables, extract fields, and get all the flexibility of JSON fields with validation, wouldn't it be better to just store all of your data in JSON fields rather than normalised tables?

Well, sometimes that might be a good idea. But then you may be better off using a NoSQL database rather than PostgreSQL.

Another reason why using primarily JSON fields to store your data is not a good idea is that it can struggle with performance.

Select Performance

For example, let's say we want to select all products where the color is brown. We can use the ->> operator in the WHERE clause that we saw earlier in this guide:

1SELECT
2id,
3product_name,
4attributes
5FROM product
6WHERE attributes ->> 'color' = 'brown';

Here is the query plan for this.

QUERY PLAN
Seq Scan on product (cost=0.00..12.40 rows=1 width=454)
Filter: ((attributes ->> 'color'::text) = 'brown'::text)

The execution plan shows a Seq Scan, which is short for Sequential Scan and is a slow type of access. This might be OK for our table, which only has a few records, but once you start working with larger tables it can be quite slow.

What can we do?

Full-Text Index with GIN Index

PostgreSQL allows you to create a full-text index on the JSON field. This should improve the performance of any queries on this field.

It uses an index type called GIN, which stands for Generalised Inverted Index. It's used in several situations, and one of which is for JSONB values.

Let's see how we can create one.

We create an index based on the JSON_VALUE in the WHERE clause we want to run.

1CREATE INDEX idx_prod_json ON product USING GIN(attributes);

The name of the index is idx_prod_json. The product table is used, and the attributes column is specified. We add USING GIN to use the GIN feature or index type.

Now let's run the Select query again and see the explain plan.

QUERY PLAN
Seq Scan on product (cost=0.00..1.06 rows=1 width=454)
Filter: ((attributes ->> 'color'::text) = 'brown'::text)

We can see it still uses a Seq Scan, which may be because there are only four rows in the table.

However, we can see the cost is a lot less:

Before Index	After Index
cost=0.00..12.40	cost=0.00..1.06

This may result in a faster query. The difference may be more evident on larger tables.

Tips for Working with JSON in PostgreSQL

In this guide, we've looked at what JSON is, seen how to create JSON fields in PostgreSQL, and seen a range of ways we can work with them.

So, what's the best way to work with JSON fields in PostgreSQL?

Here are some tips I can offer for using JSON in PostgreSQL. They may not apply to your application or database but they are things to consider.

Just because you can, doesn't mean you should

JSON is flexible and quite powerful, but just because you can store data in a JSON field, doesn't mean you should. Consider using the advantages of the PostgreSQL relational database and using JSON where appropriate.

Treat the JSON field like a black box

The JSON field can be used to store valid JSON data sent or received by your application. While there are functions for reading from and working with the JSON field, it might be better to just store the JSON data in the field, retrieve it from your application, and process it there.

This is the concept of a black box. The application puts data in and reads data from it, and the database doesn't care about what's inside the field.

It may or may not work for your situation, but consider taking this approach.

Search by the Primary Key and other fields

We've seen that it can be slow to search by attributes inside the JSON field. Consider filtering by the primary key and other fields in the table, rather than attributes inside the JSON field. This will help with performance.

Conclusion

I hope you found this guide useful. Have you used JSON fields in PostgreSQL? What has your experience been like? Do you have any questions? Feel free to use the comments section on the post.