Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Postgres: Are There Downsides to Using a JSON Column vs. an integer[] Column?


TLDR: If I want to save arrays of integers in a Postgres table, are there any pros or cons to using an array column (integer[]) vs. using a JSON column (eg. does one perform better than the other)?

Backstory:

I'm using a PostgreSQL database, and Node/Knex to manage it. Knex doesn't have any way of directly defining a PostgreSQL integer[] column type, so someone filed a Knex bug asking for it ... but one of the Knex devs closed the ticket, essentially saying that there was no need to support PostgreSQL array column types when anyone can instead use the JSON column type.

My question is, what downsides (if any) are there to using a JSON column type to hold a simple array of integers? Are there any benefits, such as improved performance, to using a true array column, or am I equally well off by just storing my arrays inside a JSON column?

EDIT: Just to be clear, all I'm looking for in an answer is either of the following:

A) an explanation of how JSON columns and integer[] columns in PostgreSQL work, including either how one is better than the other or how the two are (at least roughly) equal.

B) no explanation, but at least a reference to some benchmarks that show that one column type or the other performs better (or that the two are equal)

like image 873
machineghost Avatar asked Mar 31 '18 21:03

machineghost


People also ask

Should I use JSON in Postgres?

Don't use JSON for data that can easily be stored in database tables. Avoid large JSON objects if you want to modify individual attributes. Don't use JSON if you want to use attributes in complicated WHERE conditions. Avoid JSON if you want to join on some of the attributes.

Should I use JSON or Jsonb?

In general, most applications should prefer to store JSON data as jsonb , unless there are quite specialized needs, such as legacy assumptions about ordering of object keys. RFC 7159 specifies that JSON strings should be encoded in UTF8.

What is [] in PostgreSQL?

Array Type. PostgreSQL gives the opportunity to define a column of a table as a variable length single or multidimensional array. Arrays of any built-in or user-defined base type, enum type, or composite type can be created. We will focus on one data type in particular, the Array of text, text[].

Should you use arrays in Postgres?

When you are considering portability (e.g. rewriting your system to work with other databses) then you must not use arrays. If you are sure you'll stick with Postgres, then you can safely use arrays where you find appropriate. They exist for a reason and are neither bad design nor non-compliant.


1 Answers

An int[] is a lot more efficient in terms of storage it requires. Consider the following query which returns the size of an array with 500 elements

select pg_column_size(array_agg(i)) as array_size, 
       pg_column_size(jsonb_agg(i)) as jsonb_size,
       pg_column_size(json_agg(i)) as json_size
from  generate_series(1,500) i;

returns:

array_size | jsonb_size | json_size
-----------+------------+----------
      2024 |       6008 |      2396

(I am quite surprised that the JSON value is so much smaller than the JSONB, but that's a different topic)


If you always use the array as a single value it does not really matter in terms of query performance But if you do need to look into the array and search for specific value(s), that will be a lot more efficient with a native array.

There are a lot more functions and operators available for native arrays than there are for JSON arrays. You can easily search for a single value in a JSON array, but searching for multiple values requires workarounds.

The following query demonstrates that:

with array_test (id, int_array, json_array) as (
  values
    (1, array[1,2,3], '[1,2,3]'::jsonb)
)
select id, 
       int_array @> array[1] as array_single,
       json_array @> '1' json_single,
       int_array @> array[1,2] as array_all,
       json_array ?& array['1','2'] as json_all,
       int_array && array[1,2] as array_any,
       json_array ?| array['1','2'] as json_any
from array_test;

You can easily query an array if it contains one specific value. This also works for JSON arrays. Those are the expressions array_single and json_single. With a native array you could also use 1 = any(int_array) instead.

But check if an array contains all values from a list, or any value from a list does not work with JSON arrays.

The above test query returns:

id | array_single | json_single | array_all | json_all | array_any | json_any
---+--------------+-------------+-----------+----------+-----------+---------
 1 | true         | true        | true      | false    | true      | false   
like image 129
a_horse_with_no_name Avatar answered Sep 22 '22 05:09

a_horse_with_no_name