Or, better said: when should an array be used as a field data type in a table? Which solution provides better search results?



Is it better to use tables instead of array field types in PostgreSQL when arrays do not exceed 50 elements?

3 Answers

I avoid arrays for 2 reasons:

- by storing more than one attribute value in a cell you violate the first normal form (theoretical);
- you have to perform some extra, non-SQL related, processing each time you need to work with individual elements of the arrays (practical, but a direct consequence of the theoretical one)
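
To make the second point concrete, here is a minimal sketch of what working with individual elements can look like. The table `post` and its column `tag_ids` are hypothetical and not part of the question. PostgreSQL can expand an array with `unnest`, but that is still an extra step compared to reading rows from a child table:

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE post (
    id      serial PRIMARY KEY,
    tag_ids int[] NOT NULL DEFAULT '{}'
);

INSERT INTO post (tag_ids) VALUES ('{1,2,3}'), ('{2,5}');

-- Expand each array into one row per element before aggregating.
SELECT t.tag_id, count(*) AS posts
FROM post AS p
CROSS JOIN LATERAL unnest(p.tag_ids) AS t(tag_id)
GROUP BY t.tag_id
ORDER BY t.tag_id;

-- Note: a foreign key cannot be declared on individual array elements,
-- so nothing stops a tag id that exists nowhere else from being stored.
```

With a separate child table, the same count would be a plain `GROUP BY` on that table, and each element could carry its own foreign key.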


answered Oct 16 '22 20:10

Milen A. Radev

I've considered this problem as well, and the conclusion I came to is to use arrays when you want to eliminate table joins. The number of elements in each array isn't as important as the size of the tables involved. If there are only a few thousand rows in each table, then joining to get the 50 sub-rows shouldn't be a big problem. Once you get into tens or hundreds of thousands of rows, though, you're likely to start chewing through a lot of processor time and disk I/O.
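
As a rough sketch of the trade-off described here, the two layouts could look like the following. All table and column names (`item`, `item_value`, `item_with_vals`, `val`, `vals`) are made up for illustration, and which layout is faster depends on the data and the queries:

```sql
-- Hypothetical schemas, for illustration only.

-- Normalized: one row per element, read back with a join.
CREATE TABLE item (
    id serial PRIMARY KEY
);
CREATE TABLE item_value (
    item_id int NOT NULL REFERENCES item(id),
    val     int NOT NULL
);
CREATE INDEX item_value_item_id_idx ON item_value (item_id);

SELECT i.id, v.val
FROM item AS i
JOIN item_value AS v ON v.item_id = i.id
WHERE i.id = 42;

-- Array column: the (up to 50) values travel with the row, no join.
CREATE TABLE item_with_vals (
    id   serial PRIMARY KEY,
    vals int[] NOT NULL DEFAULT '{}'
);

SELECT id, vals
FROM item_with_vals
WHERE id = 42;
```

Comparing `EXPLAIN ANALYZE` output for both queries on realistic row counts is the only reliable way to tell whether eliminating the join is worth it.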

answered Oct 16 '22 20:10

Dana the Sane

Don't know how long these links stay live so I'll paste the results below: http://sqlfiddle.com/#!17/55761/2

TL;DR: searching a table index and then joining is fast, but adding a GIN index (using gin__int_ops) to a single table with an array column can be faster. Additionally, the flexibility of being able to match "some", or only a small number, of your array values may make the array the better option, e.g. for a tagging system.

create table data (
    id serial primary key,
    tags int[],
    data jsonb
);

create table tags (
    id serial primary key,
    data_id int references data(id)
);

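-- gin__int_ops is provided by the intarray extension, which must be installed first.
CREATE EXTENSION IF NOT EXISTS intarray;
CREATE INDEX gin_tags ON data USING GIN(tags gin__int_ops);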

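-- Discourage sequential scans so the planner prefers the indexes in this test.
SET enable_seqscan to off;

-- 100,000 rows in data, all tagged {5}.
with rand as (SELECT generate_series(1,100000) AS id)
insert into data (tags) select '{5}' from rand;

-- A single row gets the tag that will be searched for.
update data set tags = '{1}' where id = 47300;

-- 100,000 rows in tags, one per row in data.
with rand as (SELECT generate_series(1,100000) AS id)
INSERT INTO tags(data_id) select id from rand;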

Running:

  select data.id, data.data, data.tags
  from data, tags where tags.data_id = data.id and tags.id = 47300;

and

  select data.id, data.data, data.tags
  from data where data.tags && '{1}';

Yields:

Record Count: 1; Execution Time: 3ms
QUERY PLAN
Nested Loop (cost=0.58..16.63 rows=1 width=61)
  -> Index Scan using tags_pkey on tags (cost=0.29..8.31 rows=1 width=4)
       Index Cond: (id = 47300)
  -> Index Scan using data_pkey on data (cost=0.29..8.31 rows=1 width=61)
       Index Cond: (id = tags.data_id)

and

Record Count: 1; Execution Time: 1ms
QUERY PLAN
Bitmap Heap Scan on data (cost=15.88..718.31 rows=500 width=61)
  Recheck Cond: (tags && '{1}'::integer[])
  -> Bitmap Index Scan on gin_tags (cost=0.00..15.75 rows=500 width=0)
       Index Cond: (tags && '{1}'::integer[])
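
The flexibility of matching "some" of the array values, mentioned at the top of this answer, comes from the array operators. Here is a brief sketch against the `data` table defined above. The tag values are made up, and whether the GIN index is used for a given query should be checked with `EXPLAIN` rather than assumed:

```sql
-- Overlap: rows whose tags share at least one element with the list.
SELECT id, tags FROM data WHERE tags && '{1,5}';

-- Containment: rows whose tags include every element of the list.
SELECT id, tags FROM data WHERE tags @> '{1,5}';

-- Inspect the plan and the measured run time for a given query.
EXPLAIN ANALYZE
SELECT id, tags FROM data WHERE tags @> '{1}';
```

For a tagging system, `&&` answers "tagged with any of these" and `@>` answers "tagged with all of these", both without a join.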

answered Oct 16 '22 19:10

mattdlockyer


Tags:

arrays

postgresql

Florin
