
Is it better to use tables instead of an array field type in PostgreSQL when the arrays do not exceed 50 elements?

Or, put another way: when should an array be used as a field data type in a table?

Which approach provides better search performance?

Florin


People also ask

Should you use arrays in Postgres?

When you are considering portability (e.g. rewriting your system to work with other databases), then you must not use arrays. If you are sure you'll stick with Postgres, then you can safely use arrays where you find them appropriate. They exist for a reason and are neither bad design nor non-compliant.

How many columns should a Postgres table have?

There is a limit on how many columns a table can contain. Depending on the column types, it is between 250 and 1600.

Is array a data type in PostgreSQL?

PostgreSQL supports the concept of arrays. Every data type has a companion array type associated with it, irrespective of the properties of the data type. This holds even for user-defined data types.

Can we store array in PostgreSQL?

PostgreSQL allows columns of a table to be defined as variable-length multidimensional arrays. Arrays of any built-in or user-defined base type, enum type, composite type, range type, or domain can be created.
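
For example, a minimal illustrative declaration of an array column might look like this (the table and column names are made up, not from the question):

create table board (
    name text,
    squares int[][]  -- variable-length, multidimensional integer array
);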


3 Answers

I avoid arrays for two reasons (a normalized alternative is sketched after this list):

  • by storing more than one attribute value in a cell, you violate first normal form (theoretical);
  • you have to perform extra, non-SQL processing every time you need to work with the individual elements of the array (practical, but a direct consequence of the theoretical point).
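
To make that concrete, here is a minimal sketch of the two designs, using made-up product/tag names rather than anything from the question:

-- array design: all tag values packed into one cell
create table products_arr (
    id serial primary key,
    tag_ids int[]
);

-- normalized design: one row per (product, tag) pair
create table products (
    id serial primary key
);

create table product_tags (
    product_id int references products(id),
    tag_id int,
    primary key (product_id, tag_id)
);

-- individual elements are now ordinary rows, so plain SQL is enough
select tag_id from product_tags where product_id = 1;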
Milen A. Radev


I've considered this problem as well, and the conclusion I came to is to use arrays when you want to eliminate table joins. The number of elements contained in each array isn't as important as the size of the tables involved. If there are only a few thousand rows in each table, then joining to get the 50 sub-rows shouldn't be a big problem. Once you get into the tens or hundreds of thousands of rows, though, you're likely to start chewing through a lot of processor time and disk I/O.
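
As a rough illustration of the two query shapes being compared (hypothetical table names, not from the answer):

-- join approach: fetch the ~50 sub-rows from a child table
select p.id, t.tag_id
from parent p
join child_tags t on t.parent_id = p.id
where p.id = 123;

-- array approach: the same values come back in a single row, no join
select id, tag_ids
from parent_with_array
where id = 123;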

Dana the Sane


I don't know how long this link stays live, so I'll paste the results below: http://sqlfiddle.com/#!17/55761/2

TL;DR: searching a table index and then joining is fast, but adding a GIN index (using gin__int_ops) to a single table with an array column can be faster. Additionally, the flexibility of matching "some" or a small number of your array values can make it the better option for things like a tagging system.
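
One note if you try to reproduce this: the gin__int_ops operator class comes from the intarray extension, so on a plain database it most likely needs to be enabled first:

create extension if not exists intarray;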

-- "data" holds the array-column variant: tags live directly on the row
create table data (
    id serial primary key,
    tags int[],
    data jsonb
);

-- "tags" is the join-table variant: one row per tag, pointing back to data
create table tags (
    id serial primary key,
    data_id int references data(id)
);

-- GIN index on the array column
CREATE INDEX gin_tags ON data USING GIN(tags gin__int_ops);

-- force index usage so the two plans are comparable
SET enable_seqscan to off;

-- 100,000 rows, all tagged {5} ...
with rand as (SELECT generate_series(1,100000) AS id)
insert into data (tags) select '{5}' from rand;

-- ... except one row tagged {1}, so each query below returns a single match
update data set tags = '{1}' where id = 47300;

-- one join-table row per data row
with rand as (SELECT generate_series(1,100000) AS id)
INSERT INTO tags(data_id) select id from rand;
Running:

  select data.id, data.data, data.tags
  from data, tags where tags.data_id = data.id and tags.id = 47300;

and

  select data.id, data.data, data.tags
  from data where data.tags && '{1}';

Yields:

Record Count: 1; Execution Time: 3ms
QUERY PLAN
Nested Loop (cost=0.58..16.63 rows=1 width=61)
-> Index Scan using tags_pkey on tags (cost=0.29..8.31 rows=1 width=4)
Index Cond: (id = 47300)
-> Index Scan using data_pkey on data (cost=0.29..8.31 rows=1 width=61)
Index Cond: (id = tags.data_id)

and

Record Count: 1; Execution Time: 1ms
QUERY PLAN
Bitmap Heap Scan on data (cost=15.88..718.31 rows=500 width=61)
Recheck Cond: (tags && '{1}'::integer[])
-> Bitmap Index Scan on gin_tags (cost=0.00..15.75 rows=500 width=0)
Index Cond: (tags && '{1}'::integer[])
mattdlockyer