 

PostgreSQL - performance of using arrays in a big database


Say we have a table with 6 million records. There are 16 integer columns and a few text columns. It is a read-only table, so every integer column has an index. Every record is around 50-60 bytes.

The table name is "Item".
The server: 12 GB RAM, 1.5 TB SATA, 4 cores. The whole server is dedicated to Postgres.
There are many more tables in this database, so RAM does not cover the whole database.

I want to add a column "a_elements" (an array of big integers) to the table "Item". Every record would have no more than 50-60 elements in this column.

After that I would create a GIN index on this column, and a typical query should look like this:

select * from item where ...... and '{5}' <@ a_elements; 
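A minimal sketch of what the DDL for this option could look like (the index name is an assumption, not from the question):

alter table item add column a_elements bigint[];
create index item_a_elements_idx on item using gin (a_elements);

-- typical containment lookup: rows whose array contains element 5
select * from item where '{5}'::bigint[] <@ a_elements;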

I also have a second, more classical, option.

Do not add the column a_elements to table item, but create a table elements with two columns:

  • id_item
  • id_element

This table would have around 200 million records.

I am able to partition these tables, so the number of records would be reduced to 20 million in table elements and 500 K in table item.

The query for the second option looks like this:

select item.*
  from item
  left join elements on (item.id_item = elements.id_item)
 where ....
   and 5 = elements.id_element;
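A minimal sketch of this option's DDL, with illustrative index names; the commented part shows how the partitioning mentioned above could be expressed as declarative hash partitioning on PostgreSQL 11+:

create table elements (
    id_item    bigint not null,
    id_element bigint not null
);
create index elements_id_item_idx    on elements (id_item);
create index elements_id_element_idx on elements (id_element);

-- Declarative hash partitioning (PostgreSQL 11+) could split the ~200 million rows
-- into roughly ten 20-million-row partitions:
-- create table elements (id_item bigint, id_element bigint)
--     partition by hash (id_item);
-- create table elements_p0 partition of elements
--     for values with (modulus 10, remainder 0);
-- ... and so on for remainders 1-9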

I wonder which option would be better from a performance point of view. Is Postgres able to combine many different indexes with a GIN index (option 1) in a single query?
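One hedged way to check this is to look at the plan: PostgreSQL can combine several bitmap index scans (including one driven by a GIN index) with BitmapAnd/BitmapOr nodes. Here some_int_col is a hypothetical column standing in for the elided conditions:

explain (analyze, buffers)
select *
  from item
 where some_int_col = 42
   and '{5}'::bigint[] <@ a_elements;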

I need to make a good decision because importing this data will take me 20 days.

asked Aug 03 '12 by user1573402

People also ask

Should you use arrays in Postgres?

When you are considering portability (e.g. rewriting your system to work with other databases) then you must not use arrays. If you are sure you'll stick with Postgres, then you can safely use arrays where you find them appropriate. They exist for a reason and are neither bad design nor non-compliant.

Can PostgreSQL handle large data?

PostgreSQL does not impose a limit on the total size of a database. Databases of 4 terabytes (TB) are reported to exist. A database of this size is more than sufficient for all but the most demanding applications.

Can we store array in PostgreSQL?

PostgreSQL allows columns of a table to be defined as variable-length multidimensional arrays. Arrays of any built-in or user-defined base type, enum type, composite type, range type, or domain can be created.

Does number of columns affect performance in Postgres?

Yes, the number of columns will indirectly influence performance. The data in the columns will also affect the speed.


1 Answer

I think you should use an elements table:

  • Postgres will be able to use statistics to predict how many rows will match before executing the query, so it will be able to use the best query plan (this matters more if your data is not evenly distributed);

  • you'll be able to localize query data using CLUSTER elements USING elements_id_element_idx (see the sketch after this list);

  • when Postgres 9.2 is released you will be able to take advantage of index-only scans;
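A sketch of the clustering step from the list above, assuming the index on id_element was created with that name:

create index elements_id_element_idx on elements (id_element);
cluster elements using elements_id_element_idx;
analyze elements;  -- refresh statistics after the table has been rewritten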

But I've made some tests for 10M elements:

create table elements (id_item bigint, id_element bigint);
insert into elements
  select (random()*524288)::int, (random()*32768)::int
    from generate_series(1,10000000);

\timing
create index elements_id_item on elements(id_item);
Time: 15470,685 ms
create index elements_id_element on elements(id_element);
Time: 15121,090 ms

select relation, pg_size_pretty(pg_relation_size(relation))
  from (
    select unnest(array['elements','elements_id_item', 'elements_id_element'])
      as relation
  ) as _;
      relation       | pg_size_pretty
---------------------+----------------
 elements            | 422 MB
 elements_id_item    | 214 MB
 elements_id_element | 214 MB


create table arrays (id_item bigint, a_elements bigint[]);
insert into arrays
  select id_item, array_agg(id_element) from elements group by id_item;

create index arrays_a_elements_idx on arrays using gin (a_elements);
Time: 22102,700 ms

select relation, pg_size_pretty(pg_relation_size(relation))
  from (
    select unnest(array['arrays','arrays_a_elements_idx']) as relation
  ) as _;
       relation        | pg_size_pretty
-----------------------+----------------
 arrays                | 108 MB
 arrays_a_elements_idx | 73 MB

On the other hand, arrays are smaller and have a smaller index. I'd do some tests with 200 million elements before making a decision.
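One could also compare the actual lookups on the two test tables above; the plans and timings depend on data distribution and cache state, so treat this only as a sketch:

\timing
explain analyze
select * from elements where id_element = 5;

explain analyze
select * from arrays where '{5}'::bigint[] <@ a_elements;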

answered Sep 30 '22 by Tometzky