Optimization of count query for PostgreSQL

Tags: postgresql, count, database-performance, postgresql-performance

I have a table in PostgreSQL that contains an array column which is updated constantly.

In my application I need to get the number of rows for which a specific parameter is not present in that array column. My query looks like this:

select count(id) 
from "table" 
where not (ARRAY['parameter value'] <@ "table".array_column)

But as the number of rows and the number of executions of that query grow (several times per second, possibly hundreds or thousands), performance degrades a lot. It seems to me that counting in PostgreSQL might take time linear in the number of rows (I'm not completely sure of this).

Basically my question is:

Is there an existing pattern I'm not aware of that applies to this situation? What would be the best approach for this?

Any suggestion you could give me would be really appreciated.


asked Oct 25 '12 18:10

jeruki

1 Answer

PostgreSQL does support GIN indexes on array columns. Unfortunately, such an index doesn't seem to be usable for NOT (ARRAY[...] <@ indexed_col), and GIN indexes are unsuitable for frequently-updated tables anyway.

Demo:

CREATE TABLE arrtable (id integer primary key, array_column integer[]);

INSERT INTO arrtable VALUES (1, ARRAY[1,2,3,4]);

CREATE INDEX arrtable_arraycolumn_gin_arr_idx
ON arrtable USING GIN(array_column);

-- Use the following *only* for testing whether Pg can use an index
-- Do not use it in production.
SET enable_seqscan = off;

explain (buffers, analyze) select count(id) 
from arrtable 
where not (ARRAY[1] <@ arrtable.array_column);

Unfortunately, this shows that as written we can't use the index. If you don't negate the condition it can be used, so you can search for and count rows that do contain the search element (by removing NOT).
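For comparison, here is a sketch of the non-negated query that the GIN index can serve, assuming the demo table above and PostgreSQL 9.5 or later (needed for IF NOT EXISTS on CREATE INDEX). It is written with @>, the commutator of <@, which is the more common way to spell this condition; both forms should be matchable to the GIN index. On the demo setup the plan should show a bitmap scan on arrtable_arraycolumn_gin_arr_idx instead of a sequential scan, though the exact plan depends on version and data.

```sql
-- Assumes the demo table and GIN index from above; recreated here only if missing.
CREATE TABLE IF NOT EXISTS arrtable (id integer PRIMARY KEY, array_column integer[]);
CREATE INDEX IF NOT EXISTS arrtable_arraycolumn_gin_arr_idx
    ON arrtable USING GIN (array_column);

-- Testing aid only, as above: discourage sequential scans.
SET enable_seqscan = off;

-- Non-negated form: count rows whose array DOES contain 1.
-- array_column @> ARRAY[1] ("contains") is the commuted form of
-- ARRAY[1] <@ array_column, so the planner can match it to the GIN index.
EXPLAIN (ANALYZE, BUFFERS)
SELECT count(id)
FROM arrtable
WHERE arrtable.array_column @> ARRAY[1];

-- Restore the default planner behaviour.
RESET enable_seqscan;
```

The count of rows that contain the value is the piece the subtraction approach below relies on.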

You could use the index to count entries that do contain the target value, then subtract that result from a count of all entries. Since counting all rows in a table is quite slow in PostgreSQL (9.1 and older) and requires a sequential scan, this will actually be slower than your current query. It's possible that on 9.2 an index-only scan can be used to count the rows if you have a b-tree index on id, in which case this might actually be OK:

SELECT (
  SELECT count(id) FROM arrtable
) - (
  SELECT count(id) FROM arrtable 
  WHERE (ARRAY[1] <@ arrtable.array_column)
);

It's guaranteed to perform worse than your original version on Pg 9.1 and below, because in addition to the seqscan your original requires, it also needs a GIN index scan. I've now tested this on 9.2 and it does appear to use an index for the count, so it's worth exploring on 9.2. With some less trivial dummy data:

drop index arrtable_arraycolumn_gin_arr_idx ;
truncate table arrtable;
insert into arrtable (id, array_column)
select s, ARRAY[1,2,s,s*2,s*3,s/2,s/4] FROM generate_series(1,1000000) s;
CREATE INDEX arrtable_arraycolumn_gin_arr_idx
ON arrtable USING GIN(array_column);

Note that a GIN index like this will slow updates down a LOT, and is quite slow to create in the first place. It is not suitable for tables that get updated much at all - like your table.

Worse, on the same data set the query using this index takes anywhere from up to twice as long as your original query to, at best, about half as long. It's worst where the index is not very selective, like ARRAY[1]: 4s vs 2s for the original query. Where the index is highly selective (i.e. not many matches, like ARRAY[199]) it runs in about 1.2 seconds vs the original's 3s. This index simply isn't worth having for this query.

The lesson here? Sometimes, the right answer is just to do a sequential scan.

Since a sequential scan won't keep up with your query rate, either maintain a materialized view with a trigger as @debenhur suggests, or try inverting the array to be a list of parameters that the entry does not have, so you can use a GiST index as @maniek suggests.
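A minimal sketch of the trigger-maintained summary idea, assuming the demo table arrtable with an integer[] column and PostgreSQL 9.5 or later (for ON CONFLICT). The names arrtable_total, arrtable_param_counts and arrtable_maintain_counts are hypothetical, and this is not necessarily the exact design @debenhur had in mind. It keeps, per value, the number of rows whose array contains it, plus the number of rows with a non-NULL array; the count you want is then the total minus the per-value count, which is a lookup rather than a scan. NULL arrays are left out of the total because NOT (ARRAY[x] <@ NULL) is NULL, so the original query doesn't count those rows either.

```sql
BEGIN;
CREATE TABLE IF NOT EXISTS arrtable (id integer PRIMARY KEY, array_column integer[]);
-- Block concurrent writes while the summaries are seeded and the trigger is attached.
LOCK TABLE arrtable IN SHARE ROW EXCLUSIVE MODE;

-- Hypothetical single-row table: number of rows with a non-NULL array.
CREATE TABLE arrtable_total (
    only_row boolean PRIMARY KEY DEFAULT true CHECK (only_row),
    n        bigint  NOT NULL
);
INSERT INTO arrtable_total (n)
SELECT count(*) FROM arrtable WHERE array_column IS NOT NULL;

-- Hypothetical per-value table: number of rows whose array contains param.
CREATE TABLE arrtable_param_counts (
    param integer PRIMARY KEY,
    n     bigint  NOT NULL
);
INSERT INTO arrtable_param_counts (param, n)
SELECT d.e, count(*)
FROM (SELECT DISTINCT a.id, u.e   -- DISTINCT: count a row once even if a value repeats
      FROM arrtable a, unnest(a.array_column) AS u(e)) d
WHERE d.e IS NOT NULL
GROUP BY d.e;

CREATE FUNCTION arrtable_maintain_counts() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    -- Remove the old row's contribution (UPDATE or DELETE).
    IF TG_OP IN ('UPDATE', 'DELETE') THEN
        IF OLD.array_column IS NOT NULL THEN
            UPDATE arrtable_total SET n = n - 1;
            UPDATE arrtable_param_counts AS c SET n = c.n - 1
            WHERE c.param IN (SELECT unnest(OLD.array_column));
        END IF;
    END IF;
    -- Add the new row's contribution (INSERT or UPDATE).
    IF TG_OP IN ('INSERT', 'UPDATE') THEN
        IF NEW.array_column IS NOT NULL THEN
            UPDATE arrtable_total SET n = n + 1;
            INSERT INTO arrtable_param_counts (param, n)
            SELECT DISTINCT u.e, 1 FROM unnest(NEW.array_column) AS u(e)
            WHERE u.e IS NOT NULL
            ON CONFLICT (param) DO UPDATE SET n = arrtable_param_counts.n + 1;
        END IF;
    END IF;
    RETURN NULL;  -- return value of an AFTER ... FOR EACH ROW trigger is ignored
END;
$$;

CREATE TRIGGER arrtable_maintain_counts
AFTER INSERT OR UPDATE OR DELETE ON arrtable
FOR EACH ROW EXECUTE PROCEDURE arrtable_maintain_counts();
COMMIT;

-- Rows whose array does NOT contain 1, without scanning arrtable.
SELECT t.n - coalesce((SELECT c.n FROM arrtable_param_counts c
                       WHERE c.param = 1), 0) AS rows_without_param
FROM arrtable_total t;
```

The trade-off is that every write to arrtable now also updates the single arrtable_total row, which can become a lock hot spot under heavy concurrent updates; whether that beats the sequential scan for your workload is something you'd have to measure.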


answered Oct 13 '22 15:10

Craig Ringer
