
How do I speed up counting rows in a PostgreSQL table?

We need to count the number of rows in a PostgreSQL table. In our case, no conditions need to be met, and it would be perfectly acceptable to get a row estimate if that significantly improved query speed.

Basically, we want select count(id) from <table> to run as fast as possible, even if that means not getting exact results.

Juan Carlos Coto asked Jan 28 '13 20:01




2 Answers

For a very quick estimate:

SELECT reltuples FROM pg_class WHERE relname = 'my_table'; 

There are several caveats, though. For one, relname is not necessarily unique in pg_class. There can be multiple tables with the same relname in multiple schemas of the database. To be unambiguous:

SELECT reltuples::bigint FROM pg_class WHERE oid = 'my_schema.my_table'::regclass; 

If you do not schema-qualify the table name, the cast to regclass uses the current search_path to pick the best match. If the table does not exist (or is not visible) in any of the schemas in the search_path, you get an error message. See Object Identifier Types in the manual.
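A minimal sketch of the lookup behaviour described above, with my_schema and my_table as placeholder names. The plain cast to regclass raises an error for a missing table, whereas to_regclass() (available since PostgreSQL 9.4) returns NULL instead, so the second query simply returns no row:

```sql
-- Unqualified name: resolved through the current search_path.
-- Raises an error if no visible table called my_table exists.
SELECT reltuples::bigint AS estimate
FROM   pg_class
WHERE  oid = 'my_table'::regclass;

-- to_regclass() returns NULL for an unknown name instead of raising an error,
-- so this query yields zero rows when the table cannot be found.
SELECT reltuples::bigint AS estimate
FROM   pg_class
WHERE  oid = to_regclass('my_schema.my_table');
```

The second form may be preferable in scripts that must not abort when a table is absent.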

The cast to bigint formats the real number nicely, especially for big counts.

Also, reltuples can be more or less out of date. There are ways to make up for this to some extent. See this later answer with new and improved options:

  • Fast way to discover the row count of a table in PostgreSQL
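One possible way to compensate for a stale reltuples, sketched here under the assumption that the average number of rows per page has stayed roughly constant since the last VACUUM or ANALYZE: scale the stored row density by the table's current size in pages. The table name is a placeholder, and this is an illustration rather than necessarily the exact method of the linked answer.

```sql
-- Option 1: refresh the statistics first (cheap compared to a full count).
ANALYZE my_schema.my_table;

-- Option 2: extrapolate from the stored density and the current on-disk size.
SELECT (CASE
          WHEN c.reltuples < 0 THEN NULL  -- PostgreSQL 14+: -1 means never vacuumed or analyzed
          WHEN c.relpages = 0  THEN 0     -- no pages recorded yet
          ELSE c.reltuples / c.relpages   -- rows per page at last VACUUM / ANALYZE
        END
        * (pg_relation_size(c.oid) / current_setting('block_size')::int)  -- current number of pages
       )::bigint AS estimate
FROM   pg_class c
WHERE  c.oid = 'my_schema.my_table'::regclass;
```

The result is still an estimate, but it tracks growth or shrinkage of the table between statistics updates.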

A query on pg_stat_user_tables is many times slower (though still much faster than a full count), because it is a view over a couple of tables.
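For comparison, a sketch of what such a query on pg_stat_user_tables could look like. The schema and table names are placeholders; n_live_tup is the view's estimated number of live rows, maintained by the statistics collector:

```sql
-- Estimated live rows according to the statistics views.
SELECT n_live_tup AS estimate
FROM   pg_stat_user_tables
WHERE  schemaname = 'my_schema'
AND    relname    = 'my_table';
```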

Erwin Brandstetter answered Sep 22 '22 17:09



Count is slow for big tables, so you can get a close estimate this way:

SELECT reltuples::bigint AS estimate FROM pg_class WHERE relname = 'tableName';

This is extremely fast. Because of the cast to bigint the result is an integer rather than a float, but it is still only a close estimate.

  • reltuples is a column of the pg_class catalog. It holds the "number of rows in the table. This is only an estimate used by the planner. It is updated by VACUUM, ANALYZE, and a few DDL commands such as CREATE INDEX" (manual)
  • The catalog pg_class catalogs tables and most everything else that has columns or is otherwise similar to a table. This includes indexes (but see also pg_index), sequences, views, composite types, and some kinds of special relation (manual)
  • "Why is "SELECT count(*) FROM bigtable;" slow?" : http://wiki.postgresql.org/wiki/FAQ#Why_is_.22SELECT_count.28.2A.29_FROM_bigtable.3B.22_slow.3F
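Since pg_class also holds indexes, sequences, views and other relations, a lookup by relname alone can match more than one entry. As a hedged sketch (my_table is a placeholder name), the query can be restricted to ordinary tables and report the schema of each match:

```sql
-- List the estimate for every ordinary table named my_table, one row per schema.
SELECT n.nspname           AS schema_name,
       c.relname           AS table_name,
       c.reltuples::bigint AS estimate
FROM   pg_class c
JOIN   pg_namespace n ON n.oid = c.relnamespace
WHERE  c.relkind = 'r'          -- 'r' = ordinary table
AND    c.relname = 'my_table';  -- unquoted identifiers are stored in lower case
```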
Ariel Grabijas answered Sep 22 '22 17:09
