Why is my PostgreSQL count so slow?

The reason is PostgreSQL's MVCC implementation. Because multiple transactions can see different states of the data, there is no straightforward way for COUNT(*) to summarize data across the whole table; PostgreSQL must, in some sense, walk through all rows.
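When an approximate figure is enough, the planner's statistics can be read instead of scanning the table. This is only a sketch: table_name is a placeholder, and the value is an estimate that may be stale or unset if the table has not been vacuumed or analyzed recently.

```sql
-- Estimated row count taken from the system catalog; not an exact count.
-- table_name is a placeholder for the real table.
SELECT reltuples::bigint AS estimated_rows
FROM pg_class
WHERE oid = 'table_name'::regclass;
```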

How to COUNT DISTINCT values in a database?

Two common cases, using a users table as the example: counting the number of unique countries in the table, and counting the number of users in every (unique) country in descending order.
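A minimal sketch of those two queries, assuming a hypothetical users table with a country column (the original table definition is not shown here):

```sql
-- Assumed schema, for illustration only: users(id, name, country)

-- Number of unique countries (rows with a NULL country are not counted)
SELECT COUNT(DISTINCT country) AS unique_countries
FROM users;

-- Number of users in every country, largest group first
SELECT country, COUNT(*) AS user_count
FROM users
GROUP BY country
ORDER BY user_count DESC;
```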

Why is a full count of rows in a table so slow?

A full count of rows in a table, typically written as SELECT COUNT(*) FROM table_name, can be comparatively slow in PostgreSQL. The reason is related to the MVCC implementation in PostgreSQL.

postgresql COUNT(DISTINCT ...) very slow

Q: How to speed up count(distinct) query?

I tried including my query plan, but it won't fit in the comment box.

People also ask

Why is Count distinct so slow?

In a query that joins a logs table to a dashboards table before counting, it is slow because the database iterates over all the logs and all the dashboards, joins them, and then sorts them, all before getting down to the real work of grouping and aggregating.

Does distinct make query slow?

Very few queries perform faster with SELECT DISTINCT, and very few perform slower (and then not significantly slower). In the latter case, the application would likely have to examine the duplicates itself if DISTINCT were dropped, which shifts the performance and complexity burden to the application.

How do I get unique values in PostgreSQL?

Removing duplicate rows from a query result set in PostgreSQL can be done using the SELECT statement with the DISTINCT clause. It keeps one row for each group of duplicates. The DISTINCT clause can be used for a single column or for a list of columns.
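As a short illustration of both forms, with table_name, column_a and column_b as placeholder names:

```sql
-- One row per distinct value of a single column
SELECT DISTINCT column_a
FROM table_name;

-- One row per distinct combination of several columns
SELECT DISTINCT column_a, column_b
FROM table_name;
```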

How do I count distinct rows in SQL?

The syntax of the SQL COUNT function is COUNT([ALL | DISTINCT] expression). By default, COUNT uses the ALL keyword (in SQL Server as well as in PostgreSQL). This means it counts every row for which the expression is not NULL, including rows that have duplicate values. With DISTINCT, each value is counted only once.
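The variants can be compared side by side. In this sketch, table_name and column_name are placeholders:

```sql
SELECT
    COUNT(*)                    AS all_rows,         -- every row, NULLs included
    COUNT(column_name)          AS non_null_values,  -- same as COUNT(ALL column_name)
    COUNT(DISTINCT column_name) AS distinct_values   -- each non-NULL value counted once
FROM table_name;
```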

You can use this:

SELECT COUNT(*) FROM (SELECT DISTINCT column_name FROM table_name) AS temp;

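This is often much faster than:

SELECT COUNT(DISTINCT column_name) FROM table_name;

Note that the two forms differ if column_name contains NULLs: COUNT(DISTINCT column_name) ignores them, while the subquery form counts NULL as one distinct value.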

-- My default settings (this is basically a single-session machine, so work_mem is pretty high)
SET effective_cache_size='2048MB';
SET work_mem='16MB';

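\echo original
EXPLAIN ANALYZE
SELECT COUNT(DISTINCT val) AS aantal
FROM one;

\echo group by+count(*)
-- DISTINCT is redundant here because GROUP BY val already returns each value once.
-- It is kept because it accounts for the second HashAggregate node in the plan below.
EXPLAIN ANALYZE
SELECT DISTINCT val
       -- , COUNT(*)
FROM one
GROUP BY val;

\echo with CTE
-- Collapse to distinct values inside the CTE, then count the resulting rows.
EXPLAIN ANALYZE
WITH agg AS (
    SELECT DISTINCT val
    FROM one
    GROUP BY val
)
SELECT COUNT(*) AS aantal
FROM agg;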

Results:

original
                                                      QUERY PLAN                                                      
----------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=36448.06..36448.07 rows=1 width=4) (actual time=1766.472..1766.472 rows=1 loops=1)
   ->  Seq Scan on one  (cost=0.00..32698.45 rows=1499845 width=4) (actual time=31.371..185.914 rows=1499845 loops=1)
 Total runtime: 1766.642 ms
(3 rows)

group by+count(*)
                                                         QUERY PLAN                                                         
----------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=36464.31..36477.31 rows=1300 width=4) (actual time=412.470..412.598 rows=1300 loops=1)
   ->  HashAggregate  (cost=36448.06..36461.06 rows=1300 width=4) (actual time=412.066..412.203 rows=1300 loops=1)
         ->  Seq Scan on one  (cost=0.00..32698.45 rows=1499845 width=4) (actual time=26.134..166.846 rows=1499845 loops=1)
 Total runtime: 412.686 ms
(4 rows)

with CTE
                                                             QUERY PLAN                                                             
------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=36506.56..36506.57 rows=1 width=0) (actual time=408.239..408.239 rows=1 loops=1)
   CTE agg
     ->  HashAggregate  (cost=36464.31..36477.31 rows=1300 width=4) (actual time=407.704..407.847 rows=1300 loops=1)
           ->  HashAggregate  (cost=36448.06..36461.06 rows=1300 width=4) (actual time=407.320..407.467 rows=1300 loops=1)
                 ->  Seq Scan on one  (cost=0.00..32698.45 rows=1499845 width=4) (actual time=24.321..165.256 rows=1499845 loops=1)
   ->  CTE Scan on agg  (cost=0.00..26.00 rows=1300 width=0) (actual time=407.707..408.154 rows=1300 loops=1)
 Total runtime: 408.300 ms
(7 rows)

The same plan as for the CTE could probably also be produced by other methods, such as window functions.

If your count(distinct(x)) is significantly slower than count(x), you can speed up this query by maintaining x value counts in a separate table, for example table_name_x_counts (x integer not null, x_count int not null), using triggers. However, write performance will suffer, and if you update multiple x values in a single transaction, you need to do so in some explicit order to avoid possible deadlocks.
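One way the trigger approach might look is sketched below. It is not a drop-in implementation: it assumes PostgreSQL 9.5 or later (for ON CONFLICT), a source table table_name with an integer column x, and the function and trigger names are made up for illustration. The primary key on x is an addition needed by ON CONFLICT.

```sql
-- Side table holding one row per distinct x value (names are illustrative).
CREATE TABLE table_name_x_counts (
    x       integer NOT NULL PRIMARY KEY,
    x_count integer NOT NULL
);

CREATE FUNCTION table_name_x_counts_maintain() RETURNS trigger AS $$
BEGIN
    -- An UPDATE that leaves x unchanged needs no bookkeeping.
    IF TG_OP = 'UPDATE' THEN
        IF OLD.x IS NOT DISTINCT FROM NEW.x THEN
            RETURN NULL;
        END IF;
    END IF;

    -- The old value disappears on DELETE and on UPDATE: decrement it,
    -- and drop the row once no occurrences remain.
    IF TG_OP <> 'INSERT' THEN
        IF OLD.x IS NOT NULL THEN
            UPDATE table_name_x_counts SET x_count = x_count - 1 WHERE x = OLD.x;
            DELETE FROM table_name_x_counts WHERE x = OLD.x AND x_count = 0;
        END IF;
    END IF;

    -- The new value appears on INSERT and on UPDATE: insert or increment it.
    IF TG_OP <> 'DELETE' THEN
        IF NEW.x IS NOT NULL THEN
            INSERT INTO table_name_x_counts (x, x_count) VALUES (NEW.x, 1)
            ON CONFLICT (x) DO UPDATE
                SET x_count = table_name_x_counts.x_count + 1;
        END IF;
    END IF;

    RETURN NULL;  -- the return value is ignored for AFTER row triggers
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER table_name_x_counts_trg
AFTER INSERT OR UPDATE OR DELETE ON table_name
FOR EACH ROW EXECUTE PROCEDURE table_name_x_counts_maintain();

-- The distinct count then becomes a count over the much smaller side table.
SELECT COUNT(*) FROM table_name_x_counts;
```

If table_name already contains rows, the side table would first have to be populated once, for example with INSERT INTO table_name_x_counts SELECT x, COUNT(*) FROM table_name WHERE x IS NOT NULL GROUP BY x.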
