Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Can aggregate filter expressions not use indices?

Tags:

sql

postgresql

One cool thing about filter expressions is that you can do multiple different filters and aggregates in one query. The "where" part becomes part of the aggregation rather than the whole "where" clause.

For example:

SELECT count('id') FILTER (WHERE account_type=1) as regular,
       count('id') FILTER (WHERE account_type=2) as gold,
       count('id') FILTER (WHERE account_type=3) as platinum
FROM clients;

(from the Django documentation)

Either this is a bug in PostgreSQL 9.5 or I'm not doing it right, or it's simply a limitation of PostgreSQL.

Consider these two queries:

select count(*)
from main_search
where created >= '2017-10-12T00:00:00.081739+00:00'::timestamptz
and created < '2017-10-13T00:00:00.081739+00:00'::timestamptz
and parent_id is null;

select
count('id') filter (
where created >= '2017-10-12T00:00:00.081739+00:00'::timestamptz
and created < '2017-10-13T00:00:00.081739+00:00'::timestamptz
and parent_id is null) as count
from main_search;

(The main_search table has a combined btree index on created and parent_id is null)

Here's the output:

 count
-------
  9682
(1 row)

 count
-------
  9682
(1 row)

If I stick a explain analyze in front of each query, this is the output:

    QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=1174.04..1174.05 rows=1 width=0) (actual time=5.077..5.077 rows=1 loops=1)
   ->  Index Scan using main_search_created_parent_id_null_idx on main_search  (cost=0.43..1152.69 rows=8540 width=0) (actual time=0.026..4.384 rows=9682 loops=1)
         Index Cond: ((created >= '2017-10-11 20:00:00.081739-04'::timestamp with time zone) AND (created < '2017-10-12 20:00:00.081739-04'::timestamp with time zone))
 Planning time: 0.826 ms
 Execution time: 5.227 ms
(5 rows)

                                                          QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=178054.93..178054.94 rows=1 width=12) (actual time=1589.006..1589.007 rows=1 loops=1)
   ->  Seq Scan on main_search  (cost=0.00..146459.39 rows=4212739 width=12) (actual time=0.051..882.099 rows=4212818 loops=1)
 Planning time: 0.051 ms
 Execution time: 1589.070 ms
(4 rows)

NOTE! The filter expression SELECT statement always use a sec scan instead of an index scan :<

I've tried this too with another PostgreSQL 9.5 table in a different database. At first I thought the "Seq Scan" happened because the table had too few rows but both tables are huge enough that an index should kick in.

like image 674
Peter Bengtsson Avatar asked Jan 12 '18 20:01

Peter Bengtsson


Video Answer


1 Answers

You misunderstud the usage case. Filter affect only aggregate on PRODUCED ALREADY DATASET. It do not filter records.

consider modified example:

SELECT count(*) FILTER (WHERE account_type=1) as regular,
       count(*) FILTER (WHERE account_type=2) as gold,
       count(*) FILTER (WHERE account_type=3) as platinum,
       count(*) 
FROM clients;

how where clasue should be then?

WHERE
(WHERE account_type=3)
or
(WHERE account_type=2)
or
(WHERE account_type=1)
or 1=1 ???

think about much more complicated FILTER and combination of unfiltered columns. It will be nightmare to the optimizer.

When you think about FILTER consider that this is only shortcut for longer sentence like CASE

SELECT SUM(CASE WHEN account_type=1 THEN 1 ELSE 0 END) as regular,
       SUM(CASE WHEN account_type=2 THEN 1 ELSE 0 END) as gold,
       SUM(CASE WHEN account_type=3 THEN 1 ELSE 0 END) as platinum
FROM clients;
like image 135
Livius Avatar answered Oct 06 '22 00:10

Livius