Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What's the difference between WHERE and FILTER (WHERE) in PostgreSQL?

Summary & Question

I recently asked a Postgres-related question on SO and got a couple of different answers with queries that helped get me the result I wanted. Both of them worked, getting the correct result, and both had very similar code. One line was different, though: one answer used a WHERE clause to pick out something I'd asked for, and the other used FILTER (WHERE ...). What's the difference between these two?

Background Detail

For the curious, the table I had asked about is this one:

+--+---------+-------+
|id|treatment|outcome|
+--+---------+-------+
|a |1        |0      |
|a |1        |1      |
|b |0        |1      |
|c |1        |0      |
|c |0        |1      |
|c |1        |1      |
+--+---------+-------+

I'd wanted something that looks like this:

+-----------------------+-----+
|ever treated           |count|
+-----------------------+-----+
|0                      |1    |
|1                      |3    |
+-----------------------+-----+

And I got two answers that worked. Here's the first, from @ErwinBrandstetter:

SELECT ever_treated, sum(outcome_ct) AS count
FROM  (
   SELECT id, 
          max(treatment) AS ever_treated, 
          count(*) FILTER (WHERE outcome = 1) AS outcome_ct
   FROM t
   GROUP  BY 1
   ) sub
GROUP  BY 1;

And here's @Heidiki with the second:

    select subq.ever_treated, sum(subq.count) as count
    from (select id, 
          max(treatment) as ever_treated, 
          count(*) as count from t where outcome = 1 
          group by id) as subq 
    group by subq.ever_treated;

To my (admittedly novice) eye, the main difference between the two is that in the former, you see this:

count(*) FILTER (WHERE outcome = 1) AS outcome_ct

While in the latter, you have this:

count(*) as count from t where outcome = 1

I'm looking through some of the documentation and I see that FILTER is working at the aggregate level, while perhaps WHERE is not, but I'm still lost on the intuition, especially as it applies to my table.

So what's the difference here?

like image 662
logjammin Avatar asked Nov 01 '25 22:11

logjammin


1 Answers

The subqueries are not equivalent:

The first one

SELECT
  id, 
  max(treatment) AS ever_treated, 
  count(*) FILTER (WHERE outcome = 1) AS outcome_ct
FROM t
GROUP BY 1
  1. Reads all the rows from t.
  2. Computes the max treatment for every id in the whole table t.
  3. Count the rows with outcome = 1 for every id.

The second one

select
  id, 
  max(treatment) as ever_treated, 
  count(*) as count 
from t
where outcome = 1 
group by id
  1. Reads the subset of rows from t where outcome = 1.
  2. Computes the max treatment for every id in the subset of the table t.
  3. Count the rows for every id in the subset (already filtered).

In terms of the result only #2 and #3 are significant. As you can see #3 will be equivalent, but #2 won't be.

like image 185
The Impaler Avatar answered Nov 04 '25 01:11

The Impaler



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!