Summary & Question
I recently asked a Postgres-related question on SO and got a couple of different answers with queries that helped get me the result I wanted. Both of them worked, getting the correct result, and both had very similar code. One line was different, though: one answer used a WHERE clause to pick out something I'd asked for, and the other used FILTER (WHERE ...). What's the difference between these two?
Background Detail
For the curious, the table I had asked about is this one:
+--+---------+-------+
|id|treatment|outcome|
+--+---------+-------+
|a |1 |0 |
|a |1 |1 |
|b |0 |1 |
|c |1 |0 |
|c |0 |1 |
|c |1 |1 |
+--+---------+-------+
I'd wanted something that looks like this:
+-----------------------+-----+
|ever treated |count|
+-----------------------+-----+
|0 |1 |
|1 |3 |
+-----------------------+-----+
And I got two answers that worked. Here's the first, from @ErwinBrandstetter:
SELECT ever_treated, sum(outcome_ct) AS count
FROM (
SELECT id,
max(treatment) AS ever_treated,
count(*) FILTER (WHERE outcome = 1) AS outcome_ct
FROM t
GROUP BY 1
) sub
GROUP BY 1;
And here's @Heidiki with the second:
select subq.ever_treated, sum(subq.count) as count
from (select id,
max(treatment) as ever_treated,
count(*) as count from t where outcome = 1
group by id) as subq
group by subq.ever_treated;
To my (admittedly novice) eye, the main difference between the two is that in the former, you see this:
count(*) FILTER (WHERE outcome = 1) AS outcome_ct
While in the latter, you have this:
count(*) as count from t where outcome = 1
I'm looking through some of the documentation and I see that FILTER is working at the aggregate level, while perhaps WHERE is not, but I'm still lost on the intuition, especially as it applies to my table.
So what's the difference here?
The subqueries are not equivalent:
SELECT
id,
max(treatment) AS ever_treated,
count(*) FILTER (WHERE outcome = 1) AS outcome_ct
FROM t
GROUP BY 1
t.treatment for every id in the whole table t.outcome = 1 for every id.select
id,
max(treatment) as ever_treated,
count(*) as count
from t
where outcome = 1
group by id
t where outcome = 1.treatment for every id in the subset of the table t.id in the subset (already filtered).In terms of the result only #2 and #3 are significant. As you can see #3 will be equivalent, but #2 won't be.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With