Consider a voting system implemented in PostgreSQL, where each user can vote up or down on a "foo". There is a foo table that stores all the "foo information", and a votes table that stores the user_id, foo_id, and vote, where vote is +1 or -1.
To get the vote tally for each foo, the following query would work:
SELECT sum(vote) FROM votes WHERE foo.foo_id = votes.foo_id;
But, the following would work just as well:
(SELECT count(vote) FROM votes
 WHERE foo.foo_id = votes.foo_id
 AND votes.vote = 1)
- (SELECT count(vote) FROM votes
   WHERE foo.foo_id = votes.foo_id
   AND votes.vote = (-1))
I currently have an index on votes.foo_id.
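For concreteness, the setup described above might look like the following sketch. Only the table names, the three votes columns, the index on votes.foo_id, and the NOT NULL constraint on vote come from the question; the column types, the keys, the CHECK constraint, the name column, and the index name are assumptions made for illustration.

```sql
-- Hypothetical schema: types, keys, and the index name are assumptions.
CREATE TABLE foo (
    foo_id serial PRIMARY KEY,
    name   text NOT NULL           -- stand-in for the "foo information"
);

CREATE TABLE votes (
    user_id integer  NOT NULL,
    foo_id  integer  NOT NULL REFERENCES foo,
    vote    smallint NOT NULL CHECK (vote IN (1, -1)),
    PRIMARY KEY (user_id, foo_id)  -- assumed: one vote per user per foo
);

-- The index mentioned in the question.
CREATE INDEX votes_foo_id_idx ON votes (foo_id);
```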
Which is a more efficient approach? (In other words, which would run faster?) I'm interested in both the PostgreSQL-specific answer and the general SQL answer.
EDIT
A lot of answers have been taking into account the case where vote is null. I forgot to mention that there is a NOT NULL constraint on the vote column.
Also, many have been pointing out that the first is much easier to read. Yes, that is definitely true, and if a colleague wrote the second one, I would be exploding with rage unless there was a performance necessity. Nevertheless, the question is still about the performance of the two. (Technically, if the first query were way slower, it wouldn't be such a crime to write the second query.)
COUNT() returns the number of rows that satisfy a condition or, when given an expression, the number of non-NULL values; it works on numeric and non-numeric values alike. SUM() returns the total of the non-NULL values in a numeric column. In other words, SUM() does the arithmetic, while COUNT() counts each non-NULL value as 1 regardless of data type. As for which is faster: taken on their own, the two aggregates cost about the same.
Of course, the first example is faster, simpler and easier to read. That should be obvious even before one gets slapped with aquatic creatures. While sum() is slightly more expensive than count(), what matters much, much more is that the second example needs two scans.
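If the two-count form were wanted anyway, both counts could be computed in a single pass over the matching rows. The following is a minimal sketch, assuming PostgreSQL 9.4 or later (for the aggregate FILTER clause) and assuming vote only ever holds 1 or -1; the aliases f, v, and score are made up for the example.

```sql
-- One scan of the matching votes rows instead of two.
SELECT f.*,
       (SELECT count(*) FILTER (WHERE v.vote = 1)
             - count(*) FILTER (WHERE v.vote = -1)
        FROM   votes v
        WHERE  v.foo_id = f.foo_id) AS score
FROM   foo f;
```

This removes the second scan, but it is still harder to read than a plain sum(vote).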
But there is an actual difference, too: sum() can return NULL where count() doesn't. I quote the manual on aggregate functions:
It should be noted that except for count, these functions return a null value when no rows are selected. In particular, sum of no rows returns null, not zero as one might expect,
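A quick way to see that difference is to aggregate over an empty set. This sketch assumes the votes table from the question; the column aliases are made up, and COALESCE is shown as one common way to turn the NULL into zero.

```sql
-- WHERE false selects no rows, mimicking a foo that nobody has voted on.
SELECT sum(vote)              AS sum_result,    -- NULL
       count(vote)            AS count_result,  -- 0
       COALESCE(sum(vote), 0) AS sum_or_zero    -- 0
FROM   votes
WHERE  false;
```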
Since you seem to have a weak spot for performance optimization, here's a detail you might like: count(*) is slightly faster than count(vote). The two are only equivalent if vote is NOT NULL. Test performance with EXPLAIN ANALYZE.
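Such a test might look like the sketch below, assuming the votes table from the question. Note that EXPLAIN ANALYZE actually executes the statement, and timings vary with caching, so it is worth running each variant a few times before comparing.

```sql
-- Compare the reported execution times of the two variants.
EXPLAIN ANALYZE
SELECT count(*) FROM votes;

EXPLAIN ANALYZE
SELECT count(vote) FROM votes;
```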
Both queries are syntactical nonsense, standing alone. They only make sense if you copied them from the SELECT list of a bigger query like:
SELECT *, (SELECT sum(vote) FROM votes WHERE votes.foo_id = foo.foo_id)
FROM foo;
The important point here is the correlated subquery, which may be fine if you are only reading a small fraction of votes in your query. In that case we would expect to see additional WHERE conditions, and you should have matching indexes.
In Postgres 9.3 or later, the alternative, cleaner, 100 % equivalent solution would be with LEFT JOIN LATERAL ... ON true:
SELECT *
FROM   foo f
LEFT   JOIN LATERAL (
   SELECT sum(vote) FROM votes WHERE foo_id = f.foo_id
   ) v ON true;
Performance is typically similar.
However, while reading large parts or all of table votes, this will be (much) faster:
SELECT f.*, v.score
FROM   foo f
JOIN  (
   SELECT foo_id, sum(vote) AS score
   FROM   votes
   GROUP  BY 1
   ) v USING (foo_id);
Aggregate values in a subquery first, then join to the result.
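One caveat: unlike the correlated subquery and the LATERAL variant, the plain JOIN above drops every foo that has no votes at all. If those rows should still appear, a variation along these lines might do. It is a sketch, not part of the original answer, and reporting a score of 0 for an unvoted foo is an assumption about the desired output.

```sql
-- LEFT JOIN keeps foos without votes; COALESCE turns their NULL score into 0.
SELECT f.*, COALESCE(v.score, 0) AS score
FROM   foo f
LEFT   JOIN (
   SELECT foo_id, sum(vote) AS score
   FROM   votes
   GROUP  BY foo_id
   ) v USING (foo_id);
```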