I am attempting to use multiple aggregate functions across multiple tables in a single SQL query (using Postgres).
My tables are structured similarly to the following:
CREATE TABLE "user" (user_id INT PRIMARY KEY, user_date_created TIMESTAMP NOT NULL); -- user is a reserved word in Postgres, so it must be quoted
CREATE TABLE item_sold (item_sold_id INT PRIMARY KEY, sold_user_id INT NOT NULL);
CREATE TABLE item_bought (item_bought_id INT PRIMARY KEY, bought_user_id INT NOT NULL);
I want to count the number of items bought and sold for each user. The solution I thought up does not work:
SELECT user_id, COUNT(item_sold_id), COUNT(item_bought_id)
FROM "user"
LEFT JOIN item_sold ON sold_user_id=user_id
LEFT JOIN item_bought ON bought_user_id=user_id
WHERE user_date_created > '2014-01-01'
GROUP BY user_id;
That seems to perform all the combinations of (item_sold_id, item_bought_id), e.g. if there are 4 sold and 2 bought, both COUNT()s are 8.
How can I properly query the table to obtain both counts?
The easy fix to your query is to use distinct:
SELECT user_id, COUNT(distinct item_sold_id), COUNT(distinct item_bought_id)
FROM "user"
LEFT JOIN item_sold ON sold_user_id=user_id
LEFT JOIN item_bought ON bought_user_id=user_id
WHERE user_date_created > '2014-01-01'
GROUP BY user_id;
However, the query is doing unnecessary work. If someone has 100 items bought and 200 items sold, then the join produces 20,000 intermediate rows. That is a lot.
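The row multiplication described above can be reproduced on a tiny data set. The following is a minimal sketch, not the original poster's setup: it assumes an in-memory SQLite database (via Python's sqlite3 module) as a stand-in for Postgres, and the sample rows (one user with 4 items sold and 2 bought) are made up to match the numbers in the question.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for Postgres; join semantics are the same here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE "user" (user_id INT PRIMARY KEY, user_date_created TIMESTAMP NOT NULL);
    CREATE TABLE item_sold (item_sold_id INT PRIMARY KEY, sold_user_id INT NOT NULL);
    CREATE TABLE item_bought (item_bought_id INT PRIMARY KEY, bought_user_id INT NOT NULL);
    -- Made-up sample data: user 1 sold 4 items and bought 2.
    INSERT INTO "user" VALUES (1, '2014-06-01 00:00:00');
    INSERT INTO item_sold VALUES (1, 1), (2, 1), (3, 1), (4, 1);
    INSERT INTO item_bought VALUES (1, 1), (2, 1);
""")

# The double LEFT JOIN shared by the original query and the DISTINCT fix.
JOINS = """
    FROM "user"
    LEFT JOIN item_sold ON sold_user_id = user_id
    LEFT JOIN item_bought ON bought_user_id = user_id
    WHERE user_date_created > '2014-01-01'
"""

# Number of intermediate rows the joins produce: 4 sold x 2 bought.
joined_rows = conn.execute("SELECT COUNT(*) " + JOINS).fetchone()[0]

# Plain COUNT() counts every intermediate row, so both counts are inflated.
plain = conn.execute(
    "SELECT user_id, COUNT(item_sold_id), COUNT(item_bought_id) "
    + JOINS + " GROUP BY user_id"
).fetchall()

# COUNT(DISTINCT ...) collapses the duplicates again.
distinct = conn.execute(
    "SELECT user_id, COUNT(DISTINCT item_sold_id), COUNT(DISTINCT item_bought_id) "
    + JOINS + " GROUP BY user_id"
).fetchall()

print(joined_rows, plain, distinct)
```

The DISTINCT version returns the right counts, but it still has to build and then de-duplicate every intermediate row, which is the unnecessary work mentioned above.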
The solution is to pre-aggregate the results or use a correlated subquery in the select. In this case, I prefer the correlated subquery solution (assuming the right indexes are available):
SELECT u.user_id,
(select count(*) from item_sold s where u.user_id = s.sold_user_id) AS items_sold,
(select count(*) from item_bought b where u.user_id = b.bought_user_id) AS items_bought
FROM "user" u
WHERE u.user_date_created > '2014-01-01';
The right indexes are item_sold(sold_user_id) and item_bought(bought_user_id). I prefer this over pre-aggregation because of the filtering on the user table. This only does the calculations for users created this year, which is harder to do with pre-aggregation.
With a lateral join it is possible to pre-aggregate only the filtered users:
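select user_id,
-- users with no matching rows would otherwise get null instead of 0
coalesce(total_item_sold, 0) as total_item_sold,
coalesce(total_item_bought, 0) as total_item_bought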
from
"user" u
left join lateral (
select sold_user_id, count(*) as total_item_sold
from item_sold
where sold_user_id = u.user_id
group by sold_user_id
) item_sold on user_id = sold_user_id
left join lateral (
select bought_user_id, count(*) as total_item_bought
from item_bought
where bought_user_id = u.user_id
group by bought_user_id
) item_bought on user_id = bought_user_id
where u.user_date_created >= '2014-01-01'
Notice that you need >= in the filter; otherwise it is possible to miss the exact first moment of the year. Although that timestamp is unlikely with naturally entered data, it is common with an automated job.
Another way to solve this problem is to use two nested selects:
select u.user_id,
(select count(*) from item_sold s where s.sold_user_id = u.user_id) as items_sold,
(select count(*) from item_bought b where b.bought_user_id = u.user_id) as items_bought
from "user" u
where u.user_date_created > '2014-01-01'