Using outer query result in a subquery in postgresql

Question

I have two tables points and contacts and I'm trying to get the average points.score per contact grouped on a monthly basis. Note that points and contacts aren't related, I just want the sum of points created in a month divided by the number of contacts that existed in that month.

So, I need to sum points grouped by the created_at month, and I need to take the count of contacts FOR THAT MONTH ONLY. It's that last part that's tricking me up. I'm not sure how I can use a column from an outer query in the subquery. I tried something like this:

SELECT SUM(score) AS points_sum,
  EXTRACT(month FROM created_at) AS month,
  date_trunc('MONTH', created_at) + INTERVAL '1 month' AS next_month,
  (SELECT COUNT(id) FROM contacts WHERE contacts.created_at <= next_month) as contact_count
FROM points
GROUP BY month, next_month
ORDER BY month

So, I'm extracting the actual month that my points are being summed, and at the same time, getting the beginning of the next_month so that I can say "Get me the count of contacts where their created at is < next_month"

But it complains that column next_month doesn't exist This is understandable as the subquery knows nothing about the outer query. Qualifying with points.next_month doesn't work either.

So can someone point me in the right direction of how to achieve this?

Tables:

Points

score | created_at
10    | "2011-11-15 21:44:00.363423"
11    | "2011-10-15 21:44:00.69667" 
12    | "2011-09-15 21:44:00.773289"
13    | "2011-08-15 21:44:00.848838"
14    | "2011-07-15 21:44:00.924152"

Contacts

id | created_at
6  | "2011-07-15 21:43:17.534777"
5  | "2011-08-15 21:43:17.520828"
4  | "2011-09-15 21:43:17.506452"
3  | "2011-10-15 21:43:17.491848"
1  | "2011-11-15 21:42:54.759225"

sum, month and next_month (without the subselect)

sum | month | next_month
14  | 7     | "2011-08-01 00:00:00"
13  | 8     | "2011-09-01 00:00:00"
12  | 9     | "2011-10-01 00:00:00"
11  | 10    | "2011-11-01 00:00:00"
10  | 11    | "2011-12-01 00:00:00"

Erwin Brandstetter · Accepted Answer

Edit

Now with running sum of contacts. My first draft used new contacts per month, which is obviously not what OP wants.

WITH c AS (
    SELECT created_at
          ,count(id) OVER (order BY created_at) AS ct
    FROM   contacts
    ), p AS (
    SELECT date_trunc('month', created_at) AS month
          ,sum(score) AS points_sum
    FROM   points
    GROUP  BY 1
    )
SELECT p.month
      ,EXTRACT(month FROM p.month) AS month_nr
      ,p.points_sum
      ,( SELECT c.ct
         FROM   c
         WHERE  c.created_at < (p.month + interval '1 month')
         ORDER  BY c.created_at DESC
         LIMIT  1) AS contacts
FROM   p
ORDER  BY 1

This works for any number of months across the years.
Assumes that no month is missing in the table points. If you want all months, including missing ones in points, generate a list of months with generate_series() and LEFT JOIN to it.
Build a running sum in a CTE with a window function.
Both CTE are not strictly necessary - for performance and simplification only.
Get contacts_count in a subselect.

Your original form of the query could work like this:

SELECT month
      ,EXTRACT(month FROM month) AS month_nr
      ,points_sum
      ,(SELECT count(*)
        FROM   contacts c
        WHERE  c.created_at < (p.month + interval '1 month')) AS contact_count
FROM   (
    SELECT date_trunc('MONTH', created_at) AS month
          ,sum(score) AS points_sum
    FROM   points p
    GROUP  BY 1
    ) p
ORDER  BY 1

The fix for the immediate cause of your error is to put the aggregate into a subquery. You were mixing levels in a way that is impossible.
I expect my variant to be slightly faster with big tables. Not sure about smaller tables. Would be great if you'd report back with test results.
Plus a minor fix: < instead of <=.

Using outer query result in a subquery in postgresql

Tags:

sql

postgresql

correlated-subquery

Tables:

brad

1 Answers

Edit

Erwin Brandstetter

Recent Activity

Donate For Us

Using outer query result in a subquery in postgresql

Tags:

sql

postgresql

correlated-subquery

Tables:

brad

1 Answers

Edit

Erwin Brandstetter

Related questions

Recent Activity

Donate For Us