SQL aggregate function alias

Question

I'm a beginner at SQL and this is the question I have been asked to solve:

Say that a big city is defined as a place of type city with a population of at least 100,000. Write an SQL query that returns the scheme (state_name,no_big_city,big_city_population) ordered by state_name, listing those states which have either (a) at least five big cities or (b) at least one million people living in big cities. The column state_name is the name of the state, no_big_city is the number of big cities in the state, and big_city_population is the number of people living in big cities in the state.

Now, as far as I can see, the following query returns correct results:

SELECT state.name AS state_name
     , COUNT(CASE WHEN place.type = 'city' AND place.population >= 100000 THEN 1 ELSE NULL END) AS no_big_city
     , SUM(CASE WHEN place.type = 'city' AND place.population >= 100000 THEN place.population ELSE NULL END) AS big_city_population
FROM state
JOIN place
ON state.code = place.state_code
GROUP BY state_name
    HAVING
        COUNT(CASE WHEN place.type = 'city' AND place.population >= 100000 THEN 1 ELSE NULL END) >= 5 OR
        SUM(CASE WHEN place.type = 'city' AND place.population >= 100000 THEN place.population ELSE NULL END) >= 1000000
ORDER BY state_name;

However, the two aggregate functions used in the code appear twice. MY question: is there any way of making this code duplication disappear preserving functionality?

To be clear, I have already tried using the alias, but I just get a "column does not exist" error.

Erwin Brandstetter · Accepted Answer

The manual clarifies:

An output column's name can be used to refer to the column's value in ORDER BY and GROUP BY clauses, but not in the WHERE or HAVING clauses; there you must write out the expression instead.

Bold emphasis mine.

You can avoid typing long expressions repeatedly with a subquery or CTE:

SELECT state_name, no_big_city, big_city_population
FROM  (
   SELECT s.name AS state_name
        , COUNT(*)        FILTER (WHERE p.type = 'city' AND p.population >= 100000) AS no_big_city
        , SUM(population) FILTER (WHERE p.type = 'city' AND p.population >= 100000) AS big_city_population
   FROM   state s
   JOIN   place p ON s.code = p.state_code
   GROUP  BY s.name -- can be input column name as well, best schema-qualified to avoid ambiguity
   ) sub
WHERE  no_big_city >= 5
   OR  big_city_population >= 1000000
ORDER  BY state_name;

While being at it, I simplified with the aggregate FILTER clause (Postgres 9.4+):

How can I simplify this game statistics query?

However, I suggest this simpler and faster query to begin with:

SELECT s.state_name, p.no_big_city, p.big_city_population
FROM   state s
JOIN  (
   SELECT state_code      AS code  -- alias just to simplify join
        , count(*)        AS no_big_city
        , sum(population) AS big_city_population
   FROM   place
   WHERE  type = 'city'
   AND    population >= 100000
   GROUP  BY 1  -- can be ordinal number referencing position in SELECT list
   HAVING count(*) >= 5 OR sum(population) >= 1000000  -- simple expressions now
   ) p USING (code)
ORDER  BY 1;    -- can also be ordinal number

I am demonstrating another option to reference expressions in GROUP BY and ORDER BY. Only use that if it doesn't impair readability and maintainability.

Aron · Answer

Not sure if this is a comment or an answer, since it is more preference based as opposed to technical, but I'll post it anyway

What I usually do when I need to reference calculated columns (usually a LOT at the same time) is I put my calculated columns within a derived table and then reference the calculated columns using its alias outside of the derived table. This syntax should be ANSI-SQL correct, but I am not familiar with PostGRES

select * from (

SELECT STATE.NAME AS state_name
    ,COUNT(CASE WHEN place.type = 'city'
                AND place.population >= 100000 THEN 1 ELSE NULL END) AS no_big_city
    ,SUM(CASE WHEN place.type = 'city'
                AND place.population >= 100000 THEN place.population ELSE NULL END) AS big_city_population
FROM STATE
INNER JOIN place
    ON STATE.code = place.state_code
    GROUP BY state_name
) sub 
    where no_big_city >= 5 
        and big_city_population >=100000

--HAVING COUNT(CASE WHEN place.type = 'city'
--          AND place.population >= 100000 THEN 1 ELSE NULL END) >= 5
--  OR SUM(CASE WHEN place.type = 'city'
--              AND place.population >= 100000 THEN place.population ELSE NULL END) >= 1000000
ORDER BY state_name;

The nice thing about this approach is, although you are adding complication via a subquery/derived table, the formula is kept in one place, so any changes only have to happen once. I do not know if this will perform worse than simply repeating the calcuation in the group-by, but I can't imagine it would be that much worse.

SQL aggregate function alias

Tags:

sql

postgresql

case

aggregate-functions

having

Paul Benn

2 Answers

Erwin Brandstetter

Aron

Recent Activity

Donate For Us

SQL aggregate function alias

Tags:

sql

postgresql

case

aggregate-functions

having

Paul Benn

2 Answers

Erwin Brandstetter

Aron

Related questions

Recent Activity

Donate For Us