
SUM For Distinct Rows

Given the following table structures:

countries: id, name
regions: id, country_id, name, population
cities: id, region_id, name

...and this query...

SELECT c.name AS country, COUNT(DISTINCT r.id) AS regions, COUNT(s.id) AS cities
FROM countries AS c
JOIN regions AS r ON r.country_id = c.id
JOIN cities AS s ON s.region_id = r.id
GROUP BY c.id

How would I add a SUM of the regions.population value to calculate each country's population? Each region's value must be counted only once, but the ungrouped result contains one row per city, so a region appears as many times as it has cities.

Example data:

mysql> SELECT * FROM countries;
+----+-----------+
| id | name      |
+----+-----------+
|  1 | country 1 |
|  2 | country 2 |
+----+-----------+
2 rows in set (0.00 sec)

mysql> SELECT * FROM regions;
+----+------------+-----------------------+------------+
| id | country_id | name                  | population |
+----+------------+-----------------------+------------+
| 11 |          1 | region 1 in country 1 |         10 |
| 12 |          1 | region 2 in country 1 |         15 |
| 21 |          2 | region 1 in country 2 |         25 |
+----+------------+-----------------------+------------+
3 rows in set (0.00 sec)

mysql> SELECT * FROM cities;
+-----+-----------+---------------------------------+
| id  | region_id | name                            |
+-----+-----------+---------------------------------+
| 111 |        11 | City 1 in region 1 in country 1 |
| 112 |        11 | City 2 in region 1 in country 1 |
| 121 |        12 | City 1 in region 2 in country 1 |
| 211 |        21 | City 1 in region 1 in country 2 |
+-----+-----------+---------------------------------+
4 rows in set (0.00 sec)

Desired output with example data:

+-----------+---------+--------+------------+
| country   | regions | cities | population |
+-----------+---------+--------+------------+
| country 1 |       2 |      3 |         25 |
| country 2 |       1 |      1 |         25 |
+-----------+---------+--------+------------+

I prefer a solution that doesn't require changing the JOIN logic.

The accepted solution for this post seems to be in the neighborhood of what I'm looking for, but I haven't been able to figure out how to apply it to my issue.


MY SOLUTION

SELECT c.id AS country_id,
    c.name AS country,
    COUNT(x.region_id) AS regions,
    SUM(x.population) AS population,
    SUM(x.cities) AS cities
FROM countries AS c
LEFT JOIN (
        SELECT r.country_id,
            r.id AS region_id,
            r.population AS population,
            COUNT(s.id) AS cities
        FROM regions AS r
        LEFT JOIN cities AS s ON s.region_id = r.id
        GROUP BY r.country_id, r.id, r.population
    ) AS x ON x.country_id = c.id
GROUP BY c.id, c.name

Note: My actual query is much more complex and has nothing to do with countries, regions, or cities. This is a minimal example to illustrate my issue.

Sonny asked Dec 19 '14 16:12



1 Answer

First of all, the other post you reference is not the same situation. In that case, the joins are like [A -> B and A -> C], so the weighted average (which is what that calculation does) is correct. In your case the joins are like [A -> B -> C], so you need a different approach.
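To make the [A -> B -> C] problem concrete, here is a minimal sketch that loads the question's example data and runs the original joins with a plain SUM(r.population). It uses Python's built-in sqlite3 module as a stand-in for MySQL; the column types and the added ORDER BY are assumptions made only so the example runs on its own.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE countries (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE regions (id INTEGER PRIMARY KEY, country_id INTEGER,
                      name TEXT, population INTEGER);
CREATE TABLE cities (id INTEGER PRIMARY KEY, region_id INTEGER, name TEXT);

INSERT INTO countries VALUES (1, 'country 1'), (2, 'country 2');
INSERT INTO regions VALUES
    (11, 1, 'region 1 in country 1', 10),
    (12, 1, 'region 2 in country 1', 15),
    (21, 2, 'region 1 in country 2', 25);
INSERT INTO cities VALUES
    (111, 11, 'City 1 in region 1 in country 1'),
    (112, 11, 'City 2 in region 1 in country 1'),
    (121, 12, 'City 1 in region 2 in country 1'),
    (211, 21, 'City 1 in region 1 in country 2');
""")

# Joining cities repeats each region row once per city, so the sum is inflated.
naive = conn.execute("""
SELECT c.name, SUM(r.population)
FROM countries AS c
JOIN regions AS r ON r.country_id = c.id
JOIN cities AS s ON s.region_id = r.id
GROUP BY c.id, c.name
ORDER BY c.id
""").fetchall()

print(naive)  # [('country 1', 35), ('country 2', 25)]
```

Region 11 has two cities, so its population of 10 is added twice and country 1 comes out as 35 instead of the desired 25.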

The simplest solution that comes to mind right away does involve a subquery, but not a complex one:

SELECT
    c.name AS country,
    COUNT(r.id) AS regions,
    SUM(s.city_count) AS cities,
    SUM(r.population) AS population
FROM countries AS c
JOIN regions AS r ON r.country_id = c.id
JOIN
    -- one row per region, carrying that region's city count
    (SELECT region_id, COUNT(*) AS city_count
    FROM cities
    GROUP BY region_id) AS s
ON s.region_id = r.id
GROUP BY c.id

The reason this works is that it reduces the cities to one row per region before joining to the regions. Each region then appears exactly once in the joined result, so its population is no longer multiplied by the number of cities in it.
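As a quick check, the sketch below runs the pre-aggregated query against the question's example data. It uses Python's built-in sqlite3 module as a stand-in for MySQL; the column types, the added ORDER BY, and grouping by both c.id and c.name are assumptions made so the example is self-contained and deterministic.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE countries (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE regions (id INTEGER PRIMARY KEY, country_id INTEGER,
                      name TEXT, population INTEGER);
CREATE TABLE cities (id INTEGER PRIMARY KEY, region_id INTEGER, name TEXT);

INSERT INTO countries VALUES (1, 'country 1'), (2, 'country 2');
INSERT INTO regions VALUES
    (11, 1, 'region 1 in country 1', 10),
    (12, 1, 'region 2 in country 1', 15),
    (21, 2, 'region 1 in country 2', 25);
INSERT INTO cities VALUES
    (111, 11, 'City 1 in region 1 in country 1'),
    (112, 11, 'City 2 in region 1 in country 1'),
    (121, 12, 'City 1 in region 2 in country 1'),
    (211, 21, 'City 1 in region 1 in country 2');
""")

# Cities are collapsed to one row per region before the join,
# so every region contributes its population exactly once.
result = conn.execute("""
SELECT c.name AS country,
       COUNT(r.id) AS regions,
       SUM(s.city_count) AS cities,
       SUM(r.population) AS population
FROM countries AS c
JOIN regions AS r ON r.country_id = c.id
JOIN (SELECT region_id, COUNT(*) AS city_count
      FROM cities
      GROUP BY region_id) AS s ON s.region_id = r.id
GROUP BY c.id, c.name
ORDER BY c.id
""").fetchall()

print(result)  # [('country 1', 2, 3, 25), ('country 2', 1, 1, 25)]
```

This matches the desired output in the question. Note that the inner join to the subquery drops any region that has no cities, just as the original query's inner joins do; if such regions should still count toward the population, a LEFT JOIN with COALESCE(s.city_count, 0) is one possible variation.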

radshop answered Sep 24 '22 20:09