Given the following table structures:
countries: id, name
regions: id, country_id, name, population
cities: id, region_id, name
...and this query...
SELECT c.name AS country, COUNT(DISTINCT r.id) AS regions, COUNT(s.id) AS cities
FROM countries AS c
JOIN regions AS r ON r.country_id = c.id
JOIN cities AS s ON s.region_id = r.id
GROUP BY c.id
How would I add a SUM of the regions.population value to calculate the country's population? I need to use each region's value only once when summing, but the un-grouped result has multiple rows for each region (one row per city in that region).
Example data:
mysql> SELECT * FROM countries;
+----+-----------+
| id | name      |
+----+-----------+
|  1 | country 1 |
|  2 | country 2 |
+----+-----------+
2 rows in set (0.00 sec)
mysql> SELECT * FROM regions;
+----+------------+-----------------------+------------+
| id | country_id | name                  | population |
+----+------------+-----------------------+------------+
| 11 |          1 | region 1 in country 1 |         10 |
| 12 |          1 | region 2 in country 1 |         15 |
| 21 |          2 | region 1 in country 2 |         25 |
+----+------------+-----------------------+------------+
3 rows in set (0.00 sec)
mysql> SELECT * FROM cities;
+-----+-----------+---------------------------------+
| id  | region_id | name                            |
+-----+-----------+---------------------------------+
| 111 |        11 | City 1 in region 1 in country 1 |
| 112 |        11 | City 2 in region 1 in country 1 |
| 121 |        12 | City 1 in region 2 in country 1 |
| 211 |        21 | City 1 in region 1 in country 2 |
+-----+-----------+---------------------------------+
4 rows in set (0.00 sec)
Desired output with example data:
+-----------+---------+--------+------------+
| country   | regions | cities | population |
+-----------+---------+--------+------------+
| country 1 |       2 |      3 |         25 |
| country 2 |       1 |      1 |         25 |
+-----------+---------+--------+------------+
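To make the problem concrete, here is a minimal sketch of what happens if SUM(r.population) is simply added to the original query. It assumes an in-memory SQLite database as a stand-in for MySQL (the Python harness and the variable names are illustrative only, not part of my setup), loaded with the example data above. Region 11 has two cities, so its population of 10 is counted twice and country 1 comes out as 35 instead of 25.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE countries (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE regions (id INTEGER PRIMARY KEY, country_id INTEGER,
                          name TEXT, population INTEGER);
    CREATE TABLE cities (id INTEGER PRIMARY KEY, region_id INTEGER, name TEXT);
    INSERT INTO countries VALUES (1, 'country 1'), (2, 'country 2');
    INSERT INTO regions VALUES
        (11, 1, 'region 1 in country 1', 10),
        (12, 1, 'region 2 in country 1', 15),
        (21, 2, 'region 1 in country 2', 25);
    INSERT INTO cities VALUES
        (111, 11, 'City 1 in region 1 in country 1'),
        (112, 11, 'City 2 in region 1 in country 1'),
        (121, 12, 'City 1 in region 2 in country 1'),
        (211, 21, 'City 1 in region 1 in country 2');
""")

# Naive attempt: the join yields one row per city, so each region's
# population is summed once for every city it contains.
naive_rows = conn.execute("""
    SELECT c.name AS country,
           COUNT(DISTINCT r.id) AS regions,
           COUNT(s.id) AS cities,
           SUM(r.population) AS population
    FROM countries AS c
    JOIN regions AS r ON r.country_id = c.id
    JOIN cities AS s ON s.region_id = r.id
    GROUP BY c.id, c.name
    ORDER BY c.id
""").fetchall()

for row in naive_rows:
    print(row)
# ('country 1', 2, 3, 35)   <- 10 + 10 + 15, not the desired 25
# ('country 2', 1, 1, 25)
```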
I prefer a solution that doesn't require changing the JOIN logic.
The accepted solution for this post seems to be in the neighborhood of what I'm looking for, but I haven't been able to figure out how to apply it to my issue.
MY SOLUTION
SELECT c.id AS country_id,
       c.name AS country,
       COUNT(x.region_id) AS regions,
       SUM(x.population) AS population,
       SUM(x.cities) AS cities
FROM countries AS c
LEFT JOIN (
    SELECT r.country_id,
           r.id AS region_id,
           r.population AS population,
           COUNT(s.id) AS cities
    FROM regions AS r
    LEFT JOIN cities AS s ON s.region_id = r.id
    GROUP BY r.country_id, r.id, r.population
) AS x ON x.country_id = c.id
GROUP BY c.id, c.name
Note: My actual query is much more complex and has nothing to do with countries, regions, or cities. This is a minimal example to illustrate my issue.
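As a rough check of how the query under MY SOLUTION behaves at the edges, the sketch below runs it against an in-memory SQLite database (an assumption, used here in place of MySQL). Two rows are hypothetical additions that are not in the example data: a region 22 with no cities and a country 3 with no regions. Because both joins are LEFT JOINs, these rows are kept: the city-less region still contributes its population, and the region-less country appears with a count of 0 and NULL sums.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE countries (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE regions (id INTEGER PRIMARY KEY, country_id INTEGER,
                          name TEXT, population INTEGER);
    CREATE TABLE cities (id INTEGER PRIMARY KEY, region_id INTEGER, name TEXT);
    INSERT INTO countries VALUES (1, 'country 1'), (2, 'country 2');
    INSERT INTO regions VALUES
        (11, 1, 'region 1 in country 1', 10),
        (12, 1, 'region 2 in country 1', 15),
        (21, 2, 'region 1 in country 2', 25);
    INSERT INTO cities VALUES
        (111, 11, 'City 1 in region 1 in country 1'),
        (112, 11, 'City 2 in region 1 in country 1'),
        (121, 12, 'City 1 in region 2 in country 1'),
        (211, 21, 'City 1 in region 1 in country 2');
    -- Hypothetical extra rows, not part of the original example data:
    INSERT INTO countries VALUES (3, 'country 3');
    INSERT INTO regions VALUES (22, 2, 'region 2 in country 2', 5);
""")

# Aggregate cities per region first, then aggregate regions per country.
left_join_rows = conn.execute("""
    SELECT c.id AS country_id,
           c.name AS country,
           COUNT(x.region_id) AS regions,
           SUM(x.population) AS population,
           SUM(x.cities) AS cities
    FROM countries AS c
    LEFT JOIN (
        SELECT r.country_id,
               r.id AS region_id,
               r.population AS population,
               COUNT(s.id) AS cities
        FROM regions AS r
        LEFT JOIN cities AS s ON s.region_id = r.id
        GROUP BY r.country_id, r.id, r.population
    ) AS x ON x.country_id = c.id
    GROUP BY c.id, c.name
    ORDER BY c.id
""").fetchall()

for row in left_join_rows:
    print(row)
# (1, 'country 1', 2, 25, 3)
# (2, 'country 2', 2, 30, 1)      <- region 22 counted despite having no cities
# (3, 'country 3', 0, None, None) <- no regions: count is 0, sums are NULL
```

If zeros are preferred over NULL for the empty country, wrapping the sums in COALESCE(..., 0) is one option.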
First of all, the other post you reference is not the same situation. In that case, the joins are like [A -> B and A -> C], so the weighted average (which is what that calculation does) is correct. In your case the joins are like [A -> B -> C], so you need a different approach.
The simplest solution that comes to mind right away does involve a subquery, but not a complex one:
SELECT
    c.name AS country,
    COUNT(r.id) AS regions,
    SUM(s.city_count) AS cities,
    SUM(r.population) AS population
FROM countries AS c
JOIN regions AS r ON r.country_id = c.id
JOIN (
    SELECT region_id, COUNT(*) AS city_count
    FROM cities
    GROUP BY region_id
) AS s ON s.region_id = r.id
GROUP BY c.id
The reason this works is that it reduces the cities to one row per region before joining to regions, so each region appears exactly once in the joined result and its population is no longer multiplied by its number of cities.
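A quick way to check this against the example data from the question is the sketch below. It assumes an in-memory SQLite database in place of MySQL (the Python wrapper and variable names are illustrative only) and runs the query above unchanged apart from an added ORDER BY for a stable row order. Note that, like the original query, it uses inner joins, so a region with no cities would be left out of the totals.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE countries (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE regions (id INTEGER PRIMARY KEY, country_id INTEGER,
                          name TEXT, population INTEGER);
    CREATE TABLE cities (id INTEGER PRIMARY KEY, region_id INTEGER, name TEXT);
    INSERT INTO countries VALUES (1, 'country 1'), (2, 'country 2');
    INSERT INTO regions VALUES
        (11, 1, 'region 1 in country 1', 10),
        (12, 1, 'region 2 in country 1', 15),
        (21, 2, 'region 1 in country 2', 25);
    INSERT INTO cities VALUES
        (111, 11, 'City 1 in region 1 in country 1'),
        (112, 11, 'City 2 in region 1 in country 1'),
        (121, 12, 'City 1 in region 2 in country 1'),
        (211, 21, 'City 1 in region 1 in country 2');
""")

# Cities are collapsed to one row per region before the join,
# so each region's population is summed exactly once.
rows = conn.execute("""
    SELECT
        c.name AS country,
        COUNT(r.id) AS regions,
        SUM(s.city_count) AS cities,
        SUM(r.population) AS population
    FROM countries AS c
    JOIN regions AS r ON r.country_id = c.id
    JOIN (
        SELECT region_id, COUNT(*) AS city_count
        FROM cities
        GROUP BY region_id
    ) AS s ON s.region_id = r.id
    GROUP BY c.id
    ORDER BY c.id
""").fetchall()

for row in rows:
    print(row)
# ('country 1', 2, 3, 25)
# ('country 2', 1, 1, 25)
```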