
Percent to total in PostgreSQL without subquery

Tags:

postgresql

I have a table of users. Each user has a country. I want to list all countries with the number of users in each and each country's percentage of the total. What I have so far is:

SELECT
country_id,
COUNT(*) AS total,
((COUNT(*) * 100) / (SELECT COUNT(*) FROM users WHERE cond1 = true AND cond2 = true AND cond3 = true)::decimal) AS percent
FROM users
WHERE cond1 = true AND cond2 = true AND cond3 = true
GROUP BY country_id

The conditions in both queries are the same. I tried to do this without a subquery, but then I only get the total per country, not the total number of users. Is there a way to do this without a subquery? I'm using PostgreSQL. Any help is highly appreciated. Thanks in advance.

asked Jun 27 '11 07:06 by fanjabi



2 Answers

This is really old, but both of the select examples above either don't work, or are overly complex.

SELECT
    country_id,
    COUNT(*),
    (COUNT(*) / (SUM(COUNT(*)) OVER() )) * 100
FROM
    users
WHERE
    cond1 = true AND cond2 = true AND cond3 = true
GROUP BY 
    country_id

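The standalone COUNT(*) column is not necessary; it's just there for debugging, to check that you're getting the right results. The trick is the SUM on top of the COUNT over the recordset: SUM(COUNT(*)) OVER () adds up the per-country counts across all groups, which gives the overall total.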
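For anyone who wants to try the SUM on top of COUNT trick without a PostgreSQL server at hand, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in (it assumes SQLite 3.25 or newer for window function support). The table contents and the single `active` column standing in for cond1..cond3 are made up for illustration. One difference to be aware of: SQLite divides integers as integers, so the sketch multiplies by 100.0 first, whereas in PostgreSQL SUM over a bigint count already returns numeric.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, country_id INTEGER, active INTEGER)"
)
# Hypothetical sample data: `active` stands in for cond1..cond3.
conn.executemany(
    "INSERT INTO users (country_id, active) VALUES (?, ?)",
    [(1, 1), (1, 1), (1, 1), (2, 1), (3, 0)],
)

rows = conn.execute(
    """
    SELECT country_id,
           COUNT(*) AS total,
           -- SUM(COUNT(*)) OVER () adds up the per-group counts
           100.0 * COUNT(*) / SUM(COUNT(*)) OVER () AS percent
    FROM users
    WHERE active = 1
    GROUP BY country_id
    ORDER BY country_id
    """
).fetchall()

for country_id, total, percent in rows:
    print(country_id, total, percent)
```

The inactive user in country 3 is filtered out before grouping, so it affects neither the per-country counts nor the total used for the percentage.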

Hope this helps someone.

Also, if anyone wants to do this in Django, just hack up an aggregate:

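from django.db.models import Aggregate, DecimalField


class PercentageOverRecordCount(Aggregate):
    # The template has no %(function)s or %(expressions)s placeholder,
    # so neither `function` nor the expression appears in the generated SQL.
    function = 'OVER'
    template = '(COUNT(*) / (SUM(COUNT(*)) OVER() )) * 100'

    def __init__(self, expression, **extra):
        super().__init__(
            expression,
            output_field=DecimalField(),
            **extra
        )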

Now it can be used in annotate.

answered Oct 19 '22 16:10 by Trent


I guess the reason you want to eliminate the subquery is to avoid scanning the users table twice. Remember the total is the sum of the counts for each country.

WITH c AS (
  SELECT
    country_id,
    count(*) AS cnt
  FROM users
  WHERE cond1=...
  GROUP BY country_id
) 
SELECT
  *,
  100.0 * cnt / (SELECT sum(cnt) FROM c) AS percent
FROM c;

This query builds a small CTE with the per-country statistics. It will only scan the users table once, and generate a small result set (only one row per country).

The total (SELECT sum(cnt) FROM c) is calculated only once on this small result set, so it uses negligible time.
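The CTE approach can be tried out with a small sketch like the one below. It uses Python's built-in sqlite3 module as a stand-in for PostgreSQL, and the sample rows are invented; the elided WHERE conditions are simply left out here rather than guessed.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (country_id INTEGER)")
# Hypothetical sample data: 2 users in country 1, 5 in country 2, 1 in country 3.
conn.executemany(
    "INSERT INTO users VALUES (?)",
    [(1,), (1,), (2,), (2,), (2,), (2,), (2,), (3,)],
)

cte_rows = conn.execute(
    """
    WITH c AS (
      SELECT country_id, count(*) AS cnt
      FROM users
      GROUP BY country_id
    )
    SELECT country_id,
           cnt,
           -- the scalar subquery reads the small CTE, not the users table
           100.0 * cnt / (SELECT sum(cnt) FROM c) AS percent
    FROM c
    ORDER BY country_id
    """
).fetchall()

for row in cte_rows:
    print(row)
```

The users table appears only once in the SQL, inside the CTE; the total is computed from the one-row-per-country result.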

You could also use a window function:

SELECT
  country_id,
  cnt,
  100.0 * cnt / (sum(cnt) OVER ()) AS percent 
FROM (
  SELECT country_id, count(*) AS cnt FROM users GROUP BY country_id
) foo;

(which is the same as nightwolf's query with the errors removed lol)

Both queries take about the same time.
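As a rough sanity check of the derived-table window form, the sketch below runs it against made-up data and compares the output with the same percentages computed in plain Python. It uses sqlite3 as a stand-in for PostgreSQL (SQLite 3.25 or newer assumed), and the `active` column is a hypothetical placeholder for the question's conditions. The window query above leaves out the WHERE clause; in this sketch the filter is placed in the inner query so that it applies before the counts and the total are taken.

```python
import sqlite3
from collections import Counter

# Hypothetical sample data: (country_id, active) pairs.
users = [
    (1, 1), (1, 1),
    (2, 1), (2, 1), (2, 0),
    (3, 1), (3, 1), (3, 1), (3, 1),
]

# In-memory SQLite database used as a stand-in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (country_id INTEGER, active INTEGER)")
conn.executemany("INSERT INTO users VALUES (?, ?)", users)

window_rows = conn.execute(
    """
    SELECT country_id,
           cnt,
           100.0 * cnt / (sum(cnt) OVER ()) AS percent
    FROM (
      SELECT country_id, count(*) AS cnt
      FROM users
      WHERE active = 1      -- filter belongs in the inner query
      GROUP BY country_id
    ) foo
    ORDER BY country_id
    """
).fetchall()

# The same numbers computed in plain Python as a cross-check.
counts = Counter(country for country, active in users if active == 1)
total = sum(counts.values())
expected = [(c, n, 100.0 * n / total) for c, n in sorted(counts.items())]

print(window_rows == expected)
```

This only checks that the results agree; it says nothing about timing, which depends on the real table size and the planner.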

answered Oct 19 '22 15:10 by bobflux