Given a table with columns such as:
Date, Type
I'm running the following SQL:
SELECT Type, count(*) as CountPerType
FROM myTable
WHERE Date between 20200101 and 20200131
GROUP BY count(*)
I want to have an extra column Percentage
which will have 100.0 * CountPerType / SUM(CountPerType)
. What's the most efficient way to do it in PrestoDB (which powers Amazon Athena)?
I would write the query without a subquery. You can mix window functions and aggregation functions:
SELECT Type, COUNT(*) as CountPerType,
COUNT(*) * 100.0 / SUM(COUNT(*)) OVER () as percentage
FROM t
WHERE Date BETWEEN 20200101 AND 20200131
GROUP BY Type;
I don't know if the performance is different from the version using the subquery (this should be at least as good). But the query is definitely simpler.
You can use window functionality to achieve that. You should always do a group by of non-aggregated fields.
select
Type,
CountPerType,
100.0 * CountPerType/sum(CountPerType) over () as columnName
from
(
SELECT
Type,
count(*) as CountPerType
FROM myTable
WHERE Date between 20200101 and 20200131
GROUP BY
Type
) subq
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With