I am doing a GROUP BY and COUNT(*) on a dataset, and I would like to calculate the percentage of each group over the total.
For example, in this query, I would like to know what share of the total (select count(*) from publicdata:samples.natality) the count(*) for each state represents:
SELECT state, count(*)
FROM [publicdata:samples.natality]
GROUP BY state
There are several ways to do this in SQL, but I haven't found a way to do it in BigQuery. Does anyone know how?
Thanks!
Finding a percentage between two columns is straightforward: use the column names with the division operator "/" to divide the values in one column by those in the other. The result is one quotient per row; multiply it by 100 to express it as a percentage.
To calculate percent, we need to divide the counts by the count sums for each sample, and then multiply by 100. This can also be done using the function decostand from the vegan package with method = "total" .
To find the percentage of missing values in each column of an R data frame, we can combine the colMeans function with the is.na function. This gives the fraction of missing values in each column, which we then multiply by 100 to get the percentage.
Check RATIO_TO_REPORT, one of the recently announced window functions:
SELECT state, ratio * 100 AS percent FROM (
SELECT state, count(*) AS total, RATIO_TO_REPORT(total) OVER() AS ratio
FROM [publicdata:samples.natality]
GROUP BY state
)
state percent
AL 1.4201828131159113
AK 0.23521048665998198
AZ 1.3332896746620975
AR 0.7709591206172346
CA 10.008298605982642
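For readers who want to see what RATIO_TO_REPORT computes, here is a minimal sketch that reproduces the same calculation outside BigQuery. It rests on several assumptions: it uses Python's built-in sqlite3 module (with SQLite 3.25 or later for window functions) as a stand-in engine, a made-up table named births with invented rows, and the expression n / SUM(n) OVER (), which is the definition of a ratio-to-report written out by hand. It is not the BigQuery query itself.

```python
import sqlite3

# Toy stand-in for the natality table: one row per birth (invented data).
rows = [("CA",)] * 5 + [("AL",)] * 3 + [("AK",)] * 2

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE births (state TEXT)")
conn.executemany("INSERT INTO births (state) VALUES (?)", rows)

# Inner query: count per state. Outer query: divide each count by the
# sum of all counts, which is what a ratio-to-report window function does.
query = """
SELECT state, n * 100.0 / SUM(n) OVER () AS percent
FROM (SELECT state, COUNT(*) AS n FROM births GROUP BY state)
ORDER BY state
"""
percents = dict(conn.execute(query).fetchall())

for state, percent in percents.items():
    print(state, percent)
```

With the ten toy rows above, the percentages across all states add up to 100, which is a quick sanity check worth running on the real result as well.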