I'm trying to calculate a running sum over a partition. This seems easier and quicker than the method suggested in BigQuery SQL running totals.
For example:
SELECT corpus,corpus_date,word_count, sum(word_count) over (partition by corpus,corpus_date order by word_count,word DESC) as running_sum FROM [publicdata:samples.shakespeare]
I'm facing 2 problems:
1. I can't make the sum start with the most common word (the word with the highest word_count). Setting DESC or ASC doesn't change anything, and the sum starts with the least common word(s). If I change the ORDER BY to include only "order by word_count", then the running sum isn't correct, since rows with the same order (== same word_count) yield the same running sum.
2. In a similar query I'm executing (see below), the first row of the running sum yields a sum of 0, although the field I sum upon isn't 0 for the first row. Why does this happen? How can I work around the problem to show the correct running sum? The query is:
select * from
(SELECT
mongo_id,
account_id,
event_date,
trx_amount_sum_per_day,
SUM (trx_amount_sum_per_day) OVER (PARTITION BY mongo_id,account_id ORDER BY event_date DESC) AS running_sum,
ROW_NUMBER() OVER (PARTITION BY mongo_id,account_id ORDER BY event_date DESC) AS row_num
FROM [xs-polar-gasket-4:publicdataset.publictable]
) order by event_date desc
For question 1:
Change:
SELECT
  corpus, corpus_date, word_count,
  SUM(word_count) OVER (
    PARTITION BY corpus, corpus_date
    ORDER BY word_count, word DESC) AS running_sum
FROM [publicdata:samples.shakespeare]
To:
SELECT
  corpus, corpus_date, word_count,
  SUM(word_count) OVER (
    PARTITION BY corpus, corpus_date
    ORDER BY word_count DESC, word) AS running_sum
FROM [publicdata:samples.shakespeare]
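(In the original query, DESC applies only to word, the last sort key, so word_count is still sorted ascending. Put DESC directly after word_count to start the sum with the most common word.)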
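To see why the position of DESC matters, here is a minimal Python sketch that mimics the two orderings on a single partition. The rows are made up for illustration (they are not taken from the shakespeare sample), and the helper name with_running_sum is hypothetical. It only imitates the sort order and a row-by-row cumulative sum; it does not model how BigQuery itself treats tied rows.

```python
from itertools import accumulate

# Made-up (word, word_count) rows for one partition; illustration only.
rows = [("the", 5), ("and", 3), ("to", 3), ("zed", 1)]

def with_running_sum(ordered):
    """Attach a row-by-row cumulative sum of word_count to already-ordered rows."""
    totals = accumulate(count for _, count in ordered)
    return [(word, count, total) for (word, count), total in zip(ordered, totals)]

# ORDER BY word_count, word DESC:
# word_count stays ascending; only the tiebreaker (word) is descending.
# Two stable passes: secondary key first, then the primary key.
by_word_desc = sorted(rows, key=lambda r: r[0], reverse=True)
original = sorted(by_word_desc, key=lambda r: r[1])

# ORDER BY word_count DESC, word:
# highest word_count first, ties broken alphabetically.
fixed = sorted(rows, key=lambda r: (-r[1], r[0]))

original_result = with_running_sum(original)
fixed_result = with_running_sum(fixed)

for label, result in (("original", original_result), ("fixed", fixed_result)):
    print(label)
    for word, count, total in result:
        print(f"  {word:<4} {count:>2} {total:>3}")
```

With these rows, the original ordering starts at "zed" (the least common word), while the fixed ordering starts at "the" and accumulates 5, 8, 11, 12.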