Calculating running sum over partition in BigQuery

Question

I'm trying to calculate a running sum over a partition. This seems easier and quicker than the method suggested in BigQuery SQL running totals.

For example:

SELECT corpus,corpus_date,word_count, sum(word_count) over (partition by corpus,corpus_date order by word_count,word DESC) as running_sum FROM [publicdata:samples.shakespeare]

I'm facing 2 problems:

I'm unable to let the sum start with the most common word (word with highest word_count). Setting DESC or ASC just doesn't change anything, and the sum starts with the least common word(s). If I change the order by to include only "order by word_count" than the running sum isn't correct since rows with the same order (== same word_count) yield the same running sum.
In a similar query I'm executing (see below), the first row of the running sum yields a sum of 0, although the field I sum upon isn't 0 for the first row. Why does this happen? How can I workaround the problem to show the correct running sum? The query is:

select * from
(SELECT
mongo_id,
account_id,
event_date,
trx_amount_sum_per_day,
SUM (trx_amount_sum_per_day) OVER (PARTITION BY mongo_id,account_id ORDER BY event_date DESC) AS running_sum,
ROW_NUMBER() OVER (PARTITION BY mongo_id,account_id ORDER BY event_date DESC) AS row_num
FROM [xs-polar-gasket-4:publicdataset.publictable]
) order by event_date desc

Felipe Hoffa · Accepted Answer

For question 1:

Change:

SELECT
  corpus, corpus_date, word_count, SUM(word_count)
OVER
  (PARTITION BY corpus, corpus_date
  ORDER BY word_count, word DESC) AS running_sum
FROM [publicdata:samples.shakespeare]

To:

SELECT
  corpus, corpus_date, word_count, SUM(word_count)
OVER
  (PARTITION BY corpus, corpus_date
  ORDER BY word_count DESC, word) AS running_sum
FROM [publicdata:samples.shakespeare]

(Original query is sorting by word, but you wanted to sort by word_count)

Calculating running sum over partition in BigQuery

Tags:

google-bigquery

Lior

1 Answers

Felipe Hoffa

Recent Activity

Donate For Us

Calculating running sum over partition in BigQuery

Tags:

google-bigquery

Lior

1 Answers

Felipe Hoffa

Related questions

Recent Activity

Donate For Us