Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

BigQuery: Computing aggregate over window of time for each person

Given a table in Google BigQuery:

User  Timestamp 
A     TIMESTAMP(12/05/2015 12:05:01.8023)
B     TIMESTAMP(9/29/2015 12:15:01.0323)
B     TIMESTAMP(9/29/2015 13:05:01.0233)
A     TIMESTAMP(9/29/2015 14:05:01.0432)
C     TIMESTAMP(8/15/2015 5:05:01.0000)
B     TIMESTAMP(9/29/2015 14:06:01.0233)
A     TIMESTAMP(9/29/2015 14:06:01.0432)

Is there a simple way to compute:

User  Maximum_Number_of_Events_this_User_Had_in_One_Hour  
A     2
B     3
C     1

where the time window of one hour is a parameter?

I tried doing this myself using a combination of LAG and partition functions by building off of these two questions:

BigQuery SQL for 28-day sliding window aggregate (without writing 28 lines of SQL)

Bigquery SQL for sliding window aggregate

But find those posts are too dissimilar since I am not finding number of people per time window, but instead finding max number of events per person within a time window.

like image 294
cgnorthcutt Avatar asked Sep 29 '15 09:09

cgnorthcutt


1 Answers

Here is an efficient succinct way to do it that exploits the ordered structure of timestamps.

SELECT
  user,
  MAX(per_hour) AS max_event_per_hour
FROM
(
  SELECT 
    user,
    COUNT(*) OVER (PARTITION BY user ORDER BY timestamp RANGE BETWEEN 60 * 60 * 1000000 PRECEDING AND CURRENT ROW) as per_hour,
    timestamp
  FROM 
    [dataset_example_in_question_user_timestamps]
)
GROUP BY user
like image 93
cgnorthcutt Avatar answered Sep 17 '22 22:09

cgnorthcutt