Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to join by closest timestamp in BigQuery?

Let's say I have a table of class start times and a table of students with their desired start times. I want to join the two tables by matching the closest Class.StartTime with Student.DesiredStartTime (see example below). How would you do that? I see the question frequently asked and answered, but only for other databases (not BigQuery). Since BigQuery has some unique attributes, I'm wondering if BigQuery has any special features to help this along? Thanks!

Class
+-----------------------------------+------------+
|               StartTime           |    Class   |
+-----------------------------------+------------+
| 07/01/19 08:00                    | English    |
| 07/01/19 09:00                    | Chemistry  |
| 07/01/19 10:30                    | Math       |
+-----------------------------------+------------+

Student
+-----------------------------------+------------+
|               DesiredStartTime    |    Student |
+-----------------------------------+------------+
| 07/01/19 08:45                    | Jimmy      |
| 07/01/19 09:15                    | Bobby      |
| 07/01/19 10:00                    | Buddy      |
+-----------------------------------+------------+

[Query Results]
+-----------------------------------+------------+------------+
|               StartTime           |    Class   |  Student   |
+-----------------------------------+------------+------------+
| 07/01/19 09:00                    | Chemistry  | Jimmy      |
| 07/01/19 09:00                    | Chemistry  | Bobby      |
| 07/01/19 10:30                    | Math       | Buddy      |
+-----------------------------------+------------+------------+
like image 568
Jim Ott Avatar asked Dec 05 '22 09:12

Jim Ott


1 Answers

Unlike a lot of other databases, this is a good time for a cross join in BQ. The following query finds the absolute difference (in minutes) between a student's desired start time and ALL class start times, ranks them, and then picks the one that is closest.

with joined as (
  select 
    Student, 
    Class,
    StartTime,
    DesiredStartTime, 
    ABS(TIMESTAMP_DIFF(StartTime,DesiredStartTime, MINUTE)) as abs_difference_mins
  from <dataset>.Class
  cross join <dataset>.Student
),
ranked as (
  select
    StartTime,
    Class,
    Student,
    row_number() over(partition by Student order by abs_difference_mins asc) as ranked_by_mins_diff
  from joined
)
select * except(ranked_by_mins_diff)
from ranked
where ranked_by_mins_diff = 1
like image 97
rtenha Avatar answered Dec 28 '22 05:12

rtenha