I've got two tables, both with timestamps and some more data:
| name | timestamp | a_data |
| ---- | ------------------- | ------ |
| 1 | 2018-01-01 11:10:00 | a |
| 2 | 2018-01-01 12:20:00 | b |
| 3 | 2018-01-01 13:30:00 | c |
| name | timestamp | b_data |
| ---- | ------------------- | ------ |
| 1 | 2018-01-01 11:00:00 | w |
| 2 | 2018-01-01 12:00:00 | x |
| 3 | 2018-01-01 13:00:00 | y |
| 3 | 2018-01-01 13:10:00 | y |
| 3 | 2018-01-01 13:10:00 | z |
What I want to do is
LEFT JOIN
the most recent record in Table B that predates it. | name | timestamp | a_data | b_data |
| ---- | ------------------- | ------ | ------ |
| 1 | 2018-01-01 11:10:00 | a | w |
| 2 | 2018-01-01 12:20:00 | b | x |
| 3 | 2018-01-01 13:30:00 | c | z | <-- note z, not y
I think this involves a subquery, but I cannot get this to work in Big Query. What I have so far:
SELECT a.a_data, b.b_data
FROM `table_a` AS a
LEFT JOIN `table_b` AS b
ON a.name = b.name
WHERE a.timestamp = (
SELECT max(timestamp) from `table_b` as sub
WHERE sub.name = b.name
AND sub.timestamp < a.timestamp
)
On my actual dataset, which is a very small test set (under 2Mb) the query runs but never completes. Any pointers much appreciated 👍🏻
You can try to use a select subquery.
SELECT a.*,(
SELECT MAX(b.b_data)
FROM `table_b` AS b
WHERE
a.name = b.name
and
b.timestamp < a.timestamp
) b_data
FROM `table_a` AS a
EDIT
Or you can try to use ROW_NUMBER
window function in a subquery.
SELECT name,timestamp,a_data , b_data
FROM (
SELECT a.*,b.b_data,ROW_NUMBER() OVER(PARTITION BY a.name ORDER BY b.timestamp desc,b.name desc) rn
FROM `table_a` AS a
LEFT JOIN `table_b` AS b ON a.name = b.name AND b.timestamp < a.timestamp
) t1
WHERE rn = 1
Below is for BigQuery Standard SQL and does not require specifying all columns on both sides - only name
and timestamp
. So it will work for any number of the columns in both tables (assuming no ambiguity in name rather than for above mentioned two columns)
#standardSQL
SELECT a.*, b.* EXCEPT (name, timestamp)
FROM (
SELECT
ANY_VALUE(a) a,
ARRAY_AGG(b ORDER BY b.timestamp DESC LIMIT 1)[SAFE_OFFSET(0)] b
FROM `project.dataset.table_a` a
LEFT JOIN `project.dataset.table_b` b
USING (name)
WHERE a.timestamp > b.timestamp
GROUP BY TO_JSON_STRING(a)
)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With