I have two Firebase projects (one iOS and one Android) feeding into Bigquery. I need to combine, flatten, and aggregate some specific data from both projects into one combined table so that I can report off of it without querying all bazillion rows across all daily tables.
In order to populate this aggregate table, I currently have two python scripts querying the iOS and Android intraday tables every 5 minutes. The script gets the max timestamp from the aggregate table, then queries the intraday table to get any records with a greater timestamp (I track the max timestamp separately for iOS and Android because they frequently differ).
I am querying the intraday table with this (abbreviated) wildcard syntax:
SELECT yadda, yadda, timestamp_micros, 'ios' as platform
FROM `myproject.iOSapp.app_events_intraday*`
WHERE timestamp_micros > (Select max(timestamp_micros)
from myAggregateTable WHERE platform = 'ios' )
Is there any danger that when the intraday table flips over to the new day, I will miss any records when my script runs at 23:57 and then again at 00:02?
I thought I would post the results of my testing this for a few months. Here are the basic mechanics as I see them:
DAY1
intraday table is created at midnight GMT (xyz.app_events_intraday_20180101
)DAY2
intraday table is created 24 hours later (xyz.app_events_intraday_20180102
), but DAY1
intraday table sticks around for a few hoursDAY1
table is "renamed" to xyz.app_events_20180101
and you are left with a single (current) intraday tableMy tests have shown that additional data is added to the app_events_*
tables, even after step 3 has taken place, so it is NOT safe to assume that the data is stable/static once the name has changed. I have new data appear up to 2 or 3 days later.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With