In contrast with the BigQuery documentation, we see that it DOES cache the results when selecting data from a streaming, data partitioned table (Standard SQL).
Example: When we perform a deterministic date scan on the streaming, data partitioned table using:
where (_PARTITIONTIME > '2017-11-12' or _PARTITIONTIME is null)
...BigQuery caches the data for 5 to 20 minutes if we fire the same exact query within that time frame.
While in my interpretation of the documentation it states that it SHOULD NOT cache the data:
'When any of the tables referenced by the query have recently received streaming inserts (a streaming buffer is attached to the table) even if no new rows have arrived'
Important notes:
Our Questions:
What is going on here / Why does the BQ caching happen at all?
The time this data stays in the BQ cache is 'random' (between 5-20 minutes). What does this mean?
Thanks for clarifying the question. I think it's an overlook that we didn't disabled caching for partitioned tables with streaming data. It should as otherwise the query might return outdated results.
We invalidate the cache when the table is changed. Streaming into the table will cause the table to be changed. I guess that's why the cache is invalidated between 5 to 20 minutes.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With