I know this question has been asked in a different form a while ago. But now that BQ allows DML on partitioned table, its more important to understand when the streaming buffer is flushed so that we can perform DML on tables for maintenance.
This is very important now since
Now I have to update all the tables since we are performing some sort of hashing for GDPR.
If I cant run the DML, then I have to restate the 200 * 1500 partitions by joining with a reference table.
If I can run the DML then I just have to run 1500 udpate statements.
I have stopped the streaming and have been waiting since > 90 minutes and yet still get the same error that I cant run DML since the table has streaming buffer. Any response with your own experience would be highly appreciated.
Answer is "it depends" and mostly based on size of data you stream to buffer - but it also based on algorithmic tuning on BQ side. As of now - there is no definite time you can somehow calculate before data will flush. And there is no mechanism to invoke flush of buffer manually.
So apparently BigQuery now allows update on older partitions of partitioned tables with streaming buffer now. But not on the streaming buffer itself.
For example :
update
`dataset.table_name`
set column = 'value'
where _PARTITIONTIME = '2018-05-01'
Works beautifully.
But
update
`dataset.table_name`
set column = 'value'
where _PARTITIONTIME is null
Doesn't work and fails with the below error:
UPDATE or DELETE statement over table
dataset.table_name
would affect rows in the streaming buffer, which is not supported
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With