WRITE_TRUNCATE behaviour in BigQuery

Tags:

google-cloud-dataflow

I have a question about the WRITE_TRUNCATE behaviour in BigQuery.

I have a BigQuery table (T1) to which I periodically append log data (one row per log line). I want a Dataflow job (D1) that reads from this table, removes any duplicate rows, performs other data cleansing operations, and then writes the result to another BigQuery table (T2), replacing any data already present in that table. I believe I can do this by using the WRITE_TRUNCATE write disposition in the BigQueryIO sink within the Dataflow job.

My question is: if another Dataflow job (D2) reads from table T2 while job D1 is in the middle of a write truncate to that table, what data does D2 see? Does it see the table only in the state it was in before the truncate or in the state after the truncate has finished? Or can it see the table at an intermediate step of the truncate (e.g. part way through appending the new data)?

The javadoc linked above suggests that the truncate may not be atomic, while the REST documentation for BigQuery suggests that it is.


asked Sep 13 '17 00:09

hbakkum

1 Answer

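The REST API documentation is the source of truth here: the change is atomic and takes effect only upon the BigQuery job's successful completion. A concurrent reader such as D2 should therefore see T2 either as it was before the truncate or as it is after the truncate has finished, not a partially written state.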
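To make the "atomic on successful completion" semantics concrete, here is a minimal in-memory sketch. It is only a model of the behaviour described above, not BigQuery or Dataflow code: the class `AtomicTruncateSketch` and its methods `writeTruncate` and `read` are hypothetical names invented for illustration, and the table is assumed to be a plain list of strings.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical in-memory stand-in for table T2; this is NOT BigQuery code.
class AtomicTruncateSketch {
    // The rows currently visible to readers, held as one immutable snapshot.
    private final AtomicReference<List<String>> visible;

    AtomicTruncateSketch(List<String> initialRows) {
        this.visible = new AtomicReference<>(List.copyOf(initialRows));
    }

    // Models a WRITE_TRUNCATE job (like D1): the new rows are staged out of
    // sight and replace the old contents in one step, only if the job succeeds.
    void writeTruncate(List<String> newRows, boolean jobSucceeds) {
        List<String> staged = List.copyOf(newRows); // readers cannot see this yet
        if (jobSucceeds) {
            visible.set(staged); // single atomic swap of the whole snapshot
        }
        // On failure nothing is published, so the old contents stay visible.
    }

    // Models a reader (like D2): it gets either the old or the new snapshot.
    List<String> read() {
        return visible.get();
    }

    public static void main(String[] args) {
        AtomicTruncateSketch t2 = new AtomicTruncateSketch(List.of("old-1", "old-2"));

        t2.writeTruncate(List.of("new-1"), false); // a failed job changes nothing
        System.out.println(t2.read()); // prints [old-1, old-2]

        t2.writeTruncate(List.of("new-1", "new-2", "new-3"), true);
        System.out.println(t2.read()); // prints [new-1, new-2, new-3]
    }
}
```

In this model a reader can never observe an empty table or a half-filled one, because the only state change is the single swap at the end of a successful job.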

answered Oct 13 '22 00:10

Michael Moursalimov
