I am streaming data into a BigQuery table.
I have done this quite a few times before, and it always worked fine. But recently I started to see the above approach stop working.
After streaming finishes (with no errors reported), I query the table. Sometimes it works; sometimes I get an empty table. (Same script, same data, run many times, and the results differ: sometimes it works, sometimes not.)
To add to the mystery, when I stream a large amount of data it seems to work most of the time, but when I stream a small amount of data it fails most of the time.
But if I just do
It always works.
I tried this in both Google Apps Script and the PHP Google Cloud Client Library for BigQuery, and had the same problems.
So I tried this in Google Apps Script
It still gave me the same problems.
But no errors are reported or logged.
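With the PHP client library, it is worth confirming that the insert response itself reports no failed rows, since a streaming call can partially fail without throwing. The sketch below assumes the `google/cloud-bigquery` Composer package is installed and its autoloader already loaded; `toInsertRows`, `streamRows` and the `id` field used as an insert ID are hypothetical names for illustration. Note that a successful response only means the rows were accepted into the streaming buffer; it does not by itself prove they will be visible in a table that was just deleted and recreated.

```php
<?php
use Google\Cloud\BigQuery\Table;

// Wrap plain associative arrays in the row shape insertRows() expects.
// The optional insertId lets BigQuery de-duplicate retried rows (best effort).
function toInsertRows(array $records, ?string $idField = null): array
{
    $rows = [];
    foreach ($records as $record) {
        $row = ['data' => $record];
        if ($idField !== null && isset($record[$idField])) {
            $row['insertId'] = (string) $record[$idField];
        }
        $rows[] = $row;
    }
    return $rows;
}

// Stream rows and log per-row errors instead of assuming success.
// Returns the number of rows BigQuery rejected ('id' is a hypothetical field).
function streamRows(Table $table, array $records): int
{
    $response = $table->insertRows(toInsertRows($records, 'id'));
    if ($response->isSuccessful()) {
        return 0;
    }
    foreach ($response->failedRows() as $failed) {
        foreach ($failed['errors'] as $error) {
            error_log($error['reason'] . ': ' . $error['message']);
        }
    }
    return count($response->failedRows());
}
```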
Additional Information:
I tried again.
If I wait until the streaming buffer is empty and then run the script, the results are always correct: the new data is streamed into the new table successfully.
But if I run the script right after the previous run, the results are empty: the data is not streamed into the new table.
So the error seems to happen when I "delete the old table and create the new table" while the streaming buffer is not empty.
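The "wait until the streaming buffer is empty" workaround can be automated by polling the table metadata. This is a minimal sketch, assuming the `google/cloud-bigquery` PHP client is installed and autoloaded; it relies on the table resource containing a `streamingBuffer` entry only while rows are still buffered. The function names and the timeout and poll defaults are hypothetical and arbitrary.

```php
<?php
use Google\Cloud\BigQuery\Table;

// The table resource has a "streamingBuffer" entry only while rows are still
// buffered; the field is absent once there is nothing left in the buffer.
function streamingBufferIsEmpty(array $tableInfo): bool
{
    return !isset($tableInfo['streamingBuffer']);
}

// Poll fresh table metadata until the streaming buffer is gone or the timeout
// expires. Returns true if the buffer emptied in time, false otherwise.
function waitForEmptyStreamingBuffer(Table $table, int $timeoutSeconds = 5400, int $pollSeconds = 60): bool
{
    $deadline = time() + $timeoutSeconds;
    while (true) {
        // reload() re-fetches the metadata instead of returning cached info().
        if (streamingBufferIsEmpty($table->reload())) {
            return true;
        }
        if (time() >= $deadline) {
            return false;
        }
        sleep($pollSeconds);
    }
}
```

Calling `waitForEmptyStreamingBuffer($table)` on the old table before deleting it mirrors the manual waiting described above; it trades run time for predictability rather than fixing the underlying race.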
But according to the answer in this thread, "BigQuery Stream and Delete while streaming buffer is not empty?",
the old table and the new table (even with the same name and the same schema) have two different "object ids": they are actually two different tables. After I delete the old table, the old records in its streaming buffer are dropped too. Whether the streaming buffer is empty or not should therefore not affect my next steps: creating a new table and streaming new data into it.
On the other hand, if I try to "truncate the old table" instead of "delete the old table and create a new table" while there may still be data in the streaming buffer, the truncation fails because a "DML statement cannot modify data still in stream buffer".
In simple words, in this use case,
$dataset = $bigQuery->dataset($datasetId);
$table = $dataset->table($tableId);
$table->delete(); // delete the old table
To append to or overwrite a table using query results, specify a destination table and set the write disposition to either:
Append to table: appends the query results to an existing table.
Overwrite table: overwrites an existing table with the same name using the query results.
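In the PHP client, these two options correspond to the `WRITE_APPEND` and `WRITE_TRUNCATE` write dispositions on a query job configuration. The sketch below assumes the `google/cloud-bigquery` package is installed and autoloaded; `writeDispositionFor` and `queryIntoTable` are hypothetical helper names.

```php
<?php
use Google\Cloud\BigQuery\BigQueryClient;
use Google\Cloud\BigQuery\Table;

// Map "append" / "overwrite" to the API's writeDisposition values.
function writeDispositionFor(bool $overwrite): string
{
    return $overwrite ? 'WRITE_TRUNCATE' : 'WRITE_APPEND';
}

// Run $sql and write its results into $destination, appending to or
// overwriting the existing rows depending on $overwrite.
function queryIntoTable(BigQueryClient $bigQuery, Table $destination, string $sql, bool $overwrite): void
{
    $config = $bigQuery->query($sql)
        ->destinationTable($destination)
        ->writeDisposition(writeDispositionFor($overwrite));
    // Block until the query job has finished.
    $bigQuery->runQuery($config)->waitUntilComplete();
}
```

Per the troubleshooting notes quoted further down, a `WRITE_TRUNCATE` job can cause streaming inserts that arrive during the consistency period to be dropped, so overwriting this way is only safe for a table that is not being streamed into at the same time.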
In the Google Cloud console, go to the BigQuery page. In the Explorer pane, expand the project and dataset nodes of the table snapshot you want to restore from. Click the name of the table snapshot. In the table snapshot pane that appears, click Restore.
I posted about this in another thread of mine regarding streaming into BigQuery. Now, as a rule, I try to avoid streaming whenever I can.
This avoids many streaming-related issues.
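One way to avoid streaming is a batch load job that replaces the table contents in one step, so the delete/recreate race described in the question does not arise. This is a sketch, assuming the `google/cloud-bigquery` package is installed and autoloaded and that the data is small enough to send as an in-memory newline-delimited JSON string; `toNdjson` and `replaceTableContents` are hypothetical names. Load jobs are subject to their own quotas, so this suits periodic batch refreshes rather than high-frequency inserts.

```php
<?php
use Google\Cloud\BigQuery\Table;

// Serialize rows as newline-delimited JSON, one object per line.
function toNdjson(array $records): string
{
    $lines = [];
    foreach ($records as $record) {
        $lines[] = json_encode($record, JSON_THROW_ON_ERROR);
    }
    return implode("\n", $lines);
}

// Replace the table's contents with $records via a load job instead of
// streaming. Load jobs do not go through the streaming buffer.
// Throws if the finished job reports an error.
function replaceTableContents(Table $table, array $records): void
{
    $config = $table->load(toNdjson($records))
        ->sourceFormat('NEWLINE_DELIMITED_JSON')
        ->writeDisposition('WRITE_TRUNCATE');
    // runJob() waits for the job to complete before returning.
    $job = $table->runJob($config);
    $info = $job->info();
    if (isset($info['status']['errorResult'])) {
        throw new RuntimeException($info['status']['errorResult']['message']);
    }
}
```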
Avoid truncating and recreating tables while streaming.
From the official docs:
https://cloud.google.com/bigquery/troubleshooting-errors#streaming
Table Creation/Deletion - Streaming to a nonexistent table will return a variation of a notFound response. Creating the table in response may not immediately be recognized by subsequent streaming inserts. Similarly, deleting and/or recreating a table may create a period of time where streaming inserts are effectively delivered to the old table and will not be present in the newly created table.
Table Truncation - Truncating a table's data (e.g. via a query job that uses writeDisposition of WRITE_TRUNCATE) may similarly cause subsequent inserts during the consistency period to be dropped.
To avoid losing data: Create a new table with a different name.
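Following that advice, each run can stream into a table whose name has never been used before. The sketch below assumes the `google/cloud-bigquery` package is installed and autoloaded; `uniqueTableName` and `createRunTable` are hypothetical helpers, and the timestamp-plus-random-suffix naming scheme is just one possible choice.

```php
<?php
use Google\Cloud\BigQuery\Dataset;
use Google\Cloud\BigQuery\Table;

// Build a fresh table name per run, shaped like "events_20240101_120000_1a2b3c4d".
// Characters other than letters, digits and underscores are replaced.
function uniqueTableName(string $base): string
{
    $safeBase = preg_replace('/[^A-Za-z0-9_]/', '_', $base);
    return sprintf('%s_%s_%s', $safeBase, gmdate('Ymd_His'), bin2hex(random_bytes(4)));
}

// Create the per-run table; $fields is a BigQuery schema field list,
// e.g. [['name' => 'id', 'type' => 'INTEGER']].
function createRunTable(Dataset $dataset, string $base, array $fields): Table
{
    return $dataset->createTable(uniqueTableName($base), [
        'schema' => ['fields' => $fields],
    ]);
}
```

Downstream queries then need to know which table is current (for example by recording the name somewhere). The troubleshooting note quoted above also warns that a just-created table may not be recognized immediately by streaming inserts, so retrying on a `notFound` response may still be necessary.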