Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Google BigQuery Streaming failed sometimes if I do delete table and create table first, before streaming

I am streaming data into a BigQuery table.

  • Delete the old table
  • Create a new table with same name and same schema
  • Stream data into new table

I had done this quite a few times before, it was working fine. But recently I started to see the above approach not working.

After streaming is done (no error reported), I query the table, sometimes it worked. Sometimes, I got empty table. (Same script, same data, run many times, the results are different. Sometimes works, sometime not.)

And to add to the mystery, when I streamed large amount data, it seemed working most of the times. But when I streamed small amount data, then it failed most of the times.

But if I just do

  • Create a new table
  • Stream data into the new table

It always works.

I tried this both in Google Apps Scrip and PHP Google Cloud Client Library for BigQuery. I had the same problems.

So I tried this in Google Apps Script

  • Delete the old table
  • Sleep 10 seconds, so the delete job should be done
  • Create a new table with same name and same schema
  • Sleep 10 seconds, so the create job should be done
  • Stream data into new table

It still gave me the same problems.

But there are no error reported or logged.

Additional Information:

I tried again.

If I wait until the stream buffer is empty, and then run the script. The results are always correct. The new data streamed into the new table successfully.

But if I run the script, right after previous running, then the results are empty. The data is not streamed into the new table.

So error seems happening when I "delete the old table and create the new table" when stream buffer is not empty.

But according to the answer from this thread, BigQuery Stream and Delete while streaming buffer is not empty?,

the old table and new table (even they are with the same name and same schema), they are with two different "object id". They are actually two different tables. After I delete the old table, the old records in stream buffer would be dropped too. Stream buffer is empty or not, it should not affect my next steps, create a new table and stream new data to the new table.

On the other hand, if I try to "truncate old table", instead of "delete old table and create a new table", while there might still be data in stream buffer, then "DML statement cannot modify data still in stream buffer", so "truncate old table" would fail.

In simple words, in this use case,

  • I cannot truncate the old table, because the steam buffer may not be empty.
  • I am supposed to "delete old table and create new table, then stream data to new table". But it seems it is the root of my current problems, my new data cannot be streamed to new table (even the new table is with a new object id, and it should not be affected by the fact I just delete an old table)
like image 607
searain Avatar asked Aug 23 '18 23:08

searain


People also ask

How do I delete a table in BigQuery?

$dataset = $bigQuery->dataset($datasetId); $table = $dataset->table($tableId); $table->delete();

How do you overwrite a table in BigQuery?

To append to or overwrite a table using query results, specify a destination table and set the write disposition to either: Append to table — Appends the query results to an existing table. Overwrite table — Overwrites an existing table with the same name using the query results.

How do I recover a deleted table in BigQuery?

In the Google Cloud console, go to the BigQuery page. In the Explorer pane, expand the project and dataset nodes of the table snapshot you want to restore from. Click the name of the table snapshot. In the table snapshot pane that appears, click Restore.


2 Answers

I posted in another thread of mine regarding streaming into BigQuery. Now as a rule, I am trying to avoid streaming if I can.

  • Load the data to Cloud Storage
  • Then load data from Cloud Storage to BigQuery

Which will solve many streaming related issues.

like image 28
searain Avatar answered Oct 27 '22 22:10

searain


Avoid truncating and recreating tables while streaming.

From the official docs:

https://cloud.google.com/bigquery/troubleshooting-errors#streaming

Table Creation/Deletion - Streaming to a nonexistent table will return a variation of a notFound response. Creating the table in response may not immediately be recognized by subsequent streaming inserts. Similarly, deleting and/or recreating a table may create a period of time where streaming inserts are effectively delivered to the old table and will not be present in the newly created table.

Table Truncation - Truncating a table's data (e.g. via a query job that uses writeDisposition of WRITE_TRUNCATE) may similarly cause subsequent inserts during the consistency period to be dropped.

To avoid losing data: Create a new table with a different name.

like image 155
Felipe Hoffa Avatar answered Oct 27 '22 22:10

Felipe Hoffa