
Insert record to BigQuery: streaming vs simple insert - using table->insertRow or table->insertRows vs using bigquery->runQuery

If I use the table->insertRow() or table->insertRows() functions, that means I am streaming data into BigQuery.

https://googlecloudplatform.github.io/google-cloud-php/#/docs/google-cloud/v0.53.0/bigquery/table

But if I use bigquery->runQuery() function to run the insert statements, that is not streaming.

https://googlecloudplatform.github.io/google-cloud-php/#/docs/google-cloud/v0.53.0/bigquery/bigqueryclient?method=runQuery
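For context, a minimal sketch of the two approaches side by side (untested; assumes an authenticated client for project `my-project` and an existing `my_dataset.my_table` table with `name` and `age` columns — exact signatures may differ between library versions):

```php
<?php
require 'vendor/autoload.php';

use Google\Cloud\BigQuery\BigQueryClient;

$bigQuery = new BigQueryClient(['projectId' => 'my-project']);
$table = $bigQuery->dataset('my_dataset')->table('my_table');

// 1) Streaming insert: rows are sent through tabledata.insertAll
$response = $table->insertRows([
    ['data' => ['name' => 'Alice', 'age' => 30]],
    ['data' => ['name' => 'Bob',   'age' => 25]],
]);

// 2) DML insert: an INSERT statement executed as a query job
$results = $bigQuery->runQuery(
    "INSERT INTO `my_dataset.my_table` (name, age) VALUES ('Carol', 41)",
    ['useLegacySql' => false]
);
```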

In what cases should I use streaming to insert data, and in what cases should I use bigquery->runQuery() to run an INSERT statement?

Thanks!

asked Feb 14 '18 by searain

People also ask

What is streaming insert in BigQuery?

The user sends a streaming insert into BigQuery via the tabledata.insertAll method. This insert is sent to the API in JSON format, along with other details such as authorization headers and details about the intended destination. A single insertAll call may have one or more individual records within it.

Does BigQuery support streaming inserts?

Yes. Simple rows can be inserted into a table using the streaming API (insertAll).


1 Answer

Just as a complement to the answer that @GrahamPolley provided in the comments to your question (and for the sake of having an answer that is more accessible for the community), let me share some documentation links that may be useful for the questions that you made in this post.

Streaming Inserts vs. DML Insert

Streaming inserts use the tabledata.insertAll method to insert one record at a time in a table, and it does not require that a load job is run. In this documentation page, you can find all the information related to Streaming Inserts, its features, limitations, and quotas.

As stated in the comment to your question, with DML inserts you can only perform 1,000 INSERT queries per table per day (the same limit applies to load jobs), while streaming inserts have no daily limit, only limits on the maximum rows per second and per request.
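One practical difference worth noting: streaming inserts report failures per row rather than failing the whole request, so it pays to inspect the response. A sketch, assuming `$table` is an existing `Table` object from the PHP client:

```php
<?php
// $table is assumed to be a Google\Cloud\BigQuery\Table instance.
$response = $table->insertRows([
    ['data' => ['name' => 'Dave', 'age' => 52]],
]);

// Some rows may succeed while others fail; check row-level errors.
if (!$response->isSuccessful()) {
    foreach ($response->failedRows() as $row) {
        foreach ($row['errors'] as $error) {
            printf("%s: %s\n", $error['reason'], $error['message']);
        }
    }
}
```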

Data availability

Data that is streamed into BigQuery is available shortly (within a few seconds) after the first streaming insertion into a table, even while it is still in the buffer. However, this data may be unavailable for other operations (such as copy, export, or even some API methods like tabledata.list), so if you are performing any of those, you will first have to check that the buffer is empty. You can do that by checking the response from tables.get for a section named streamingBuffer, which will be absent if the buffer is empty.
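With the PHP client, that check can be done against the table metadata returned by `$table->info()`. A sketch (the helper function name here is hypothetical, not part of the library):

```php
<?php
// Returns true when the streaming buffer has been flushed and the table
// is safe to copy/export. The streamingBuffer key is simply absent from
// the tables.get response once the buffer is empty.
function streamingBufferIsEmpty(array $tableInfo): bool
{
    return empty($tableInfo['streamingBuffer']);
}

// Usage with the PHP client (assumes $table is a Table instance):
// $table->reload(); // refresh cached metadata first
// if (streamingBufferIsEmpty($table->info())) {
//     // safe to run copy/export jobs
// }
```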

answered Oct 05 '22 by dsesto