Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Update or Delete tables with streaming buffer in BigQuery?

I'm getting this following error when trying to delete records from a table created through GCP Console and updated with GCP BigQuery Node.js table insert function.

UPDATE or DELETE DML statements are not supported over table stackdriver-360-150317:my_dataset.users with streaming buffer

The table was created without streaming features. And from what I'm reading in documentation Tables that have been written to recently via BigQuery Streaming (tabledata.insertall) cannot be modified using UPDATE or DELETE statements.

Does it mean that once a record has been inserted with this function into a table, there's no way to delete records? At all? If that's the case, does it mean that table needs to be deleted and recreated from scratch? If that's not the case. Can you please suggest a workaround to avoid this issue?

Thanks!


Including new error message for SEO: "UPDATE or DELETE statement over table ... would affect rows in the streaming buffer, which is not supported" -- Fh

like image 801
Diego Avatar asked Mar 29 '17 06:03

Diego


People also ask

What is streaming buffer in BigQuery?

Streaming buffer: The buffer that retains recently inserted rows, and is optimized for high-throughput writes rather than columnar access. Instant availability reader: Allows the query engine to read records directly from the streaming buffer. Columnar storage: Data associated with a table that's in columnar format.

Can we update table in BigQuery?

The BigQuery data manipulation language (DML) enables you to update, insert, and delete data from your BigQuery tables.

How do you refresh a table in BigQuery?

To update to the latest BigQuery data, at the bottom of the pivot table, click Refresh.

How do you clear a table in BigQuery?

You can delete a BigQuery table in the following ways: BigQuery Delete Table: Using the Cloud Console. BigQuery Delete Table: Using a Data Definition Language (DDL) DROP TABLE statement. BigQuery Delete Table: Using the bq command-line tool bq rm command.


2 Answers

To check if the table has a streaming buffer, check the tables.get response for a section named streamingBuffer or, when streaming to a partitioned table, data in the streaming buffer has a NULL value for the _PARTITIONTIME pseudo column, so even with a simple WHERE query can be checked.

Streamed data is available for real-time analysis within a few seconds of the first streaming insertion into a table but it can take up to 90 minutes to become available for copy/export and other operations. You probably have to wait up to 90 minutes so all buffer is persisted on the cluster. You can use queries to see if the streaming buffer is empty or not like you mentioned.

If you use load job to create the table, you won't have streaming buffer, but probably you streamed some values to it.


Note the answer below to work with tables that have ongoing streaming buffers. Just use a WHERE to filter out the latest minutes of data and your queries will work. -- Fh

like image 199
Pentium10 Avatar answered Sep 21 '22 03:09

Pentium10


Make sure to change your filters so they don't include data that could be in the current streaming buffer.

For example, this query fails while I'm streaming to this table:

DELETE FROM `project.dataset.table`  WHERE id LIKE '%-%'  Error: UPDATE or DELETE statement over table project.dataset.table would affect rows in the streaming buffer, which is not supported 

You can fix it by only deleting older records:

DELETE FROM `project.dataset.table`  WHERE id LIKE '%-%' AND ts < TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 40 MINUTE)  4282 rows affected. 
like image 39
Felipe Hoffa Avatar answered Sep 19 '22 03:09

Felipe Hoffa