Cannot insert new value to BigQuery table after updating with new column using streaming API

I'm seeing some strange behaviour with my BigQuery table. I've just added a new column to the table, and it looks correct in the interface and when fetching the schema via the API.

But when adding a value to the new column I get the following error:

{
  "insertErrors" : [ {
    "errors" : [ {
      "message" : "no such field",
      "reason" : "invalid"
    } ],
    "index" : 0
  } ],
  "kind" : "bigquery#tableDataInsertAllResponse"
}

I'm using the Java client and the streaming API; the only thing I added is:

tableRow.set("server_timestamp", 0)

Without that line it works correctly :(

Do you see anything wrong with it? (The name of the column is server_timestamp, and it is defined as an INTEGER.)

asked Aug 13 '14 by iamedu

2 Answers

Updating this answer, since BigQuery's streaming system has changed significantly since Aug 2014, when this question was originally answered.


BigQuery's streaming system caches the table schema for up to 2 minutes. When you add a field to the schema and then immediately stream new rows to the table, you may encounter this error.

The best way to avoid this error is to delay streaming rows with the new field for 2 minutes after modifying your table.

If that's not possible, you have a few other options:

  1. Use the ignoreUnknownValues option. This flag tells the insert operation to ignore unknown fields and accept only the fields it recognizes. Setting it lets you start streaming records with the new field immediately while avoiding the "no such field" error during the 2-minute window, but note that values for the new field will be silently dropped until the cached table schema updates!

  2. Use the skipInvalidRows option. This flag tells the insert operation to insert as many rows as it can, instead of failing the entire operation when a single invalid row is detected. This option is useful if only some of your data contains the new field, since you can continue inserting rows in the old format and decide separately how to handle the failed rows (either with ignoreUnknownValues or by waiting for the 2-minute window to pass). A sketch showing both flags follows this list.
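Both flags are set on the streaming insert request itself. Here's a minimal sketch using the newer google-cloud-bigquery Java client (the question used the older generated API, so treat the client choice as an assumption); the dataset/table names and the existing_col field are hypothetical:

import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.InsertAllRequest;
import com.google.cloud.bigquery.InsertAllResponse;
import com.google.cloud.bigquery.TableId;

import java.util.HashMap;
import java.util.Map;

public class StreamWithFlags {
    public static void main(String[] args) {
        BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();
        TableId tableId = TableId.of("my_dataset", "my_table"); // hypothetical names

        Map<String, Object> row = new HashMap<>();
        row.put("existing_col", "value");   // hypothetical pre-existing field
        row.put("server_timestamp", 0);     // the newly added INTEGER column

        InsertAllRequest request = InsertAllRequest.newBuilder(tableId)
                // Option 1: drop fields the cached schema doesn't recognize
                // instead of failing with "no such field".
                .setIgnoreUnknownValues(true)
                // Option 2: insert the valid rows and report the invalid ones,
                // instead of failing the entire request.
                .setSkipInvalidRows(true)
                .addRow(row)
                .build();

        InsertAllResponse response = bigquery.insertAll(request);
        if (response.hasErrors()) {
            // Per-row errors, keyed by each row's index in the request.
            response.getInsertErrors().forEach((index, errors) ->
                    System.err.println("Row " + index + ": " + errors));
        }
    }
}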

If you must capture all values and cannot wait for 2 minutes, you can create a new table with the updated schema and stream to that table. The downside is that you then have multiple tables to manage. Note that you can query these tables conveniently using TABLE_QUERY (see the sketch below), and you can run periodic cleanup queries (or table copies) to consolidate your data into a single table.
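For example, if the new tables share a common name prefix, a legacy-SQL TABLE_QUERY can read them all at once. A minimal sketch with the same Java client, assuming hypothetical tables in my_dataset whose ids start with events_:

import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.FieldValueList;
import com.google.cloud.bigquery.QueryJobConfiguration;

public class QueryShardedTables {
    public static void main(String[] args) throws InterruptedException {
        BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();

        // TABLE_QUERY is a legacy-SQL construct: it matches every table in
        // my_dataset whose table_id satisfies the expression.
        QueryJobConfiguration queryConfig = QueryJobConfiguration.newBuilder(
                "SELECT * FROM TABLE_QUERY(my_dataset, 'table_id CONTAINS \"events_\"')")
                .setUseLegacySql(true)
                .build();

        for (FieldValueList row : bigquery.query(queryConfig).iterateAll()) {
            System.out.println(row);
        }
    }
}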

Historical note: A previous version of this answer suggested that users stop streaming, move the existing data to another table, re-create the streaming table, and restart streaming. However, due to the complexity of this approach and the shortened window for the schema cache, this approach is no longer recommended by the BigQuery team.

answered Nov 05 '22 by shollyman


I was running into this error. It turned out that I was building the insert object as if I were in "raw" mode, but had forgotten to set the flag raw: true. This caused the BigQuery client library to take my insert data and nest it again under a json: {} node.

In other words, I was doing this:

table.insert({
    insertId: 123,
    json: {
        col1: '1',
        col2: '2',
    }
});

when I should have been doing this:

table.insert({
    insertId: 123,
    json: {
        col1: '1',
        col2: '2',
    }
}, {raw: true});

The Node BigQuery library didn't realize that the row was already in raw format and was then trying to insert this:

{
    insertId: '<generated value>',
    json: {
        insertId: 123,
        json: {
            col1: '1',
            col2: '2',
        }
    }
}

So in my case the errors meant that the insert expected my schema to have two columns (insertId and json).

answered Nov 05 '22 by Christopher Fitzner