Say I have a table with existing data, with a schema like:
{ 'name' : 'Field1', 'type' : 'STRING' },
{ 'name' : 'Field2', 'type' : 'STRING' }
Our data is CSV:
Field1,Field2
Value1,Value2
...
We load data by creating a new job, loading a CSV directly from Google Cloud Storage (GCS). Our data files now have an additional column and different ordering, such that the data is now structured:
Field1,Field3,Field2
Value1,Value3,Value2
...
Is there a way to specify in the load job that we would like to skip the second column, and only load columns 1 and 3 (named Field1 and Field2)?
I am using the Python API, e.g. service.jobs().insert(projectId=projectId, body=job_body).
Basically I want to do something like this:
job_body = {
    'projectId': projectId,
    'configuration': {
        'load': {
            'sourceUris': [sourceCSV],
            'schema': {
                'fields': [
                    {
                        'name': 'Field1',
                        'type': 'STRING'
                    },
                    { # this would be the skipped field
                        'name': None,
                        'skip': True
                    },
                    {
                        'name': 'Field2',
                        'type': 'STRING'
                    },
                ]
            },
            'destinationTable': {
                'projectId': projectId,
                'datasetId': datasetId,
                'tableId': targetTableId
            },
        }
    }
}
Thanks!
Felipe's suggestion should work. Another possibility, if you're able to modify the CSV you're loading into BigQuery, would be the ignoreUnknownValues flag on load jobs:
[Optional] Accept rows that contain values that do not match the schema. The unknown values are ignored. Default is false which treats unknown values as errors. For CSV this ignores extra values at the end of a line. For JSON this ignores named values that do not match any column name.
Using this flag would, however, require reordering the columns in your CSV or formatting your data as JSON.
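As a rough sketch, assuming you can reorder the CSV so that Field1 and Field2 come first, the load-job body would look like the following. The project, dataset, table, and GCS URI values are placeholders:

```python
# Placeholder identifiers -- substitute your own values.
projectId = 'my-project'               # hypothetical
datasetId = 'my_dataset'               # hypothetical
targetTableId = 'my_table'             # hypothetical
sourceCSV = 'gs://my-bucket/data.csv'  # hypothetical

job_body = {
    'projectId': projectId,
    'configuration': {
        'load': {
            'sourceUris': [sourceCSV],
            # With a 2-column schema and ignoreUnknownValues set, BigQuery
            # drops any extra trailing values on each CSV line instead of
            # treating them as errors.
            'ignoreUnknownValues': True,
            'schema': {
                'fields': [
                    {'name': 'Field1', 'type': 'STRING'},
                    {'name': 'Field2', 'type': 'STRING'},
                ]
            },
            'destinationTable': {
                'projectId': projectId,
                'datasetId': datasetId,
                'tableId': targetTableId,
            },
        }
    }
}
```

You would then submit this as usual with service.jobs().insert(projectId=projectId, body=job_body).execute(). Note that for CSV the flag only ignores extra values at the end of a line, which is why the columns you keep must come first.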
It's not currently possible to do that, but it could be an interesting feature request. Feel free to add it to https://code.google.com/p/google-bigquery/issues/list.
In the meantime, I would do a 2 step import:
1. Load the CSV as-is into a new staging table, with a schema covering all three columns.
2. Run a query that selects only Field1 and Field2 from the staging table, appending the results to your existing table.
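A minimal sketch of the two job bodies this would involve, using the same Python API; the staging table name and other identifiers are placeholders:

```python
# Placeholder identifiers -- substitute your own values.
projectId = 'my-project'               # hypothetical
datasetId = 'my_dataset'               # hypothetical
stagingTableId = 'staging_table'       # hypothetical
targetTableId = 'my_table'             # hypothetical
sourceCSV = 'gs://my-bucket/data.csv'  # hypothetical

# Step 1: load the CSV as-is into a staging table, with all three columns.
load_job = {
    'projectId': projectId,
    'configuration': {
        'load': {
            'sourceUris': [sourceCSV],
            'schema': {
                'fields': [
                    {'name': 'Field1', 'type': 'STRING'},
                    {'name': 'Field3', 'type': 'STRING'},
                    {'name': 'Field2', 'type': 'STRING'},
                ]
            },
            'destinationTable': {
                'projectId': projectId,
                'datasetId': datasetId,
                'tableId': stagingTableId,
            },
        }
    }
}

# Step 2: select only the wanted columns from the staging table and
# append the result to the existing table.
query_job = {
    'projectId': projectId,
    'configuration': {
        'query': {
            'query': 'SELECT Field1, Field2 FROM [%s.%s]' % (datasetId, stagingTableId),
            'destinationTable': {
                'projectId': projectId,
                'datasetId': datasetId,
                'tableId': targetTableId,
            },
            'writeDisposition': 'WRITE_APPEND',
        }
    }
}

# Each body would be submitted with:
#   service.jobs().insert(projectId=projectId, body=...).execute()
```

Once the append succeeds, the staging table can be deleted.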