BigQuery Could not parse 'null' as int for field

I tried to load CSV files into a BigQuery table. Some columns have type INTEGER, but missing values appear in the files as the literal text null. When I run bq load, I get the following error:

Could not parse 'null' as int for field

What is the best way to deal with this? Do I have to reprocess the data before bq can load it?

Charles Zhan, asked Jul 29 '17


3 Answers

You'll need to transform the data in order to end up with the expected schema and data. Instead of INTEGER, specify the column as having type STRING. Load the CSV file into a table that you don't plan to use long-term, e.g. YourTempTable. In the BigQuery UI, click "Show Options", then select a destination table with the table name that you want. Now run the query:

#standardSQL
SELECT * REPLACE(SAFE_CAST(x AS INT64) AS x)
FROM YourTempTable;

This converts the string values to integers; SAFE_CAST returns NULL for any value, such as the literal text 'null', that cannot be parsed as INT64.
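
If you prefer to script the same approach rather than use the UI, here is a minimal sketch with the google-cloud-bigquery Python client. It assumes a single problem column named x, uses hypothetical table and file names, and uses CREATE OR REPLACE TABLE in place of the UI's destination-table option.

from google.cloud import bigquery

client = bigquery.Client()

# Step 1: load the CSV into a throwaway table, typing the problem column as STRING.
load_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,
    schema=[bigquery.SchemaField("x", "STRING")],
)
with open("data.csv", "rb") as f:
    client.load_table_from_file(f, "mydataset.YourTempTable", job_config=load_config).result()

# Step 2: rewrite into the final table, casting the strings to INT64 ('null' becomes NULL).
client.query("""
    CREATE OR REPLACE TABLE mydataset.YourFinalTable AS
    SELECT * REPLACE(SAFE_CAST(x AS INT64) AS x)
    FROM mydataset.YourTempTable
""").result()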

Elliott Brossard


Try setting the null marker in the load job configuration:

job_config.null_marker = 'null'
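
For context, here is a minimal sketch of where this setting fits in a load job with the google-cloud-bigquery Python client; the table and file names are placeholders, and the marker must match the literal text in your CSV (here lowercase null, as in the error message).

from google.cloud import bigquery

client = bigquery.Client()

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,
    autodetect=True,  # or supply an explicit schema if the table doesn't exist yet
)
# Treat the literal string "null" in the CSV as a NULL value.
job_config.null_marker = "null"

with open("data.csv", "rb") as f:
    load_job = client.load_table_from_file(f, "mydataset.mytable", job_config=job_config)
load_job.result()  # wait for the load job to finish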

configuration.load.nullMarker (string)
[Optional] Specifies a string that represents a null value in a CSV file. For example, if you specify "\N", BigQuery interprets "\N" as a null value when loading a CSV file. The default value is the empty string. If you set this property to a custom value, BigQuery throws an error if an empty string is present for all data types except for STRING and BYTE. For STRING and BYTE columns, BigQuery interprets the empty string as an empty value.

https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs#configuration.load

Lefyer Li


The BigQuery console has its limitations and doesn't let you specify a null marker when loading data from a CSV. However, this can easily be done with the BigQuery command-line tool's bq load command, using the --null_marker flag to specify the marker, which is simply null in this case.

bq load --source_format=CSV \
    --null_marker=null \
    --skip_leading_rows=1 \
    dataset.table_name \
    ./data.csv \
    ./schema.json

Setting null_marker to null does the trick here. You can omit the schema.json argument if the table already exists with a valid schema. --skip_leading_rows=1 is used because my first row was a header. You can learn more about the bq load command in the BigQuery documentation.

The load command, however, lets you create and load a table in a single go. In that case the schema needs to be specified in a JSON file in the following format:

[
 {
   "description": "[DESCRIPTION]",
   "name": "[NAME]",
   "type": "[TYPE]",
   "mode": "[MODE]"
 },
 {
   "description": "[DESCRIPTION]",
   "name": "[NAME]",
   "type": "[TYPE]",
   "mode": "[MODE]"
 }
]
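
For example, a concrete schema.json for this question might look like the following; the field names and descriptions are placeholders, with id standing in for the INTEGER column whose missing values appear as null in the CSV.

[
 {
   "description": "Integer column whose missing values appear as null in the CSV",
   "name": "id",
   "type": "INTEGER",
   "mode": "NULLABLE"
 },
 {
   "description": "A regular string column",
   "name": "name",
   "type": "STRING",
   "mode": "NULLABLE"
 }
]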

Abhijay Kumar