I'm using the following code to insert a Pandas DataFrame with multiple NaN values into a BigQuery table. The DataFrame is prepared in Cloud Datalab.
import google.datalab.bigquery as bq
bqtable = ('project_name', 'dataset_name', 'table_name')
table = bq.Table(bqtable)
table_schema = bq.Schema.from_data(df)
table.create(schema=table_schema, overwrite=True)
table.insert(df)
I'm getting the following error because of the NaN values in the DataFrame:

RequestException: HTTP request failed: Invalid JSON payload received.
Unexpected token. : "user_id": NaN,
                               ^
I know that JSON does not understand NaN, but I can't just use fillna to convert those NaN values to something else, as I need those fields inserted as null in the BigQuery table. Does anyone have a workaround for this?
Replace all np.nan values with Python's None value, then re-run your code (or try df.to_gbq):

df = df.where(pd.notnull(df), None)
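As a quick sanity check, here is a minimal sketch (column names are hypothetical, mirroring the `user_id` field from the error) showing why the substitution fixes the payload: once NaN becomes None, the standard json module serializes it as null instead of the invalid token NaN. Note that on a float column pandas can coerce None straight back to NaN, so casting to object first is the safer variant of the same idea:

```python
import json
import numpy as np
import pandas as pd

# Small frame with a missing user_id, mirroring the question's data
# (column names are hypothetical).
df = pd.DataFrame({"user_id": [1.0, np.nan], "score": [0.5, 0.7]})

# Cast to object first so None survives; on numeric dtypes pandas may
# otherwise coerce None back to NaN (exact behavior varies by version).
clean = df.astype(object).where(df.notna(), None)

rows = clean.to_dict(orient="records")
# NaN became None, which serializes as the valid JSON literal null.
payload = json.dumps(rows)
```

After this, `payload` contains `"user_id": null` for the missing row, which BigQuery's JSON ingestion accepts.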
I'm not experienced with Google BigQuery and I see nothing wrong with your existing code, but it may be worth installing the pandas-gbq package and then writing the DataFrame to BigQuery with df.to_gbq, as detailed in the docs here: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_gbq.html
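If you go the pandas-gbq route, a minimal sketch of what the call could look like; the project, dataset, and table names below are placeholders taken from the question, and running it requires the pandas-gbq package plus Google Cloud credentials:

```python
import pandas as pd

def upload_to_bigquery(df: pd.DataFrame) -> None:
    # Placeholder names from the question; replace with your own.
    # pandas-gbq serializes missing values as NULL, so no manual
    # NaN cleanup should be needed before uploading.
    df.to_gbq(
        "dataset_name.table_name",
        project_id="project_name",
        if_exists="replace",  # mirrors overwrite=True in the Datalab code
    )
```

Note that if_exists="replace" drops and recreates the destination table, matching the overwrite=True behavior in the original snippet.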