
Handling NaN values while inserting Pandas dataframes into BigQuery tables

I'm using the following code to insert a Pandas DataFrame containing many NaN values into a BigQuery table. The DataFrame is prepared in Cloud Datalab.

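import google.datalab.bigquery as bq

# df is the DataFrame prepared earlier in the notebook (it contains NaN values)
bqtable = ('project_name', 'dataset_name', 'table_name')
table = bq.Table(bqtable)

# Infer the BigQuery schema from the DataFrame and (re)create the table
table_schema = bq.Schema.from_data(df)
table.create(schema=table_schema, overwrite=True)

# Stream the DataFrame rows into the table; this is the step that fails on NaN
table.insert(df)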

I'm getting the following error because of the NaN values in the dataframe:

RequestException: HTTP request failed: Invalid JSON payload received. 
Unexpected token. : "user_id": NaN,

I know that JSON has no representation for NaN, but I can't simply use fillna to replace those values with something else, because the fields need to be stored as NULL in the BigQuery table. Does anyone have a workaround for this?
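The root cause can be reproduced with Python's standard json module alone: by default json.dumps writes a float NaN as the bare token NaN, which is not valid JSON, while None is written as null. A minimal sketch, assuming each row is a plain dict; sanitize_row is a hypothetical helper, not part of any BigQuery client:

```python
import json
import math

# By default (allow_nan=True) json.dumps emits the non-standard token NaN,
# which strict JSON parsers reject.
raw = json.dumps({"user_id": float("nan")})
print(raw)  # {"user_id": NaN}


def sanitize_row(row):
    """Hypothetical helper: map float NaN values to None so they serialize as null."""
    return {
        key: None if isinstance(value, float) and math.isnan(value) else value
        for key, value in row.items()
    }


# allow_nan=False raises ValueError if any NaN is still present.
clean = json.dumps(sanitize_row({"user_id": float("nan"), "name": "a"}), allow_nan=False)
print(clean)  # {"user_id": null, "name": "a"}
```

Passing allow_nan=False is a cheap way to make the serializer fail loudly before the request is sent, rather than letting the API reject the payload.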

asked Oct 23 '18 at 19:10 by Soroush Sotoudeh


1 Answer

Replace all np.nan values with Python's None, then re-run your code (or try df.to_gbq):

df = df.where(pd.notnull(df), None)
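Depending on the pandas version, where(..., None) can leave a float64 column as float, in which case the missing values stay NaN. A sketch of an alternative that does the conversion per record after exporting the frame, assuming the rows are then handed to a JSON-based insert; the column names and the to_json_safe_records helper are made up for illustration:

```python
import json
import math

import numpy as np
import pandas as pd

# Illustrative frame: user_id is float64 because it contains a missing value.
df = pd.DataFrame({"user_id": [101.0, np.nan, 103.0], "name": ["a", "b", None]})


def to_json_safe_records(frame):
    """Hypothetical helper: export rows as dicts, replacing float NaN with None."""
    records = frame.to_dict(orient="records")
    return [
        {k: None if isinstance(v, float) and math.isnan(v) else v for k, v in row.items()}
        for row in records
    ]


rows = to_json_safe_records(df)
# allow_nan=False makes json.dumps raise ValueError if any NaN slipped through.
payload = json.dumps(rows, allow_nan=False)
print(payload)
```

Because the check happens on plain Python values, it does not depend on how a particular pandas version treats None inside a numeric column.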

I'm not experienced with Google BigQuery and see nothing wrong with your existing code, but it may be worth installing the pandas-gbq package and writing the DataFrame to BigQuery with df.to_gbq, as described in the docs here: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_gbq.html

answered Sep 30 '22 at 20:09 by Peter Leimbigler