Handling NaN values while inserting Pandas dataframes into BigQuery tables

I'm using the following code to insert a Pandas dataframe that contains multiple NaN values into a BigQuery table. The dataframe is prepared in Cloud Datalab.

import google.datalab.bigquery as bq

bqtable = ('project_name', 'dataset_name', 'table_name')
table = bq.Table(bqtable)

table_schema = bq.Schema.from_data(df)
table.create(schema=table_schema, overwrite=True)

table.insert(df)

I'm getting the following error because of the NaN values in the dataframe:

RequestException: HTTP request failed: Invalid JSON payload received. 
Unexpected token. : "user_id": NaN,
                               ^

I know that JSON does not understand NaN, but I can't just use fillna to convert those NaN values to something else, because I need those fields inserted as null in the BigQuery table. Does anyone have a workaround for this?
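For readers unfamiliar with the underlying issue, here is a minimal sketch using only Python's standard json module. It shows why a float NaN yields a payload that strict JSON parsers reject, while None serializes to null. The dictionary key is taken from the error message above; the variable names are illustrative.

```python
import json

# By default, Python's json module emits the non-standard token NaN.
nan_payload = json.dumps({"user_id": float("nan")})
print(nan_payload)   # {"user_id": NaN}

# None maps to the JSON literal null, which strict parsers accept.
null_payload = json.dumps({"user_id": None})
print(null_payload)  # {"user_id": null}

# With allow_nan=False the serializer refuses NaN instead of emitting it.
try:
    json.dumps({"user_id": float("nan")}, allow_nan=False)
    strict_rejected = False
except ValueError:
    strict_rejected = True
```

This suggests that the values need to be None, not NaN, by the time the rows are serialized.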


asked Oct 23 '18 19:10

Soroush Sotoudeh

1 Answer

Replace every np.nan value with Python's None, then re-run your code (or try df.to_gbq):

import pandas as pd
df = df.where(pd.notnull(df), None)
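One caveat: on newer pandas versions, calling where with None on a float column may leave NaN in place, because the column keeps its float dtype. The sketch below is one possible workaround, not part of the original answer: it casts to object first so that None can be stored. The helper name nan_to_none and the sample dataframe are hypothetical.

```python
import json

import numpy as np
import pandas as pd


def nan_to_none(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of frame in which every missing value is Python None."""
    # Casting to object lets None be stored in columns that were numeric.
    as_object = frame.astype(object)
    # Keep values where the original is not missing; use None elsewhere.
    return as_object.where(frame.notna(), None)


# Hypothetical sample data with a missing value in each column.
df = pd.DataFrame({"user_id": [1.0, np.nan, 3.0], "name": ["a", None, "c"]})
clean = nan_to_none(df)

# Row dictionaries now serialize under strict JSON rules (no NaN token).
records = clean.to_dict(orient="records")
payload = json.dumps(records, allow_nan=False)
```

Note that the converted columns have object dtype, so if the table schema is inferred from the dataframe, it is probably safer to infer it before the conversion.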

I'm not experienced with Google BigQuery and I see nothing wrong with your existing code, but it may be worth installing the pandas-gbq package. Then try to write the DataFrame to GBQ with df.to_gbq, as detailed in the docs here: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_gbq.html
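As a rough sketch of the pandas-gbq route, assuming the package is installed and Google Cloud credentials are configured: the wrapper name upload_with_pandas_gbq is hypothetical, and the table and project names are the placeholders from the question. Whether missing values arrive as NULL should be verified against the table after the load.

```python
import pandas as pd


def upload_with_pandas_gbq(frame: pd.DataFrame) -> None:
    """Write frame to BigQuery via pandas-gbq (needs credentials to run)."""
    # Imported lazily so the sketch can be loaded without the package.
    import pandas_gbq

    pandas_gbq.to_gbq(
        frame,
        "dataset_name.table_name",  # placeholder names from the question
        project_id="project_name",  # placeholder name from the question
        if_exists="replace",        # roughly mirrors overwrite=True above
    )
```

Calling upload_with_pandas_gbq(df) would then replace the table contents with the dataframe.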


answered Sep 30 '22 20:09

Peter Leimbigler

Tags: python-3.x, pandas, dataframe, google-bigquery, google-cloud-datalab
