
Pandas: remove null values when using to_json

I have a pandas DataFrame that I want to save in JSON format. The pandas docs say:

Note: NaN's, NaT's and None will be converted to null and datetime objects will be converted based on the date_format and date_unit parameters

With the orient option set to records, I get something like this:

[{"A":1,"B":4,"C":7},{"A":null,"B":5,"C":null},{"A":3,"B":null,"C":null}]
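For reference, a frame like the following reproduces this kind of output. It is only a sketch: the values are assumed from the output above, and the numbers may be written as floats (1.0 rather than 1) because columns that contain NaN are stored as floats.

```python
import json
import pandas as pd

# Hypothetical frame whose values are assumed from the output shown above
df = pd.DataFrame({"A": [1, None, 3], "B": [4, 5, None], "C": [7, None, None]})

# Missing values are written as JSON null in every record
records_json = df.to_json(orient="records")
print(records_json)
```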

Is it possible to have this instead:

[{"A":1,"B":4,"C":7},{"B":5},{"A":3}]

Thank you

mva asked Jun 18 '15 10:06



2 Answers

The solution above doesn't actually produce results in the 'records' format. This solution also uses the json package, but produces exactly the result asked for in the original question.

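import pandas as pd
import json

# df is assumed to be the existing DataFrame from the question.
# Drop the NaN entries from each row, then serialize the list of row dicts.
json.dumps([row.dropna().to_dict() for index, row in df.iterrows()])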

Additionally, if you want to include the index (and you are on Python 3.5+) you can do:

json.dumps([{'index':index, **row.dropna().to_dict()} for index,row in df.iterrows()])
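As a self-contained sketch of both variants, the snippet below builds a small sample frame first. The frame is assumed from the question's output rather than given in the answer, and the values come out as floats (1.0, not 1) because the columns hold NaN.

```python
import json
import pandas as pd

# Hypothetical sample data, assumed from the question's output
df = pd.DataFrame({"A": [1, None, 3], "B": [4, 5, None], "C": [7, None, None]})

# One dict per row, with that row's NaN entries dropped
records = [row.dropna().to_dict() for _, row in df.iterrows()]
print(json.dumps(records))

# Same idea, with the row label stored under an "index" key
with_index = [{"index": index, **row.dropna().to_dict()} for index, row in df.iterrows()]
print(json.dumps(with_index))
```

Note that iterrows builds a Series for every row, so this is likely to be slow on large frames.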
Dave DeCaprio answered Oct 11 '22 02:10


The following gets close to what you want. Essentially, we wrap the non-NaN values of each row in a list and then call to_json on the result:

In [136]:
df.apply(lambda x: [x.dropna()], axis=1).to_json()

Out[136]:
'{"0":[{"a":1.0,"b":4.0,"c":7.0}],"1":[{"b":5.0}],"2":[{"a":3.0}]}'
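If the list-wrapped output is acceptable as an intermediate step, it can be flattened into the records-style list from the question by parsing it and unwrapping each one-element list. This is only a sketch that works from the literal Out[136] string; the variable names are made up for illustration.

```python
import json

# The literal string shown in Out[136]
out_136 = '{"0":[{"a":1.0,"b":4.0,"c":7.0}],"1":[{"b":5.0}],"2":[{"a":3.0}]}'

# Each value is a one-element list wrapping the row's non-null fields
parsed = json.loads(out_136)
records = [wrapped[0] for wrapped in parsed.values()]

print(json.dumps(records))
```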

Wrapping the result in a list, as in In [136], is necessary. Otherwise apply tries to align the result with the shape of your original df, which reintroduces the NaN values you want to avoid:

In [138]:
df.apply(lambda x: pd.Series(x.dropna()), axis=1).to_json()

Out[138]:
'{"a":{"0":1.0,"1":null,"2":3.0},"b":{"0":4.0,"1":5.0,"2":null},"c":{"0":7.0,"1":null,"2":null}}'

Likewise, calling list on the result of dropna broadcasts the values across the original shape, much like a fill (at least in the pandas version used for this answer):

In [137]:
df.apply(lambda x: list(x.dropna()), axis=1).to_json()

Out[137]:
'{"a":{"0":1.0,"1":5.0,"2":3.0},"b":{"0":4.0,"1":5.0,"2":3.0},"c":{"0":7.0,"1":5.0,"2":3.0}}'
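An alternative sketch avoids apply altogether, so it does not depend on how apply aligns its results: convert the frame with to_dict(orient="records") and filter each record with pd.notna. This is not taken from either answer, and the sample frame is assumed from the outputs above.

```python
import json
import pandas as pd

# Hypothetical sample data, assumed from the outputs shown above
df = pd.DataFrame({"a": [1, None, 3], "b": [4, 5, None], "c": [7, None, None]})

# Build one dict per row, keeping only the keys whose value is not missing
records = [
    {key: value for key, value in record.items() if pd.notna(value)}
    for record in df.to_dict(orient="records")
]

result = json.dumps(records)
print(result)
```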
EdChum answered Oct 11 '22 02:10