Pandas: remove null values when using to_json

Tags:

python

json

pandas

I have a pandas DataFrame that I want to save in JSON format. The pandas docs say:

Note: NaN's, NaT's and None will be converted to null and datetime objects will be converted based on the date_format and date_unit parameters.

Using the orient option records, I get something like this:

[{"A":1,"B":4,"C":7},{"A":null,"B":5,"C":null},{"A":3,"B":null,"C":null}]

Is it possible to have this instead:

[{"A":1,"B":4,"C":7},{"B":5},{"A":3}]

Thank you

698

asked Jun 18 '15 10:06

mva

2 Answers

The other answer doesn't actually produce results in the 'records' format. This solution uses the json package and produces exactly the result asked for in the original question.

import pandas as pd
import json

json.dumps([row.dropna().to_dict() for index, row in df.iterrows()])

Additionally, if you want to include the index (and you are on Python 3.5+) you can do:

json.dumps([{'index': index, **row.dropna().to_dict()} for index, row in df.iterrows()])
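
For reference, here is a minimal self-contained sketch of the row-wise approach. The frame is an assumption, rebuilt from the data shown in the question, and the names records and result are only illustrative. Because the columns contain NaN, pandas stores them as floats, so the numbers are serialised as 1.0 rather than 1.

```python
import json

import numpy as np
import pandas as pd

# Hypothetical frame mirroring the data in the question.
df = pd.DataFrame({
    "A": [1, np.nan, 3],
    "B": [4, 5, np.nan],
    "C": [7, np.nan, np.nan],
})

# Drop the missing values row by row, then serialise the list of dicts.
records = [row.dropna().to_dict() for _, row in df.iterrows()]
result = json.dumps(records)
print(result)
```

Since iterrows builds a Series for every row, this is probably best suited to small or medium frames.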

154

answered Oct 11 '22 02:10

Dave DeCaprio

The following gets close to what you want. Essentially, we wrap the non-NaN values of each row in a list and then call to_json on the result:

In [136]:
df.apply(lambda x: [x.dropna()], axis=1).to_json()

Out[136]:
'{"0":[{"a":1.0,"b":4.0,"c":7.0}],"1":[{"b":5.0}],"2":[{"a":3.0}]}'

Creating a list is necessary here; otherwise pandas will try to align the result with the shape of your original df, which reintroduces the NaN values you want to avoid:

In [138]:
df.apply(lambda x: pd.Series(x.dropna()), axis=1).to_json()

Out[138]:
'{"a":{"0":1.0,"1":null,"2":3.0},"b":{"0":4.0,"1":5.0,"2":null},"c":{"0":7.0,"1":null,"2":null}}'

Similarly, calling list on the result of dropna broadcasts the result back to the original shape, filling each row with its remaining values:

In [137]:
df.apply(lambda x: list(x.dropna()), axis=1).to_json()

Out[137]:
'{"a":{"0":1.0,"1":5.0,"2":3.0},"b":{"0":4.0,"1":5.0,"2":3.0},"c":{"0":7.0,"1":5.0,"2":3.0}}'
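
If the 'records' layout is needed, one possible workaround (a sketch, not something either answer proposes) is to let to_json do the serialisation first and then strip the null entries from the parsed result. The frame and the names raw and cleaned are assumptions made for illustration.

```python
import json

import numpy as np
import pandas as pd

# Hypothetical frame mirroring the data in the question.
df = pd.DataFrame({
    "A": [1, np.nan, 3],
    "B": [4, 5, np.nan],
    "C": [7, np.nan, np.nan],
})

# Let pandas serialise first, so NaN, NaT and None all become JSON null.
raw = json.loads(df.to_json(orient="records"))

# Then drop every key whose value is null (None after parsing).
cleaned = [{key: value for key, value in rec.items() if value is not None}
           for rec in raw]
result = json.dumps(cleaned)
print(result)
```

This keeps whatever date_format and date_unit handling to_json applies, at the cost of serialising and parsing the data one extra time.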

answered Oct 11 '22 02:10

EdChum
