I have a dataframe in pandas and my goal is to write each row of the dataframe as a new json file.
I'm a bit stuck right now. My intuition was to iterate over the rows of the dataframe (using df.iterrows) and use json.dumps to dump each row to a file, but so far I haven't gotten it to work.
Any thoughts?
Pandas DataFrames have a built-in to_json() method that converts the object to a JSON string, or writes it straight to a file if you pass a path. Its most important parameter is orient, which accepts 'columns', 'records', 'index', 'split', 'table', and 'values' and controls the layout of the resulting JSON.
The reverse direction is read_json(), which parses a JSON string (or file) into a DataFrame; it also takes an orient parameter specifying the format of the JSON.
Finally, to_json() has a boolean lines parameter: when orient is 'records', lines=True writes line-delimited JSON (one object per line). Any other orient raises a ValueError, since the other layouts are not list-like.
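As a minimal sketch of those pieces together (the toy frame and values here are just for illustration):

import pandas as pd
from io import StringIO

df = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})

# One JSON string for the whole frame; orient controls the layout.
df.to_json(orient="records")              # '[{"a":1,"b":"x"},{"a":2,"b":"y"}]'

# With orient="records", lines=True gives newline-delimited JSON, one object per row.
df.to_json(orient="records", lines=True)

# read_json goes the other way; wrapping the string in StringIO avoids the
# deprecation warning newer pandas versions emit for literal JSON strings.
round_trip = pd.read_json(StringIO(df.to_json(orient="records")), orient="records")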
Looping over indices is very inefficient.
A faster technique:
df['json'] = df.apply(lambda x: x.to_json(), axis=1)
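From there, if you do want one file per row, a sketch of writing each of those strings out (the row{}.json naming is just an example):

# df['json'] now holds one JSON string per row; write each to its own file.
for i, json_str in df['json'].items():
    with open("row{}.json".format(i), "w") as f:
        f.write(json_str)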
Pandas DataFrames have a to_json method that will do it for you: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_json.html
If you want each row in its own file, you can iterate over the index (and use the index to help name the files):
for i in df.index:
    df.loc[i].to_json("row{}.json".format(i))
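For example, with a toy frame (names made up here), this writes row0.json and row1.json. The same pattern works with iterrows(), which the question mentioned, since each row comes back as a Series with its own to_json():

import pandas as pd

df = pd.DataFrame({"name": ["alice", "bob"], "score": [10, 20]})  # illustrative data

for i, row in df.iterrows():
    row.to_json("row{}.json".format(i))   # e.g. row0.json: {"name":"alice","score":10}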
Extending @MrE's answer: if you're looking to convert multiple columns from a single row into another column holding the content in JSON format (rather than separate JSON files as output), I've had speed issues while using:
df['json'] = df.apply(lambda x: x.to_json(), axis=1)
I've achieved significant speed improvements on a dataset of 175K records and 5 columns using this line of code:
df['json'] = df.to_json(orient='records', lines=True).splitlines()
Speed went from >1 min to 350 ms.
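If the goal is separate files rather than a column, the same trick still pays off, since the expensive serialization happens in a single to_json() call; a sketch (file naming is illustrative):

# splitlines() yields one JSON string per row; enumerate numbers the files
# positionally, which may differ from df.index.
for i, line in enumerate(df.to_json(orient="records", lines=True).splitlines()):
    with open("row{}.json".format(i), "w") as f:
        f.write(line)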
Using apply, this can be done as follows (note that the json module needs to be imported):

import json

def writejson(row):
    # row is a Series; "filename" and "json" are columns of the dataframe
    with open(row["filename"] + ".json", "w") as outfile:
        json.dump(row["json"], outfile, indent=2)

in_df.apply(writejson, axis=1)
This assumes the dataframe has a "filename" column holding the output filename for each row, and a "json" column with the content to dump.
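A toy frame showing the expected shape (column names follow the snippet above; the values are made up):

import pandas as pd

in_df = pd.DataFrame({
    "filename": ["first", "second"],   # becomes first.json, second.json
    "json": [{"a": 1}, {"a": 2}],      # payload handed to json.dump
})

in_df.apply(writejson, axis=1)         # writes the two files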