
Pandas row to json

I have a dataframe in pandas and my goal is to write each row of the dataframe as a new json file.

I'm a bit stuck right now. My intuition was to iterate over the rows of the dataframe (using df.iterrows) and use json.dumps to write each row out, but to no avail.

Any thoughts?

Asked by Roger Josh on Mar 17 '16

People also ask

How do I export Pandas DataFrame to JSON?

Use the DataFrame.to_json() function. to_json() is a built-in DataFrame method that converts the object to a JSON string; to export a pandas DataFrame to a JSON file, pass a file path to to_json().
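For illustration, a minimal sketch (the toy frame and the out.json path are made up for this example):

import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})

# With no path, to_json() returns the JSON as a string
json_str = df.to_json()

# With a path, to_json() writes the JSON to that file instead
df.to_json("out.json")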

Can we convert DataFrame to JSON in Python?

You can convert a pandas DataFrame to a JSON string with the DataFrame.to_json() method. This method takes an important parameter, orient, which accepts the values 'columns', 'records', 'index', 'split', 'table', and 'values'.
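A small sketch of how orient changes the shape of the output (toy data, results shown as comments):

import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})

print(df.to_json(orient="columns"))  # {"a":{"0":1,"1":2},"b":{"0":"x","1":"y"}}
print(df.to_json(orient="records"))  # [{"a":1,"b":"x"},{"a":2,"b":"y"}]
print(df.to_json(orient="index"))    # {"0":{"a":1,"b":"x"},"1":{"a":2,"b":"y"}}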

How do I use JSON in Pandas?

You can convert JSON to a pandas DataFrame using read_json(). Just pass the JSON to the function. It takes multiple parameters; the relevant one here is orient, which specifies the format of the JSON string. The same function also reads JSON files into a DataFrame.
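For example, a minimal sketch going the other way (the JSON string is made up; StringIO is used because newer pandas versions expect a path or file-like object rather than a literal string):

import pandas as pd
from io import StringIO

json_str = '[{"a":1,"b":"x"},{"a":2,"b":"y"}]'

# orient="records" matches the list-of-objects layout of json_str
df = pd.read_json(StringIO(json_str), orient="records")
print(df)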

What is Orient in JSON?

This snippet is from the documentation of the lines parameter of to_json() (a bool): if orient is 'records', lines=True writes line-delimited JSON, one object per line. Any other orient combined with lines=True raises a ValueError, since the other formats are not list-like.
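A minimal sketch of that behaviour (toy data):

import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})

# One JSON object per line; only allowed together with orient="records"
print(df.to_json(orient="records", lines=True))
# {"a":1,"b":"x"}
# {"a":2,"b":"y"}

# Any other orient with lines=True raises ValueError, e.g.:
# df.to_json(orient="split", lines=True)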


4 Answers

Looping over indices is very inefficient.

A faster technique:

df['json'] = df.apply(lambda x: x.to_json(), axis=1)

Answered by MrE


Pandas DataFrames have a to_json method that will do it for you: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_json.html

If you want each row in its own file you can iterate over the index (and use the index to help name them):

for i in df.index:
    df.loc[i].to_json("row{}.json".format(i))
Answered by tvashtar


Extending @MrE's answer: if you want to convert multiple columns of each row into a single new column holding the content in JSON format (rather than writing separate JSON files), I ran into speed issues while using:

df['json'] = df.apply(lambda x: x.to_json(), axis=1)

I've achieved significant speed improvements on a dataset of 175K records and 5 columns using this line of code:

df['json'] = df.to_json(orient='records', lines=True).splitlines()

Speed went from >1 min to 350 ms.
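For reference, a small sketch of what that line produces on toy data (not the 175K-record set from the answer):

import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})

# to_json(..., lines=True) returns one string with one JSON object per line;
# splitlines() turns it into a list of strings, one per row, in row order
df["json"] = df.to_json(orient="records", lines=True).splitlines()

print(df["json"].tolist())  # ['{"a":1,"b":"x"}', '{"a":2,"b":"y"}']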

Answered by BramV


Using apply, this can be done as follows:

import json

def writejson(row):
    # Each row supplies its own target filename and a JSON-serializable payload
    with open(row["filename"] + '.json', "w") as outfile:
        json.dump(row["json"], outfile, indent=2)

in_df.apply(writejson, axis=1)

This assumes the dataframe has a "filename" column with the target filename for each row and a "json" column holding the content to write.
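For example, a toy input frame for this approach might look like this (column contents are purely illustrative):

import pandas as pd

in_df = pd.DataFrame({
    "filename": ["row0", "row1"],
    "json": [{"a": 1, "b": "x"}, {"a": 2, "b": "y"}],
})

# Writes row0.json and row1.json, one file per row
in_df.apply(writejson, axis=1)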

Answered by Steni Thomas