How to flatten a pandas dataframe with some columns as json?

I have a dataframe df that loads data from a database. Most of the columns are JSON strings, while some are lists of JSON objects. For example:

id     name     columnA                               columnB
1      John     {"dist": "600", "time": "0:12.10"}    [{"pos": "1st", "value": "500"},{"pos": "2nd", "value": "300"},{"pos": "3rd", "value": "200"}, {"pos": "total", "value": "1000"}]
2      Mike     {"dist": "600"}                       [{"pos": "1st", "value": "500"},{"pos": "2nd", "value": "300"},{"pos": "total", "value": "800"}]
...

As you can see, not all rows have the same number of elements in the JSON strings of a given column.

What I need to do is keep the normal columns like id and name as they are and flatten the JSON columns like so:

id    name   columnA.dist   columnA.time   columnB.pos.1st   columnB.pos.2nd   columnB.pos.3rd   columnB.pos.total
1     John   600            0:12.10        500               300               200               1000
2     Mike   600            NaN            500               300               NaN               800

I have tried using json_normalize like so:

from pandas.io.json import json_normalize
json_normalize(df)

But this fails with a KeyError. What is the correct way of doing this?

798

asked Oct 06 '16 14:10

sfactor

2 Answers

Here's a solution that uses json_normalize() again, with custom functions that first convert the data into a format json_normalize() understands.

import ast
import pandas as pd

# pandas >= 1.0 exposes this as pd.json_normalize;
# older versions: from pandas.io.json import json_normalize

def only_dict(d):
    '''
    Convert json string representation of dictionary to a python dict
    '''
    return ast.literal_eval(d)

def list_of_dicts(ld):
    '''
    Map each "pos" to its "value" after converting
    the json string of a list to a python list
    '''
    # Look the keys up by name rather than by position in d.values(),
    # which depends on dict ordering.
    return {d['pos']: d['value'] for d in ast.literal_eval(ld)}

A = pd.json_normalize(df['columnA'].apply(only_dict).tolist()).add_prefix('columnA.')
B = pd.json_normalize(df['columnB'].apply(list_of_dicts).tolist()).add_prefix('columnB.pos.')

Finally, join the DFs on the common index to get:

df[['id', 'name']].join([A, B])

EDIT: As per the comment by @MartijnPieters, the recommended way to decode the JSON strings is json.loads(), which is much faster than ast.literal_eval() if you know that the data source is JSON.
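For reference, here is a minimal sketch of the same approach using json.loads(). It assumes pandas >= 1.0 (for pd.json_normalize), rebuilds the two sample rows from the question by hand, and uses a hypothetical helper name, parse_positions, that is not part of the answer above.

```python
import json
import pandas as pd

# Sample rows copied from the question; the JSON columns are plain strings.
df = pd.DataFrame({
    "id": [1, 2],
    "name": ["John", "Mike"],
    "columnA": [
        '{"dist": "600", "time": "0:12.10"}',
        '{"dist": "600"}',
    ],
    "columnB": [
        '[{"pos": "1st", "value": "500"}, {"pos": "2nd", "value": "300"}, '
        '{"pos": "3rd", "value": "200"}, {"pos": "total", "value": "1000"}]',
        '[{"pos": "1st", "value": "500"}, {"pos": "2nd", "value": "300"}, '
        '{"pos": "total", "value": "800"}]',
    ],
})


def parse_positions(s):
    """Hypothetical helper: decode a JSON list and map each "pos" to its "value"."""
    return {d["pos"]: d["value"] for d in json.loads(s)}


# Decode each column, normalize the resulting dicts, and prefix the new columns.
A = pd.json_normalize(df["columnA"].apply(json.loads).tolist()).add_prefix("columnA.")
B = pd.json_normalize(df["columnB"].apply(parse_positions).tolist()).add_prefix("columnB.pos.")

# Join on the shared default index; keys missing from a row become NaN.
flat = df[["id", "name"]].join([A, B])
print(flat)
```

The join lines up rows by index, so this sketch relies on df having the default 0..n-1 index. If the real dataframe has a different index, calling df.reset_index(drop=True) beforehand should keep the rows aligned.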

101

answered Sep 20 '22 13:09

Nickil Maveli

The quickest seems to be:

import pandas as pd
import json

json_struct = json.loads(df.to_json(orient="records"))
# pandas >= 1.0: pd.json_normalize; older versions: pd.io.json.json_normalize
df_flat = pd.json_normalize(json_struct)
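One caveat, sketched below under the assumption of pandas >= 1.0: the round trip appears to expand only columns that already hold dicts. If the column holds JSON strings, as in the question, to_json() serializes them as strings and they come back unchanged, so they would need decoding first. The sample frame and the names decoded, df_raw and df_flat are illustrative only.

```python
import json
import pandas as pd

# Illustrative frame: columnA holds JSON *strings*, as in the question.
df = pd.DataFrame({
    "id": [1, 2],
    "name": ["John", "Mike"],
    "columnA": ['{"dist": "600", "time": "0:12.10"}', '{"dist": "600"}'],
})

# Round trip without decoding: columnA stays a single string column.
df_raw = pd.json_normalize(json.loads(df.to_json(orient="records")))

# Decode the strings into dicts first, then do the same round trip.
decoded = df.assign(columnA=df["columnA"].apply(json.loads))
df_flat = pd.json_normalize(json.loads(decoded.to_json(orient="records")))

print(list(df_raw.columns))
print(list(df_flat.columns))
```

Lists of dicts, like columnB in the question, are not expanded into columns by this round trip, so they would still need a reshaping step such as the one in the other answer.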

answered Sep 18 '22 13:09

staonas

Tags: python, json, flatten, pandas, dataframe
