Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pandas - normalize Json list

I am trying to normalize a column from a Pandas dataframe that is a list of dictionaries (can be missing).

Example to reproduce

import pandas as pd
bids = pd.Series([[{'price': 606, 'quantity': 28},{'price': 588, 'quantity': 29},
                   {'price': 513, 'quantity': 33}],[],[{'price': 7143, 'quantity': 15},
                    {'price': 68, 'quantity': 91},{'price': 6849, 'quantity': 12}]])
data = pd.DataFrame([1,2,3]).rename(columns={0:'id'})
data['bids'] = bids

Desired output

id price quantity
1  606    28
1  588    29
1  513    33
3  7143   15
3  68     91
3  6849   12

Attempt

Trying to resolve using pandas json_normalize, following docs here. I'm confused by why none of the below work, and what type of record_path will fix my problem. All the below error.

pd.json_normalize(data['bids'])
pd.json_normalize(data['bids'],['price','quantity'])
pd.json_normalize(data['bids'],[['price','quantity']])
like image 275
oli5679 Avatar asked Aug 21 '20 16:08

oli5679


People also ask

How do I flatten JSON in Pandas?

Pandas have a nice inbuilt function called json_normalize() to flatten the simple to moderately semi-structured nested JSON structures to flat tables. Parameters: data – dict or list of dicts. errors – {'raise', 'ignore'}, default 'raise'

What does PD json_normalize do?

json_normalize. Normalize semi-structured JSON data into a flat table. Unserialized JSON objects.

Which Pandas version has json_normalize?

json_normalize — pandas 0.22.

What is normalizing JSON?

A simple way to traverse datasets based on JSON API specification. Normalize is a lightweight javascript library with simple and powerful api.


2 Answers

Use DataFrame.explode on column bids then create a new dataframe from the dictionaries in exploded bids column and use DataFrame.join to join it with df:

df = data.explode('bids').dropna(subset=['bids']).reset_index(drop=True)
df = df.join(pd.DataFrame(df.pop('bids').tolist()))

Result:

print(df)
   id  price  quantity
0   1    606        28
1   1    588        29
2   1    513        33
3   3   7143        15
4   3     68        91
5   3   6849        12
like image 129
Shubham Sharma Avatar answered Oct 20 '22 06:10

Shubham Sharma


Adding another approach with np.repeat and np.concatenate with json_normalize

out = pd.io.json.json_normalize(np.concatenate(data['bids']))
out.insert(0,'id',np.repeat(data['id'],data['bids'].str.len()).to_numpy())

Or you can also use np.hstack as @Shubham mentions instead of np.concatenate:

out = pd.io.json.json_normalize(np.hstack(data['bids']))

print(out)

   id  price  quantity
0   1    606        28
1   1    588        29
2   1    513        33
3   3   7143        15
4   3     68        91
5   3   6849        12
like image 29
anky Avatar answered Oct 20 '22 05:10

anky