Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

pandas DataFrame: normalize one JSON column and merge with other columns

I have a pandas DataFrame containing one column with multiple JSON data items as list of dicts. I want to normalize the JSON column and duplicate the non-JSON columns:

# creating dataframe
df_actions = pd.DataFrame(columns=['id', 'actions'])
rows = [[12,json.loads('[{"type": "a","value": "17"},{"type": "b","value": "19"}]')],
   [15, json.loads('[{"type": "a","value": "1"},{"type": "b","value": "3"},{"type": "c","value": "5"}]')]]
df_actions.loc[0] = rows[0]
df_actions.loc[1] = rows[1]

>>>df_actions
   id                                            actions
0  12  [{'type': 'a', 'value': '17'}, {'type': 'b', '...
1  15  [{'type': 'a', 'value': '1'}, {'type': 'b', 'v...

I want

>>>df_actions_parsed
   id      type    value
   12      a        17
   12      b        19
   15      a        1
   15      b        3
   15      c        5

I can normalize JSON data using:

pd.concat([pd.DataFrame(json_normalize(x)) for x in df_actions['actions']],ignore_index=True)

but I don't know how to join that back to the id column of the original DataFrame.

like image 887
stack_lech Avatar asked Apr 05 '18 11:04

stack_lech


People also ask

How do I normalize multiple columns in Pandas?

To normalize all columns of pandas DataFrame, we simply subtract the mean and divide by standard deviation. This example gives unbiased estimates. Alternatively, you can also get the same using DataFrame. apply() and lambda .

What is JSON normalization?

A simple way to traverse datasets based on JSON API specification. Normalize is a lightweight javascript library with simple and powerful api. Has no dependencies and weighs less than 1KB.


2 Answers

You can use concat with dict comprehension with pop for extract column, remove second level and join to original:

df1 = (pd.concat({i: pd.DataFrame(x) for i, x in df_actions.pop('actions').items()})
         .reset_index(level=1, drop=True)
         .join(df_actions)
         .reset_index(drop=True))

What is same as:

df1 = (pd.concat({i: json_normalize(x) for i, x in df_actions.pop('actions').items()})
         .reset_index(level=1, drop=True)
         .join(df_actions)
         .reset_index(drop=True))

print (df1)
  type value  id
0    a    17  12
1    b    19  12
2    a     1  15
3    b     3  15
4    c     5  15
like image 151
jezrael Avatar answered Oct 07 '22 14:10

jezrael


Here's another solution that uses explode and json_normalize:

exploded = df_actions.explode("actions")
pd.concat([exploded["id"].reset_index(drop=True), pd.json_normalize(exploded["actions"])], axis=1)

Here's the result:

   id type value
0  12    a    17
1  12    b    19
2  15    a     1
3  15    b     3
4  15    c     5
like image 43
Powers Avatar answered Oct 07 '22 13:10

Powers