I have a pandas DataFrame containing one column with multiple JSON data items as list of dicts. I want to normalize the JSON column and duplicate the non-JSON columns:
# creating dataframe
df_actions = pd.DataFrame(columns=['id', 'actions'])
rows = [[12,json.loads('[{"type": "a","value": "17"},{"type": "b","value": "19"}]')],
[15, json.loads('[{"type": "a","value": "1"},{"type": "b","value": "3"},{"type": "c","value": "5"}]')]]
df_actions.loc[0] = rows[0]
df_actions.loc[1] = rows[1]
>>>df_actions
id actions
0 12 [{'type': 'a', 'value': '17'}, {'type': 'b', '...
1 15 [{'type': 'a', 'value': '1'}, {'type': 'b', 'v...
I want
>>>df_actions_parsed
id type value
12 a 17
12 b 19
15 a 1
15 b 3
15 c 5
I can normalize JSON data using:
pd.concat([pd.DataFrame(json_normalize(x)) for x in df_actions['actions']],ignore_index=True)
but I don't know how to join that back to the id column of the original DataFrame.
To normalize all columns of pandas DataFrame, we simply subtract the mean and divide by standard deviation. This example gives unbiased estimates. Alternatively, you can also get the same using DataFrame. apply() and lambda .
A simple way to traverse datasets based on JSON API specification. Normalize is a lightweight javascript library with simple and powerful api. Has no dependencies and weighs less than 1KB.
You can use concat
with dict comprehension
with pop
for extract column, remove second level and join
to original:
df1 = (pd.concat({i: pd.DataFrame(x) for i, x in df_actions.pop('actions').items()})
.reset_index(level=1, drop=True)
.join(df_actions)
.reset_index(drop=True))
What is same as:
df1 = (pd.concat({i: json_normalize(x) for i, x in df_actions.pop('actions').items()})
.reset_index(level=1, drop=True)
.join(df_actions)
.reset_index(drop=True))
print (df1)
type value id
0 a 17 12
1 b 19 12
2 a 1 15
3 b 3 15
4 c 5 15
Here's another solution that uses explode
and json_normalize
:
exploded = df_actions.explode("actions")
pd.concat([exploded["id"].reset_index(drop=True), pd.json_normalize(exploded["actions"])], axis=1)
Here's the result:
id type value
0 12 a 17
1 12 b 19
2 15 a 1
3 15 b 3
4 15 c 5
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With