pandas DataFrame: normalize one JSON column and merge with other columns

Tags: python, json, pandas, dataframe

I have a pandas DataFrame containing one column with multiple JSON data items as list of dicts. I want to normalize the JSON column and duplicate the non-JSON columns:

import json
import pandas as pd

# creating dataframe
df_actions = pd.DataFrame(columns=['id', 'actions'])
rows = [[12, json.loads('[{"type": "a","value": "17"},{"type": "b","value": "19"}]')],
        [15, json.loads('[{"type": "a","value": "1"},{"type": "b","value": "3"},{"type": "c","value": "5"}]')]]
df_actions.loc[0] = rows[0]
df_actions.loc[1] = rows[1]

>>> df_actions
   id                                            actions
0  12  [{'type': 'a', 'value': '17'}, {'type': 'b', '...
1  15  [{'type': 'a', 'value': '1'}, {'type': 'b', 'v...

I want to get:

>>> df_actions_parsed
   id      type    value
   12      a        17
   12      b        19
   15      a        1
   15      b        3
   15      c        5

I can normalize JSON data using:

pd.concat([pd.json_normalize(x) for x in df_actions['actions']], ignore_index=True)

but I don't know how to join that back to the id column of the original DataFrame.
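If you want to keep this concat-based approach, one option (a minimal sketch, assuming pandas 1.0 or later, where json_normalize is available as pd.json_normalize) is to pass the ids as keys=, so each normalized block is labeled with the id it came from, and then turn that index level into a column. The frame is built with the plain DataFrame constructor rather than .loc only to keep the snippet short; the name parsed is illustrative.

```python
import pandas as pd

# Same data as above, built directly: one id plus a list of dicts per row.
df_actions = pd.DataFrame({
    "id": [12, 15],
    "actions": [
        [{"type": "a", "value": "17"}, {"type": "b", "value": "19"}],
        [{"type": "a", "value": "1"}, {"type": "b", "value": "3"},
         {"type": "c", "value": "5"}],
    ],
})

parsed = (
    pd.concat(
        [pd.json_normalize(x) for x in df_actions["actions"]],
        keys=df_actions["id"].tolist(),  # outer index level = id of the source row
        names=["id"],                    # name that level so reset_index creates an "id" column
    )
    .reset_index(level=0)                # move the id level into a regular column
    .reset_index(drop=True)              # replace the per-block positions with 0..n-1
)
print(parsed)
```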

887

asked Apr 05 '18 11:04

stack_lech

2 Answers

You can use concat with a dict comprehension, using pop to extract the actions column; then remove the second index level and join the result back to the original:

df1 = (pd.concat({i: pd.DataFrame(x) for i, x in df_actions.pop('actions').items()})
         .reset_index(level=1, drop=True)
         .join(df_actions)
         .reset_index(drop=True))

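This is the same as the following, which uses json_normalize instead of the DataFrame constructor. Note that pop removes the actions column from df_actions in place, so run only one of the two variants (or start from a fresh copy):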

df1 = (pd.concat({i: pd.json_normalize(x) for i, x in df_actions.pop('actions').items()})
         .reset_index(level=1, drop=True)
         .join(df_actions)
         .reset_index(drop=True))

print(df1)
  type value  id
0    a    17  12
1    b    19  12
2    a     1  15
3    b     3  15
4    c     5  15
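A self-contained sketch of the same technique that leaves df_actions untouched pops the column from a copy instead. The comments spell out why the dict comprehension matters: pd.concat turns the dict keys into the outer level of a MultiIndex, so every parsed row still carries the label of the row it came from, and join uses that label to repeat id. The names rest, actions and stacked are illustrative.

```python
import pandas as pd

df_actions = pd.DataFrame({
    "id": [12, 15],
    "actions": [
        [{"type": "a", "value": "17"}, {"type": "b", "value": "19"}],
        [{"type": "a", "value": "1"}, {"type": "b", "value": "3"},
         {"type": "c", "value": "5"}],
    ],
})

rest = df_actions.copy()       # leave the original frame intact
actions = rest.pop("actions")  # Series of lists; rest keeps only the non-JSON columns

# Dict keys become the outer index level, so the index is now
# (source row label, position within that row's list).
stacked = pd.concat({i: pd.DataFrame(x) for i, x in actions.items()})

df1 = (
    stacked.reset_index(level=1, drop=True)  # drop the position, keep the source row label
    .join(rest)                              # align on that label to repeat id (and any other columns)
    .reset_index(drop=True)
)
print(df1)
```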

151

answered Oct 07 '22 14:10

jezrael

Here's another solution that uses explode and json_normalize:

exploded = df_actions.explode("actions")
pd.concat([exploded["id"].reset_index(drop=True), pd.json_normalize(exploded["actions"])], axis=1)

Here's the result:

   id type value
0  12    a    17
1  12    b    19
2  15    a     1
3  15    b     3
4  15    c     5
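A variant of this answer, sketched under the assumptions of pandas 1.0 or later (for pd.json_normalize) and non-empty actions lists (explode turns an empty list into NaN, which would need separate handling), resets the exploded index up front and passes a plain list to json_normalize. Both pieces then share the same 0..n-1 index without depending on how json_normalize treats a Series index. Dropping only the actions column also keeps every non-JSON column, not just id. The name flat is illustrative.

```python
import pandas as pd

df_actions = pd.DataFrame({
    "id": [12, 15],
    "actions": [
        [{"type": "a", "value": "17"}, {"type": "b", "value": "19"}],
        [{"type": "a", "value": "1"}, {"type": "b", "value": "3"},
         {"type": "c", "value": "5"}],
    ],
})

# One row per dict; the other columns are repeated for each element.
exploded = df_actions.explode("actions").reset_index(drop=True)

# Flatten the dicts from a plain list so the result gets a fresh 0..n-1 index
# that lines up with the reset index of `exploded`.
flat = pd.json_normalize(exploded["actions"].tolist())

# Keep all non-JSON columns and place the flattened fields beside them.
df_actions_parsed = pd.concat([exploded.drop(columns="actions"), flat], axis=1)
print(df_actions_parsed)
```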

answered Oct 07 '22 13:10

Powers
