Pandas expand json field across records

I have an interesting problem, and I'm wondering if there's a concise, pythonic (pandastic?) way to do this, rather than iterating over rows of a data frame.

Take a DataFrame with one field that is a JSON encoding of information:

    Name      Data
0   Joe       '[{"label":"a","value":"1"},{"label":"b","value":"2"}]'
1   Sue       '[{"label":"a","value":"3"},{"label":"c","value":"4"}]'
2   Bob       '[{"label":"b","value":"4"},{"label":"d","value":"1"}]'

I want to expand the JSON field into separate data columns, taking the union of the labels as column headers, to get this:

    Name      Data                 a    b    c    d
0   Joe       '[{"label":"a"...    1    2    
1   Sue       '[{"label":"a"...    3         4
2   Bob       '[{"label":"b"...         4         1

The blanks are missing values. I know I can use read_json to create data frames from the JSON field, but then I want to flatten those data frames back into extra columns of the original data set.

So, is there an elegant way to do this without iterating over the various rows of the data frame? Any help would be appreciated.

766

asked Aug 26 '14 17:08

David Pepper

1 Answer

Given

In [96]: df
Out[96]: 
  Name                   Data
0  Joe  [{"a":"1"},{"b":"2"}]
1  Sue  [{"a":"3"},{"c":"4"}]
2  Bob  [{"b":"4"},{"d":"1"}]

if you define

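import json
import pandas as pd

def json_to_series(text):
    # Flatten the list of dicts into (key, value) pairs, then split keys from values
    keys, values = zip(*[item for dct in json.loads(text) for item in dct.items()])
    return pd.Series(values, index=keys)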

then

In [97]: result = pd.concat([df, df['Data'].apply(json_to_series)], axis=1)

In [98]: result
Out[98]: 
  Name                   Data    a    b    c    d
0  Joe  [{"a":"1"},{"b":"2"}]    1    2  NaN  NaN
1  Sue  [{"a":"3"},{"c":"4"}]    3  NaN    4  NaN
2  Bob  [{"b":"4"},{"d":"1"}]  NaN    4  NaN    1
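For anyone who wants to run this end to end, here is a minimal self-contained sketch. The way `df` is constructed is my own reconstruction of the frame printed above, not something taken from the original post, and the `expanded` name is just for illustration.

```python
import json

import pandas as pd

# Reconstruction of the example frame shown above (an assumption about how it was built)
df = pd.DataFrame({
    "Name": ["Joe", "Sue", "Bob"],
    "Data": [
        '[{"a":"1"},{"b":"2"}]',
        '[{"a":"3"},{"c":"4"}]',
        '[{"b":"4"},{"d":"1"}]',
    ],
})


def json_to_series(text):
    # Flatten the list of one-key dicts into (key, value) pairs, then split them
    keys, values = zip(*[item for dct in json.loads(text) for item in dct.items()])
    return pd.Series(values, index=keys)


# apply() yields a DataFrame here because the function returns a Series per row
expanded = df["Data"].apply(json_to_series)
result = pd.concat([df, expanded], axis=1)
print(result)
```

Note that the values stay strings, since that is how they appear in the JSON.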

For the label/value format from the question, given

In [22]: df
Out[22]: 
  Name                                               Data
0  Joe  [{"label":"a","value":"1"},{"label":"b","value...
1  Sue  [{"label":"a","value":"3"},{"label":"c","value...
2  Bob  [{"label":"b","value":"4"},{"label":"d","value...

if you define

def json_to_series(text):
    # Pair each dict's 'label' with its 'value', then split labels from values
    keys, values = zip(*[(dct['label'], dct['value']) for dct in json.loads(text)])
    return pd.Series(values, index=keys)

then

In [20]: result = pd.concat([df, df['Data'].apply(json_to_series)], axis=1)

In [21]: result
Out[21]: 
  Name                                               Data    a    b    c    d
0  Joe  [{"label":"a","value":"1"},{"label":"b","value...    1    2  NaN  NaN
1  Sue  [{"label":"a","value":"3"},{"label":"c","value...    3  NaN    4  NaN
2  Bob  [{"label":"b","value":"4"},{"label":"d","value...  NaN    4  NaN    1
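One caveat: `json_to_series` as written would raise a `ValueError` on a row whose JSON is an empty list, because `zip(*[])` yields nothing to unpack into `keys, values`. If that can happen in your data, a possible variant (a different technique from the `apply` approach above, not part of the original answer) is to build one dict per row and let the `DataFrame` constructor union the keys. The fourth row ("Ann") is hypothetical, added only to show the empty case.

```python
import json

import pandas as pd

df = pd.DataFrame({
    "Name": ["Joe", "Sue", "Bob", "Ann"],
    "Data": [
        '[{"label":"a","value":"1"},{"label":"b","value":"2"}]',
        '[{"label":"a","value":"3"},{"label":"c","value":"4"}]',
        '[{"label":"b","value":"4"},{"label":"d","value":"1"}]',
        '[]',  # hypothetical row with no entries, added for illustration
    ],
})

# One dict per row: {label: value, ...}; an empty list simply gives {}
records = [
    {item["label"]: item["value"] for item in json.loads(text)}
    for text in df["Data"]
]

# Building a frame from a list of dicts unions the keys into columns
expanded = pd.DataFrame(records, index=df.index)
result = df.join(expanded)
print(result)
```

This still loops over the rows in Python (so does `apply`), but it avoids creating a `Series` per row.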

References:

list comprehensions
the * unpacking operator
zip
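In case the `zip(*...)` idiom is unfamiliar, here is a tiny standalone illustration of what it does to a list of pairs. The `pairs` list is made up for the example.

```python
pairs = [("a", "1"), ("b", "2")]

# zip(*pairs) is the same as zip(("a", "1"), ("b", "2")): it regroups items by position
keys, values = zip(*pairs)

print(keys)    # ('a', 'b')
print(values)  # ('1', '2')
```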

128

answered Sep 16 '22 23:09

unutbu

Tags: python, json, pandas
