Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pandas expand json field across records

I have an interesting problem, and I'm wondering if there's a concise, pythonic (pandastic?) way to do this, rather than iterating over rows of a data frame.

Take a DataFrame with one field that is a json encoding of information:

    Name      Data
0   Joe       '[{"label":"a","value":"1"},{"label":"b","value":"2"}]'
1   Sue       '[{"label":"a","value":"3"},{"label":"c","value":"4"}]'
2   Bob       '[{"label":"b","value":"4"},{"label":"d","value":"1"}]'

I want to expand the json field to be data fields, unioning the different column headers, to get this:

    Name      Data                 a    b    c    d
0   Joe       '[{"label":"a"...    1    2    
1   Sue       '[{"label":"a"...    3         4
2   Bob       '[{"label":"b"...         4         1

The blanks are missing values. I know I can use read_json to create data frames from the json field, but then I want to re-flatten these data frames into extra columns of the original data set.

So, is there an elegant way to do this without iterating over the various rows of the data frame? Any help would be appreciated.

like image 766
David Pepper Avatar asked Aug 26 '14 17:08

David Pepper


People also ask

What is JSON normalization?

A simple way to traverse datasets based on JSON API specification. Normalize is a lightweight javascript library with simple and powerful api.

How do I flatten nested JSON?

Flatten a JSON object: var flatten = (function (isArray, wrapped) { return function (table) { return reduce("", {}, table); }; function reduce(path, accumulator, table) { if (isArray(table)) { var length = table.

What does PD Json_normalize do?

Pandas have a nice inbuilt function called json_normalize() to flatten the simple to moderately semi-structured nested JSON structures to flat tables.


1 Answers

Given

In [96]: df
Out[96]: 
  Name                   Data
0  Joe  [{"a":"1"},{"b":"2"}]
1  Sue  [{"a":"3"},{"c":"4"}]
2  Bob  [{"b":"4"},{"d":"1"}]

if you define

import json
def json_to_series(text):
    keys, values = zip(*[item for dct in json.loads(text) for item in dct.items()])
    return pd.Series(values, index=keys)

then

In [97]: result = pd.concat([df, df['Data'].apply(json_to_series)], axis=1)

In [98]: result
Out[98]: 
  Name                   Data    a    b    c    d
0  Joe  [{"a":"1"},{"b":"2"}]    1    2  NaN  NaN
1  Sue  [{"a":"3"},{"c":"4"}]    3  NaN    4  NaN
2  Bob  [{"b":"4"},{"d":"1"}]  NaN    4  NaN    1

Given

In [22]: df
Out[22]: 
  Name                                               Data
0  Joe  [{"label":"a","value":"1"},{"label":"b","value...
1  Sue  [{"label":"a","value":"3"},{"label":"c","value...
2  Bob  [{"label":"b","value":"4"},{"label":"d","value...

if you define

def json_to_series(text):
    keys, values = zip(*[(dct['label'], dct['value']) for dct in json.loads(text)])
    return pd.Series(values, index=keys)

then

In [20]: result = pd.concat([df, df['Data'].apply(json_to_series)], axis=1)

In [21]: result
Out[21]: 
  Name                                               Data    a    b    c    d
0  Joe  [{"label":"a","value":"1"},{"label":"b","value...    1    2  NaN  NaN
1  Sue  [{"label":"a","value":"3"},{"label":"c","value...    3  NaN    4  NaN
2  Bob  [{"label":"b","value":"4"},{"label":"d","value...  NaN    4  NaN    1

References:

like image 128
unutbu Avatar answered Sep 16 '22 23:09

unutbu