I have an interesting problem, and I'm wondering if there's a concise, pythonic (pandastic?) way to do this, rather than iterating over rows of a data frame.
Take a DataFrame in which one column holds a JSON-encoded string:
Name Data
0 Joe '[{"label":"a","value":"1"},{"label":"b","value":"2"}]'
1 Sue '[{"label":"a","value":"3"},{"label":"c","value":"4"}]'
2 Bob '[{"label":"b","value":"4"},{"label":"d","value":"1"}]'
I want to expand the json field to be data fields, unioning the different column headers, to get this:
  Name  Data               a  b  c  d
0 Joe   '[{"label":"a"...  1  2
1 Sue   '[{"label":"a"...  3     4
2 Bob   '[{"label":"b"...     4     1
The blanks are missing values. I know I can use read_json to create data frames from the json field, but then I want to re-flatten these data frames into extra columns of the original data set.
So, is there an elegant way to do this without iterating over the various rows of the data frame? Any help would be appreciated.
pandas has a built-in function, json_normalize(), that flattens simple to moderately nested JSON structures into flat tables.
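As a rough sketch of how json_normalize() could be applied to the question's data: parse each string, give every label/value dict its own row with explode(), normalize the dicts into `label` and `value` columns, and pivot back to one row per person. The column name `row` and the variables `long`, `pairs`, and `wide` are made up for this illustration, and the sketch assumes pandas 1.0 or later and that every row contains at least one entry.

```python
import json
import pandas as pd

df = pd.DataFrame({
    "Name": ["Joe", "Sue", "Bob"],
    "Data": [
        '[{"label":"a","value":"1"},{"label":"b","value":"2"}]',
        '[{"label":"a","value":"3"},{"label":"c","value":"4"}]',
        '[{"label":"b","value":"4"},{"label":"d","value":"1"}]',
    ],
})

# Parse the JSON strings, then give each dict in the list its own row.
# The exploded frame keeps the original row label for every dict.
long = df.assign(Data=df["Data"].map(json.loads)).explode("Data")

# Turn the dicts into a flat table with "label" and "value" columns.
pairs = pd.json_normalize(long["Data"].tolist())
pairs["row"] = long.index  # hypothetical helper column: the source row

# One row per source row, one column per distinct label.
wide = pairs.pivot(index="row", columns="label", values="value")

# Attach the new columns to the original frame (aligned on the index).
result = df.join(wide)
print(result)
```

Labels that a row does not have come out as NaN, which matches the blanks in the desired output.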
Given a DataFrame whose Data column holds a list of single-key dicts,
In [96]: df
Out[96]:
Name Data
0 Joe [{"a":"1"},{"b":"2"}]
1 Sue [{"a":"3"},{"c":"4"}]
2 Bob [{"b":"4"},{"d":"1"}]
if you define
import json
import pandas as pd

def json_to_series(text):
    # Flatten the list of dicts into (key, value) pairs, then split them.
    keys, values = zip(*[item for dct in json.loads(text) for item in dct.items()])
    return pd.Series(values, index=keys)
then
In [97]: result = pd.concat([df, df['Data'].apply(json_to_series)], axis=1)
In [98]: result
Out[98]:
Name Data a b c d
0 Joe [{"a":"1"},{"b":"2"}] 1 2 NaN NaN
1 Sue [{"a":"3"},{"c":"4"}] 3 NaN 4 NaN
2 Bob [{"b":"4"},{"d":"1"}] NaN 4 NaN 1
For the label/value format used in the question, given
In [22]: df
Out[22]:
Name Data
0 Joe [{"label":"a","value":"1"},{"label":"b","value...
1 Sue [{"label":"a","value":"3"},{"label":"c","value...
2 Bob [{"label":"b","value":"4"},{"label":"d","value...
if you define
def json_to_series(text):
    # Pair each dict's 'label' with its 'value', then split into keys and values.
    keys, values = zip(*[(dct['label'], dct['value']) for dct in json.loads(text)])
    return pd.Series(values, index=keys)
then
In [20]: result = pd.concat([df, df['Data'].apply(json_to_series)], axis=1)
In [21]: result
Out[21]:
Name Data a b c d
0 Joe [{"label":"a","value":"1"},{"label":"b","value... 1 2 NaN NaN
1 Sue [{"label":"a","value":"3"},{"label":"c","value... 3 NaN 4 NaN
2 Bob [{"label":"b","value":"4"},{"label":"d","value... NaN 4 NaN 1
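Because apply() with a function that returns a Series builds one Series per row, it can be slow on large frames. A possible alternative, offered only as a sketch, is to build one plain dict per row and hand the whole list to the DataFrame constructor, which unions the keys into columns. The names `records` and `expanded` are made up for this example.

```python
import json
import pandas as pd

df = pd.DataFrame({
    "Name": ["Joe", "Sue", "Bob"],
    "Data": [
        '[{"label":"a","value":"1"},{"label":"b","value":"2"}]',
        '[{"label":"a","value":"3"},{"label":"c","value":"4"}]',
        '[{"label":"b","value":"4"},{"label":"d","value":"1"}]',
    ],
})

# One dict per row, mapping each label to its value.
records = [
    {dct["label"]: dct["value"] for dct in json.loads(text)}
    for text in df["Data"]
]

# The constructor unions the keys; labels missing from a row become NaN.
expanded = pd.DataFrame(records, index=df.index)

# Attach the new columns to the original frame.
result = df.join(expanded)
print(result)
```

This still loops over the rows in Python, but it avoids constructing a separate Series for each one.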