Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

In pandas combine outer json with nested json and create new dataframe

Tags:

json_str = '[
{
"name": "t1",
"props": [
  {
    "abc": 10012,
    "def": "OBJECT"
  },
  {
    "abc": 999123,
    "def": "SUBJECT"
  }
],
"id": 1,
"title": "king"
},
{
"name": "t2",
"props": [
  {
    "abc": 789456,
    "def": "PRODUCT"
  }
],
"id": 2,
"title": "queen"
}
]'

Using above JSON, I want to create one dataframe that expands the props list and concats to main json columns.

In the end end I want to end up with these columns in df:

id,title,name,abc,def

With rows:

1,king,t1,10012,OBJECT

1,king,t1,999123,SUBJECT

2,queen,t2,789456,PRODUCT

When I try this:

jdata = json.loads(json_str)
pd.concat([pd.DataFrame(jdata), pd.DataFrame(list(jdata['props']))], axis=1).drop('props', 1)

I get this error:

list indices must be integers or slices, not str

Also tried this:

jdata=json.loads(json_str)
pd.concat([pd.DataFrame(jdata), pd.DataFrame([pd.json_normalize(jdata, "props", errors="ignore", record_prefix="")])], axis=1).drop('props', 1)

throws this error:

Must pass 2-d input. shape={values.shape}

Also tried this:

result = pd.json_normalize(jdata, 'props', errors="ignore", record_prefix="props.")
result2 = pd.json_normalize(jdata, errors="ignore", record_prefix="tmpl.")
df = pd.concat([result, result2], axis=1).drop('props', 1)

No error thrown here, but the concat doesn't line up the two df's. The rows are out of sync.

Thanks for any help.

like image 1000
A.G. Avatar asked Apr 21 '21 00:04

A.G.


1 Answers

You could use json_normalize to simplify the extraction; for each record_path there will be an associated meta:

json_normalize(data = jdata, 
               record_path = 'props', 
               meta = ['name', 'id', 'title']
              )
 
      abc      def name id  title
0   10012   OBJECT   t1  1   king
1  999123  SUBJECT   t1  1   king
2  789456  PRODUCT   t2  2  queen
like image 125
sammywemmy Avatar answered Sep 30 '22 17:09

sammywemmy