I have a nested JSON structure which I need to flatten. On using JSON normalize it flattens all the keys. But, I want to flatten specific keys while preserving the other keys nested. How to achieve this with JSON normalize. The detail description of what I am trying to do is as follows.
The JSON data that looks something like this
data = {"Attachment":[{"url":"URL001", "type":"pdf"},
{"url":"URL002", "type":"pdf"}],
"Image":{"url":"URL001", "type":"png"},
"Lookup":{"ProductName":"Item001", "ProductId":"001"}}
On running the following snippet it flattens bothImage and Lookup field.
from pandas.io.json import json_normalize
df = json_normalize(data)
df.to_json(orient="records")
The output looks something like,
Attachment Image.URL Image.Type Lookup.ProductName Lookup.ProductId
[{...}, {...}] URL001 png Item001 001
But I don't want to flatten the Image key and preserve it as it is.
The expected Output looks like
Attachment Image Lookup.ProductName Lookup.ProductId
[{...}, {...}] {"url":...,} Item001 001
Is there a way to achieve this using JSON normalize.
How about you just separate data in to two separate dictionaries. Perform 2 different transform operations and then join the respective dataframes:
data1 = {k:v for k,v in data.iteritems() if k!='Image'}
data2 = {k:v for k,v in data.iteritems() if k=='Image'}
df = pd.io.json.json_normalize(data1).join(pd.DataFrame([data2]))
As far as I know, there is no way to flatten one field, but not the others at the same level. Therefore, you can normalize the same json twice, but specifying on which level using max_level in pd.json_normalize function and then joining them together after dropping columns that you don't need.
The code:
import pandas as pd
data = {"Attachment":[{"url":"URL001", "type":"pdf"},
{"url":"URL002", "type":"pdf"}],
"Image":{"url":"URL001", "type":"png"},
"Lookup":{"ProductName":"Item001", "ProductId":"001"}}
df_level0 = pd.json_normalize(data, max_level=0).drop(columns=['Lookup', 'Attachment'])
df_level1 = pd.json_normalize(data, max_level=1)
df_level1 = df_level1.loc[:,~df_level1.columns.str.startswith('Image')]
df = pd.concat([df_level0, df_level1], axis=1)
gives you the expected output
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With