I am trying to normalize a JSON document that contains a nested array of JSON objects. I have looked at similar questions and tried their solutions, to no avail.
This is what my JSON object looks like:
{
"results": [
{
"_id": "25",
"Product": {
"Description": "3 YEAR",
"TypeLevel1": "INTEREST",
"TypeLevel2": "LONG"
},
"Settlement": {},
"Xref": {
"SCSP": "96"
},
"ProductSMCP": [
{
"SMCP": "01"
}
]
},
{
"_id": "26",
"Product": {
"Description": "10 YEAR",
"TypeLevel1": "INTEREST",
"Currency": "USD",
"Operational": true,
"TypeLevel2": "LONG"
},
"Settlement": {},
"Xref": {
"BBT": "CITITYM9",
"TCK": "ZN"
},
"ProductSMCP": [
{
"SMCP": "01"
},
{
"SMCP2": "02"
}
]
}
]
}
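Here is my code for normalizing the JSON object:
import json
import pandas as pd

data = json.load(j)  # j is an open file object for the JSON above
data = data['results']
# pandas >= 1.0; older versions expose this as pd.io.json.json_normalize
print(pd.json_normalize(data))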
The result that I want should look like this:
id Description TypeLevel1 TypeLevel2 Currency \
25 3 YEAR US INTEREST LONG NAN
26 10 YEAR US INTEREST NAN USD
BBT TCT SMCP SMCP2 SCSP
NAN NAN 521 NAN 01
M9 ZN 01 02 NAN
However, the result I get is this:
Product.Currency Product.Description Product.Operational Product.TypeLevel1 \
0 NaN 3 YEAR NaN INTEREST
1 USD 10 YEAR True INTEREST
Product.TypeLevel2 ProductSMCP Xref.BBT Xref.SCSP \
0 LONG [{'SMCP': '01'}] NaN 96
1 LONG [{'SMCP': '01'}, {'SMCP2': '02'}] CITITYM9 NaN
Xref.TCK _id
0 NaN 25
1 ZN 26
As you can see, the issue is with ProductSMCP: the array is not being fully flattened.
Once you are past the first normalization, I'd apply a lambda to the ProductSMCP column to finish the job:
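import pandas as pd
from cytoolz.dicttoolz import merge

# pandas >= 1.0; older versions expose this as pd.io.json.json_normalize
pd.json_normalize(data).pipe(
    # drop the list column, then join one column per key found in its dicts
    lambda x: x.drop(columns='ProductSMCP').join(
        x.ProductSMCP.apply(lambda y: pd.Series(merge(y)))
    )
)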
Product.Currency Product.Description Product.Operational Product.TypeLevel1 Product.TypeLevel2 Xref.BBT Xref.SCSP Xref.TCK _id SMCP SMCP2
0 NaN 3 YEAR NaN INTEREST LONG NaN 96 NaN 25 01 NaN
1 USD 10 YEAR True INTEREST LONG CITITYM9 NaN ZN 26 01 02
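If you would rather not depend on cytoolz, the role `merge` plays here can be approximated in plain Python. The sketch below is only an illustration: `merge_dicts` is a hypothetical helper name, not part of pandas or cytoolz. It assumes every element of the list is a dict and that later keys may overwrite earlier ones.

```python
def merge_dicts(dicts):
    """Combine a list of dicts into one dict; later keys overwrite earlier ones."""
    merged = {}
    for d in dicts:
        merged.update(d)
    return merged


# The ProductSMCP cell from the second record in the question
cell = [{"SMCP": "01"}, {"SMCP2": "02"}]
flat = merge_dicts(cell)
print(flat)  # {'SMCP': '01', 'SMCP2': '02'}
```

Passing `merge_dicts` in place of `merge` inside the `apply` call should give the same columns, provided the assumption about the cell contents holds.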
Trim Column Names
import re

pd.json_normalize(data).pipe(
    lambda x: x.drop(columns='ProductSMCP').join(
        x.ProductSMCP.apply(lambda y: pd.Series(merge(y)))
    )
    # strip the "Product." and "Xref." prefixes added by json_normalize
).rename(columns=lambda x: re.sub(r'(Product|Xref)\.', '', x))
Currency Description Operational TypeLevel1 TypeLevel2 BBT SCSP TCK _id SMCP SMCP2
0 NaN 3 YEAR NaN INTEREST LONG NaN 96 NaN 25 01 NaN
1 USD 10 YEAR True INTEREST LONG CITITYM9 NaN ZN 26 01 02
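An alternative, shown here only as a rough sketch, is to flatten each record in plain Python before it reaches pandas. This is a different technique from the `pipe`/`join` approach above. `flatten_record` is a hypothetical helper, and it assumes that key names are unique across the nested objects (a duplicate key would silently overwrite an earlier value). The records are a shortened version of the data in the question.

```python
def flatten_record(record):
    """Flatten one record: hoist nested-dict keys and merge lists of dicts."""
    flat = {}
    for key, value in record.items():
        if isinstance(value, dict):
            # Hoist the inner keys, which drops the "Product."/"Xref." prefix
            flat.update(value)
        elif isinstance(value, list) and all(isinstance(v, dict) for v in value):
            # Merge every dict in the list into the same flat row
            for item in value:
                flat.update(item)
        else:
            flat[key] = value
    return flat


# Shortened version of the records under "results" in the question
records = [
    {"_id": "25", "Product": {"Description": "3 YEAR"}, "Settlement": {},
     "Xref": {"SCSP": "96"}, "ProductSMCP": [{"SMCP": "01"}]},
    {"_id": "26", "Product": {"Description": "10 YEAR", "Currency": "USD"},
     "Settlement": {}, "Xref": {"BBT": "CITITYM9", "TCK": "ZN"},
     "ProductSMCP": [{"SMCP": "01"}, {"SMCP2": "02"}]},
]

rows = [flatten_record(r) for r in records]
for row in rows:
    print(row)
```

The resulting list of flat dicts can then be passed to `pd.DataFrame(rows)`, which fills keys missing from a row with NaN.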