Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Using Pandas json_normalize on nested Json with arrays

The problem is normalizing a json with nested array of json objects. I have looked at similar questions and tried to use their solution to no avail.

This is what my json object looks like.

{
  "results": [
    {
      "_id": "25",
      "Product": {
        "Description": "3 YEAR",
        "TypeLevel1": "INTEREST",
        "TypeLevel2": "LONG"
      },
      "Settlement": {},
      "Xref": {
        "SCSP": "96"
      },
      "ProductSMCP": [
        {
          "SMCP": "01"
        }
      ]
    },
    {
      "_id": "26",
      "Product": {
        "Description": "10 YEAR",
        "TypeLevel1": "INTEREST",
        "Currency": "USD",
        "Operational": true,
        "TypeLevel2": "LONG"
      },
      "Settlement": {},
      "Xref": {
        "BBT": "CITITYM9",
        "TCK": "ZN"
      },
      "ProductSMCP": [
        {
          "SMCP": "01"
        },
        {
          "SMCP2": "02"
        }
      ]
    }
  ]
}

Here is my code for normalizing the json object.

data = json.load(j)
data = data['results']
print pd.io.json.json_normalize(data)

The results that I WANT should be like this

id   Description    TypeLevel1   TypeLevel2  Currency  \
25   3 YEAR US      INTEREST     LONG        NAN
26   10 YEAR US     INTEREST     NAN         USD

BBT   TCT  SMCP  SMCP2  SCSP   
NAN   NAN  521   NAN    01
M9    ZN   01    02     NAN

However, the result I get is this:

  Product.Currency Product.Description Product.Operational Product.TypeLevel1  \
0              NaN              3 YEAR                 NaN           INTEREST
1              USD             10 YEAR                True           INTEREST

  Product.TypeLevel2                        ProductSMCP  Xref.BBT Xref.SCSP  \
0               LONG                   [{'SMCP': '01'}]       NaN        96
1               LONG  [{'SMCP': '01'}, {'SMCP2': '02'}]  CITITYM9       NaN

  Xref.TCK _id
0      NaN  25
1       ZN  26

As you can see, the issue is at ProductSCMP, it is not completely flattening the array.

like image 216
Chris Johnson Avatar asked Jul 31 '17 14:07

Chris Johnson


1 Answers

Once we get past first normalization, I'd apply a lambda to finish the job.

from cytoolz.dicttoolz import merge

pd.io.json.json_normalize(data).pipe(
    lambda x: x.drop('ProductSMCP', 1).join(
        x.ProductSMCP.apply(lambda y: pd.Series(merge(y)))
    )
)

  Product.Currency Product.Description Product.Operational Product.TypeLevel1 Product.TypeLevel2  Xref.BBT Xref.SCSP Xref.TCK _id SMCP SMCP2
0              NaN              3 YEAR                 NaN           INTEREST               LONG       NaN        96      NaN  25   01   NaN
1              USD             10 YEAR                True           INTEREST               LONG  CITITYM9       NaN       ZN  26   01    02

Trim Column Names

pd.io.json.json_normalize(data).pipe(
    lambda x: x.drop('ProductSMCP', 1).join(
        x.ProductSMCP.apply(lambda y: pd.Series(merge(y)))
    )
).rename(columns=lambda x: re.sub('(Product|Xref)\.', '', x))

  Currency Description Operational TypeLevel1 TypeLevel2       BBT SCSP  TCK _id SMCP SMCP2
0      NaN      3 YEAR         NaN   INTEREST       LONG       NaN   96  NaN  25   01   NaN
1      USD     10 YEAR        True   INTEREST       LONG  CITITYM9  NaN   ZN  26   01    02
like image 111
piRSquared Avatar answered Nov 01 '22 09:11

piRSquared