
Pandas: reading multi-index JSON as pandas data frame

I have a JSON file as follows:

{
    "ALPHA": [
        {
            "date": "2021-06-22",
            "constituents": {
                "BBB": 0,
                "EEE": 1,
                "BTB": 1,
                "YUY": 1
            }
        },
        {
            "date": "2021-09-07",
            "constituents": {
                "BBB": 0,
                "EEE": 0,
                "BTB": 0,
                "YUY": 0
            }
        }
    ],
    "BETA": [
        {
            "date": "2021-06-22",
            "constituents": {
                "BBB": 1,
                "EEE": 1,
                "BTB": 1,
                "YUY": 1
            }
        },
        {
            "date": "2021-09-07",
            "constituents": {
                "BBB": 1,
                "EEE": 1,
                "BTB": 1,
                "YUY": 1
            }
        }
    ],

    "THETA": [
        {
            "date": "2021-06-22",
            "constituents": {
                "BBB": 0,
                "EEE": 1,
                "BTB": 1,
                "YUY": 0
            }
        },
        {
            "date": "2021-08-20",
            "constituents": {
                "BBB": 0,
                "EEE": 1,
                "BTB": 1,
                "YUY": 0
            }
        },
        {
            "date": "2021-09-07",
            "constituents": {
                "BBB": 0,
                "EEE": 1,
                "BTB": 1,
                "YUY": 0
            }
        }
    ]
}

I want to read the above into a pandas DataFrame where the first index level is the date, the second index level is the top-level key (i.e. "ALPHA", "BETA", "THETA"), the columns are the inner keys (i.e. "BBB", "EEE", "BTB", "YUY"), and the cell values are the values of those inner keys.

How can I read that into pandas from the JSON file?

asked Sep 16 '21 18:09 by finstats




2 Answers

You can pass the dictionary to pd.Series to get a Series with ALPHA, BETA, THETA as the index and a list of records as each element. .explode() then puts each record of those lists on its own row, and .apply(pd.Series) expands each record into the columns date and constituents.

Append date to the index with .set_index() and append=True, then move date from the second index level to the first with .swaplevel().

Finally, select the constituents column and expand its inner dictionaries into columns with another .apply(pd.Series), as follows:

(this assumes you have already loaded the JSON file into j)

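import pandas as pd

df = (pd.Series(j)                       # index: ALPHA/BETA/THETA, values: lists of records
        .explode()                       # one record per row
        .apply(pd.Series)                # columns: date, constituents
        .set_index('date', append=True)  # index becomes (key, date)
        .swaplevel()['constituents']     # index becomes (date, key); keep the inner dicts
        .apply(pd.Series)                # one column per inner key
     )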

Data input:

j = {'ALPHA': [{'date': '2021-06-22',
   'constituents': {'BBB': 0, 'EEE': 1, 'BTB': 1, 'YUY': 1}},
  {'date': '2021-09-07',
   'constituents': {'BBB': 0, 'EEE': 0, 'BTB': 0, 'YUY': 0}}],
 'BETA': [{'date': '2021-06-22',
   'constituents': {'BBB': 1, 'EEE': 1, 'BTB': 1, 'YUY': 1}},
  {'date': '2021-09-07',
   'constituents': {'BBB': 1, 'EEE': 1, 'BTB': 1, 'YUY': 1}}],
 'THETA': [{'date': '2021-06-22',
   'constituents': {'BBB': 0, 'EEE': 1, 'BTB': 1, 'YUY': 0}},
  {'date': '2021-08-20',
   'constituents': {'BBB': 0, 'EEE': 1, 'BTB': 1, 'YUY': 0}},
  {'date': '2021-09-07',
   'constituents': {'BBB': 0, 'EEE': 1, 'BTB': 1, 'YUY': 0}}]}
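
If the data lives in a file rather than in a literal like the one above, the standard-library json module can produce the same dictionary. A minimal sketch, assuming a hypothetical file name data.json (the question does not give one); the block writes a small sample file first so that it runs on its own:

```python
import json
import os
import tempfile

# Small sample in the same shape as the question's data (ALPHA only, for brevity).
sample = {
    "ALPHA": [
        {"date": "2021-06-22", "constituents": {"BBB": 0, "EEE": 1, "BTB": 1, "YUY": 1}},
        {"date": "2021-09-07", "constituents": {"BBB": 0, "EEE": 0, "BTB": 0, "YUY": 0}},
    ]
}

# "data.json" is a hypothetical file name; substitute the real path.
path = os.path.join(tempfile.mkdtemp(), "data.json")
with open(path, "w") as f:
    json.dump(sample, f)

# json.load parses the file into nested dicts and lists,
# which is the structure the snippets in this answer call j.
with open(path) as f:
    j = json.load(f)
```

With a real file, only the last two lines are needed, with path pointing at that file.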

Output

print(df)


                  BBB  EEE  BTB  YUY
date                                
2021-06-22 ALPHA    0    1    1    1
2021-09-07 ALPHA    0    0    0    0
2021-06-22 BETA     1    1    1    1
2021-09-07 BETA     1    1    1    1
2021-06-22 THETA    0    1    1    0
2021-08-20 THETA    0    1    1    0
2021-09-07 THETA    0    1    1    0
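
To see what each step of the chain contributes, it can help to run the steps one at a time. The following is a sketch on a reduced input of the same shape; the intermediate names (s, exploded, records, indexed) are only illustrative and do not appear in the chained version:

```python
import pandas as pd

# Reduced input in the same shape as j (two keys, one record each).
j = {
    "ALPHA": [{"date": "2021-06-22", "constituents": {"BBB": 0, "EEE": 1}}],
    "BETA": [{"date": "2021-06-22", "constituents": {"BBB": 1, "EEE": 1}}],
}

s = pd.Series(j)                      # index: ALPHA, BETA; each value is a list of records
exploded = s.explode()                # one record (a dict) per row, top-level key repeated
records = exploded.apply(pd.Series)   # columns: date, constituents

# Add date to the index, then swap so that date is the first level.
indexed = records.set_index("date", append=True).swaplevel()

# Expand the inner dictionaries into one column per inner key.
df = indexed["constituents"].apply(pd.Series)

print(records.columns.tolist())
print(df.index.tolist())
print(df)
```

Only the date level of the resulting index has a name, which is why the printed output above shows date but no label for the ALPHA/BETA/THETA level.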
answered Nov 03 '22 00:11 by SeaBean


I think you get better performance, and potentially easier manipulation, if you reshape the data with native Python data structures first and only pull the final form into pandas.

Let's flatten the nested dictionary into a list of flat dictionaries, one per row, using plain Python:

container = []
for key, value in j.items(): # j is the main dictionary
    for entry in value:
        content = {'date': entry['date'], 
                   'key': key, 
                   # expand the nested constituent data
                   # this gets us a single dictionary 
                   **entry['constituents']}
        container.append(content)

print(container)
[{'date': '2021-06-22',
  'key': 'ALPHA',
  'BBB': 0,
  'EEE': 1,
  'BTB': 1,
  'YUY': 1},
 {'date': '2021-09-07',
  'key': 'ALPHA',
  'BBB': 0,
  'EEE': 0,
  'BTB': 0,
  'YUY': 0},
 {'date': '2021-06-22', 'key': 'BETA', 'BBB': 1, 'EEE': 1, 'BTB': 1, 'YUY': 1},
 {'date': '2021-09-07', 'key': 'BETA', 'BBB': 1, 'EEE': 1, 'BTB': 1, 'YUY': 1},
 {'date': '2021-06-22',
  'key': 'THETA',
  'BBB': 0,
  'EEE': 1,
  'BTB': 1,
  'YUY': 0},
 {'date': '2021-08-20',
  'key': 'THETA',
  'BBB': 0,
  'EEE': 1,
  'BTB': 1,
  'YUY': 0},
 {'date': '2021-09-07',
  'key': 'THETA',
  'BBB': 0,
  'EEE': 1,
  'BTB': 1,
  'YUY': 0}]

Now, build the dataframe, and set the required columns as index:

pd.DataFrame(container).set_index(['date', 'key'])

                  BBB  EEE  BTB  YUY
date       key
2021-06-22 ALPHA    0    1    1    1
2021-09-07 ALPHA    0    0    0    0
2021-06-22 BETA     1    1    1    1
2021-09-07 BETA     1    1    1    1
2021-06-22 THETA    0    1    1    0
2021-08-20 THETA    0    1    1    0
2021-09-07 THETA    0    1    1    0
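
Once date and key form the index, rows can be selected by either level. A short sketch of that, assuming a reduced container with three of the rows shown above; the variable names row, on_date, and for_key are illustrative:

```python
import pandas as pd

# Three flattened rows in the shape produced by the loop above.
container = [
    {"date": "2021-06-22", "key": "ALPHA", "BBB": 0, "EEE": 1, "BTB": 1, "YUY": 1},
    {"date": "2021-06-22", "key": "BETA", "BBB": 1, "EEE": 1, "BTB": 1, "YUY": 1},
    {"date": "2021-09-07", "key": "ALPHA", "BBB": 0, "EEE": 0, "BTB": 0, "YUY": 0},
]
df = pd.DataFrame(container).set_index(["date", "key"])

# One row, addressed by the full (date, key) pair.
row = df.loc[("2021-06-22", "ALPHA")]

# All keys for one date: select on the first index level.
on_date = df.xs("2021-06-22", level="date")

# All dates for one key: select on the second index level.
for_key = df.xs("ALPHA", level="key")
```

By default, xs drops the level it selects on, so on_date is indexed by key alone and for_key by date alone.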
answered Nov 02 '22 22:11 by sammywemmy