
How to convert python JSON list to dataframe columns without looping

I'm using Python and trying to figure out how to do the following without a loop.

I have a dataframe with several columns, one of which holds a list of JSON objects. I want to expand that JSON column into separate columns of the dataframe. For example, I have the following dataframe:

name  age  group
John  35   [{"testid": "001", "marks": 67}, {"testid": "002", "marks": 70}]
Ann   20   [{"testid": "001", "marks": 75}, {"testid": "002", "marks": 80}, {"testid": "003", "marks": 87}]
Emma  25   [{"testid": "001", "marks": 90}, {"testid": "002", "marks": 99}]

I want to get the marks for testid = 001 and testid = 002, as follows:

name  age  test_id1  test_id2
John  35   67        70
Ann   20   75        80
Emma  25   90        99

Here is my dataset:

[
   {
      "name":"John",
      "age":35,
      "group":[
         {
            "testid":"001",
            "marks":67
         },
         {
            "testid":"002",
            "marks":70
         }
      ]
   },
   {
      "name":"Ann",
      "age":20,
      "group":[
         {
            "testid":"001",
            "marks":75
         },
         {
            "testid":"002",
            "marks":80
         },
         {
            "testid":"003",
            "marks":87
         }
      ]
   },
   {
      "name":"Emma",
      "age":25,
      "group":[
         {
            "testid":"001",
            "marks":90
         },
         {
            "testid":"002",
            "marks":99
         }
      ]
   }
]

Any idea is highly appreciated. Thank you.

Natasha Perera asked May 20 '26 14:05



1 Answer

A list comprehension is handy for pulling the data out. As a side note, if you can, do the extraction before loading the dict-like data into a dataframe, since that is more efficient:

# For each row's list of test dicts, keep the marks of tests 001 and 002
outcome = [[item['marks']
            for item in entry
            if item['testid'] in ('001', '002')]
           for entry in df.group]

print(outcome)
[[67, 70], [75, 80], [90, 99]]
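
The comprehension above relies on every row listing tests 001 and 002, in that order. If that cannot be guaranteed, a lookup by testid may be safer. The following is only a sketch: the helper name marks_for and the wanted tuple are made up for illustration, the sample dataframe is rebuilt from the question, and the json.loads branch assumes the column might hold JSON strings rather than already parsed lists.

```python
import json

import pandas as pd

# Sample dataframe rebuilt from the question (assumed to match the real data).
df = pd.DataFrame({
    "name": ["John", "Ann", "Emma"],
    "age": [35, 20, 25],
    "group": [
        [{"testid": "001", "marks": 67}, {"testid": "002", "marks": 70}],
        [{"testid": "001", "marks": 75}, {"testid": "002", "marks": 80},
         {"testid": "003", "marks": 87}],
        [{"testid": "001", "marks": 90}, {"testid": "002", "marks": 99}],
    ],
})

wanted = ("001", "002")  # hypothetical name: the test ids to extract


def marks_for(entry, ids=wanted):
    """Return the marks for each id in `ids`, or None where an id is absent."""
    if isinstance(entry, str):
        # Assumption: the cell may be a JSON string instead of a list of dicts.
        entry = json.loads(entry)
    by_id = {item["testid"]: item["marks"] for item in entry}
    return [by_id.get(test_id) for test_id in ids]


outcome = [marks_for(entry) for entry in df["group"]]
```

With this variant, a row lacking a requested testid gets None in that position instead of shifting the remaining marks into the wrong column.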

Zip the data, and assign to new column names in the dataframe:

test_id1, test_id2 = zip(*outcome)

df.filter(['name', 'age']).assign(test_id1 = test_id1, test_id2 = test_id2)

   name  age  test_id1  test_id2
0  John   35        67        70
1   Ann   20        75        80
2  Emma   25        90        99
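
As a rough sketch of the side note about extracting before building the dataframe, the raw records from the question can be flattened first and passed to pd.DataFrame afterwards. The names records, wanted and flat_df are illustrative only, and the sketch assumes the data is available as a parsed list of dicts.

```python
import pandas as pd

# Raw records as given in the question (already parsed from JSON).
records = [
    {"name": "John", "age": 35, "group": [
        {"testid": "001", "marks": 67}, {"testid": "002", "marks": 70}]},
    {"name": "Ann", "age": 20, "group": [
        {"testid": "001", "marks": 75}, {"testid": "002", "marks": 80},
        {"testid": "003", "marks": 87}]},
    {"name": "Emma", "age": 25, "group": [
        {"testid": "001", "marks": 90}, {"testid": "002", "marks": 99}]},
]

# Hypothetical mapping from testid to the desired column name.
wanted = {"001": "test_id1", "002": "test_id2"}

# Build one flat dict per person, keeping only the wanted test ids.
rows = [
    {
        "name": rec["name"],
        "age": rec["age"],
        **{wanted[item["testid"]]: item["marks"]
           for item in rec["group"] if item["testid"] in wanted},
    }
    for rec in records
]

flat_df = pd.DataFrame(rows)
print(flat_df)
```

Because the nested lists never enter the dataframe, no object column has to be unpacked afterwards.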
sammywemmy answered May 23 '26 11:05



