
Flatten pandas data frame with a JSON column

Tags:

python

pandas

I have a very large dataset in CSV format in which one column is a JSON string. I want to read this information into a flat Pandas data frame. How can I achieve this efficiently?

Input CSV:

col1,col2,col3,col4
1,Programming,"{""col3_1"":null,""col3_2"":""Java""}",11
2,Sport,"{""col3_1"":null,""col3_2"":""Soccer""}",22
3,Food,"{""col3_1"":null,""col3_2"":""Pizza""}",33
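To try the approaches on this page without the original file, the sample rows can be loaded from an in-memory string. This is a minimal sketch that assumes the three rows stand in for the real `/dataset.csv`. The name `SAMPLE_CSV` is hypothetical. The doubled quotes are ordinary CSV escaping, so `read_csv` hands back plain JSON text in `col3`:

```python
import io
import json

import pandas as pd

# Hypothetical stand-in for /dataset.csv: the three sample rows from the question.
# Inside a quoted CSV field, "" is an escaped double quote.
SAMPLE_CSV = '''col1,col2,col3,col4
1,Programming,"{""col3_1"":null,""col3_2"":""Java""}",11
2,Sport,"{""col3_1"":null,""col3_2"":""Soccer""}",22
3,Food,"{""col3_1"":null,""col3_2"":""Pizza""}",33
'''

df = pd.read_csv(io.StringIO(SAMPLE_CSV))

# After CSV unquoting, col3 holds ordinary JSON text that json.loads accepts.
print(df.loc[0, 'col3'])              # {"col3_1":null,"col3_2":"Java"}
print(json.loads(df.loc[0, 'col3']))  # {'col3_1': None, 'col3_2': 'Java'}
```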

Expected DataFrame:

+---------------------------------------------------------------+
|   col1    |    col2     |   col3_1    |   col3_2  |   col4    |
+---------------------------------------------------------------+
|    1      | Programming |    None     |    Java   |    11     |
|    2      |    Sport    |    None     |   Soccer  |    22     |
|    3      |    Food     |    None     |   Pizza   |    33     |
+---------------------------------------------------------------+

I can currently get the expected output using the following code. I just want to know if there is a more efficient way to achieve the same.

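import json
import pandas

dataset = pandas.read_csv('/dataset.csv')
# Parse each JSON string in col3 into a Python dict
dataset['col3'] = dataset['col3'].apply(json.loads)
# Pull each key out into its own column (one apply pass per key)
dataset['col3_1'] = dataset['col3'].apply(lambda row: row['col3_1'])
dataset['col3_2'] = dataset['col3'].apply(lambda row: row['col3_2'])
# Drop the original JSON column
dataset = dataset.drop(columns=['col3'])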
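One possibly faster variant of the code above parses the JSON while the CSV is being read, using the `converters` parameter of `read_csv`. It then builds all new columns in a single `pd.json_normalize` call instead of one `apply` pass per key. For a very large file, `chunksize` lets you flatten it piece by piece. This is a sketch under assumptions, not a benchmarked answer. It assumes pandas 1.0 or later, where `json_normalize` is available as `pd.json_normalize`. It also assumes every `col3` value is valid JSON. The helper names `flatten_json_column` and `read_flat` are hypothetical. Measure against your own data before relying on it:

```python
import io
import json

import pandas as pd

# Hypothetical stand-in for /dataset.csv (the question's three sample rows).
SAMPLE_CSV = '''col1,col2,col3,col4
1,Programming,"{""col3_1"":null,""col3_2"":""Java""}",11
2,Sport,"{""col3_1"":null,""col3_2"":""Soccer""}",22
3,Food,"{""col3_1"":null,""col3_2"":""Pizza""}",33
'''


def flatten_json_column(df, column):
    """Hypothetical helper: replace a column of dicts with one column per key, in place."""
    # One json_normalize call builds every new column from the list of dicts,
    # instead of one .apply() pass per key.
    expanded = pd.json_normalize(df[column].tolist())
    expanded.index = df.index  # keep row labels aligned with df
    pos = df.columns.get_loc(column)
    # Put the new columns where the JSON column used to be.
    return pd.concat([df.iloc[:, :pos], expanded, df.iloc[:, pos + 1:]], axis=1)


def read_flat(source, column='col3', chunksize=None):
    """Hypothetical helper: read a CSV whose `column` holds JSON objects, return it flat."""
    # converters= runs json.loads on each raw string while the CSV is parsed,
    # so no separate .apply(json.loads) pass is needed.
    reader = pd.read_csv(source, converters={column: json.loads}, chunksize=chunksize)
    if chunksize is None:
        return flatten_json_column(reader, column)
    # With chunksize, read_csv yields DataFrames of at most `chunksize` rows;
    # each chunk is flattened separately, then all chunks are concatenated.
    return pd.concat(
        (flatten_json_column(chunk, column) for chunk in reader), ignore_index=True
    )


flat = read_flat(io.StringIO(SAMPLE_CSV))
print(flat)
```

Unlike the question's code, this keeps `col3_1` and `col3_2` in the position of the original `col3`, so the column order matches the expected table.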
Mousa asked Feb 24 '26 02:02



1 Answer

You can parse the JSON strings in the column with json.loads() and expand each parsed object into separate columns with pd.Series():

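import json
import pandas as pd

df = pd.read_csv('/dataset.csv')
# pop() removes col3 from df and returns it; each JSON string becomes a Series
# whose keys are column names, and join() attaches those columns by index
df.join(df.pop('col3').apply(lambda x: pd.Series(json.loads(x))))

Result: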
   col1         col2  col4 col3_1  col3_2
0     1  Programming    11   None    Java
1     2        Sport    22   None  Soccer
2     3         Food    33   None   Pizza
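The `apply(lambda x: pd.Series(...))` step constructs one `Series` object per row, which tends to be slow on a very large file. A hedged alternative keeps the same `pop`/`join` shape. It parses the strings with `Series.map(json.loads)` and passes the whole list of dicts to the `DataFrame` constructor once. Passing the original index lets `join` line up the rows. This is a sketch that assumes `col3` holds the raw JSON strings as read by `read_csv`. The `SAMPLE_CSV` string is a hypothetical stand-in built from the question's data:

```python
import io
import json

import pandas as pd

# Hypothetical stand-in for /dataset.csv (the question's three sample rows).
SAMPLE_CSV = '''col1,col2,col3,col4
1,Programming,"{""col3_1"":null,""col3_2"":""Java""}",11
2,Sport,"{""col3_1"":null,""col3_2"":""Soccer""}",22
3,Food,"{""col3_1"":null,""col3_2"":""Pizza""}",33
'''

df = pd.read_csv(io.StringIO(SAMPLE_CSV))

# Parse every JSON string into a dict (pop() also removes col3 from df).
parsed = df.pop('col3').map(json.loads)
# Build all new columns with a single DataFrame() call; index=df.index keeps
# the rows aligned for join().
flat = df.join(pd.DataFrame(parsed.tolist(), index=df.index))

print(flat)
```

As in the answer, the new columns end up after `col4`. If you want the order from the question, select the columns explicitly afterwards, e.g. `flat[['col1', 'col2', 'col3_1', 'col3_2', 'col4']]`.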
MaxU - stop WAR against UA answered Feb 25 '26 15:02
