Pandas parse json in column and expand to new rows in dataframe

Tags:

python

pandas

I have a dataframe containing (record formatted) json strings as follows:

In[9]: pd.DataFrame( {'col1': ['A','B'], 'col2': ['[{"t":"05:15","v":"20.0"}, {"t":"05:20","v":"25.0"}]', 
                                                '[{"t":"05:15","v":"10.0"}, {"t":"05:20","v":"15.0"}]']})

Out[9]: 
  col1                                               col2
0    A  [{"t":"05:15","v":"20.0"}, {"t":"05:20","v":"2...
1    B  [{"t":"05:15","v":"10.0"}, {"t":"05:20","v":"1...

I would like to extract the json and for each record add a new row to the dataframe:

    col1 t          v
0   A   05:15:00    20
1   A   05:20:00    25
2   B   05:15:00    10
3   B   05:20:00    15

I've been experimenting with the following code:

def json_to_df(x):
    df2 = pd.read_json(x.col2)
    return df2

df.apply(json_to_df, axis=1)

but the resulting dataframes are assigned as tuples, rather than creating new rows. Any advice?

MarkNS asked Mar 15 '23 20:03


1 Answer

The problem with apply is that your function needs to return multiple rows for each input row, but apply expects only one. A possible solution:

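import io
import pandas as pd

def json_to_df(item):
    _, row = item                                   # iterrows() yields (index, row) pairs
    df_json = pd.read_json(io.StringIO(row.col2))   # parse this row's json records
    col1 = pd.Series([row.col1]*len(df_json), name='col1')   # repeat col1 once per record
    return pd.concat([col1,df_json],axis=1)

frames = [json_to_df(item) for item in df.iterrows()]   # a list of dataframes
df = pd.concat(frames)   # glues them together (DataFrame.append was removed in pandas 2.0)
df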

  col1      t   v
0    A  05:15  20
1    A  05:20  25
0    B  05:15  10
1    B  05:20  15
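
If you would rather avoid building one dataframe per row, here is a rough sketch of another route: parse each string with json.loads, give every record its own row with DataFrame.explode, then spread the dicts into columns with pd.json_normalize. The helper name expand_json_column is made up for this example, and it assumes pandas 1.1 or newer (for ignore_index in explode) and that every cell in col2 holds a valid json list of flat records. The last two lines convert v to float and t to time objects, since both arrive as strings.

```python
import json
import pandas as pd

df = pd.DataFrame({
    'col1': ['A', 'B'],
    'col2': ['[{"t":"05:15","v":"20.0"}, {"t":"05:20","v":"25.0"}]',
             '[{"t":"05:15","v":"10.0"}, {"t":"05:20","v":"15.0"}]'],
})

def expand_json_column(frame, column):
    """Hypothetical helper: return one row per json record in frame[column]."""
    # Parse each json string into a list of dicts.
    parsed = frame.assign(**{column: frame[column].map(json.loads)})
    # Give every dict its own row and renumber the index from 0.
    exploded = parsed.explode(column, ignore_index=True)
    # Turn the dicts into ordinary columns (here: t and v).
    records = pd.json_normalize(exploded[column].tolist())
    return pd.concat([exploded.drop(columns=column), records], axis=1)

result = expand_json_column(df, 'col2')
result['v'] = result['v'].astype(float)                             # "20.0" -> 20.0
result['t'] = pd.to_datetime(result['t'], format='%H:%M').dt.time   # "05:15" -> 05:15:00
```

Because explode is called with ignore_index=True, the result has a continuous 0..3 index rather than one that restarts for each original row.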
hellpanderr answered Apr 09 '23 07:04