Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

pandas explode fails with KeyError: 0

How does a pandas.DataFrame.explode work?

In the documentation:

https://pandas.pydata.org/pandas-docs/version/0.25/reference/api/pandas.DataFrame.explode.html

df = pd.DataFrame({'A': [[1, 2, 3], 'foo', [], [3, 4]], 'B': 1})
display(df)
print(df.columns)
print(df.dtypes)
df.explode('A')

works just fine. But for my data it fails with a key exception. My data originally looks like this: enter image description here

with the following types:

print(foo.columns)
print(foo.dtypes)
Index(['model', 'id_min_days_cutoff'], dtype='object')
model                 object
id_min_days_cutoff     int64
dtype: object

where model is obtained using a statsmodels regression using:

model.summary2().tables[1]

When calling: df.explode('model')

it fails with:

KeyError: 0

Trying to reproduce this:

df_json = df.to_json()

# now load it again for SF purposes
df_json = '{"model":{"0":{"Coef.":{"ALQ_15PLUS_perc":95489.7866599741,"AST_perc":-272.9213162565,"BEV_UNTER15_perc":6781.448845533,"BEV_UEBER65_perc":-46908.2889142205},"Std.Err.":{"ALQ_15PLUS_perc":1399665.9788843254,"AST_perc":1558.1286516172,"BEV_UNTER15_perc":2027111.8764156068,"BEV_UEBER65_perc":1230965.9812726702},"z":{"ALQ_15PLUS_perc":0.0682232676,"AST_perc":-0.1751596802,"BEV_UNTER15_perc":0.0033453747,"BEV_UEBER65_perc":-0.038106893},"P>|z|":{"ALQ_15PLUS_perc":0.9456079052,"AST_perc":0.8609541651,"BEV_UNTER15_perc":0.9973307821,"BEV_UEBER65_perc":0.9696024555},"[0.025":{"ALQ_15PLUS_perc":-2647805.1223393031,"AST_perc":-3326.7973567063,"BEV_UNTER15_perc":-3966284.8215624653,"BEV_UEBER65_perc":-2459557.2784026605},"0.975]":{"ALQ_15PLUS_perc":2838784.6956592514,"AST_perc":2780.9547241933,"BEV_UNTER15_perc":3979847.7192535317,"BEV_UEBER65_perc":2365740.7005742197}},"1":{"Coef.":{"ALQ_15PLUS_perc":-140539.5196612777,"AST_perc":142.579413527,"BEV_UNTER15_perc":-45288.5612893498,"BEV_UEBER65_perc":-152106.9841374909},"Std.Err.":{"ALQ_15PLUS_perc":299852250.9155113101,"AST_perc":24013.7007484301,"BEV_UNTER15_perc":417010365.7919532657,"BEV_UEBER65_perc":171876588.9403209388},"z":{"ALQ_15PLUS_perc":-0.0004686959,"AST_perc":0.0059374194,"BEV_UNTER15_perc":-0.000108603,"BEV_UEBER65_perc":-0.0008849779},"P>|z|":{"ALQ_15PLUS_perc":0.9996260348,"AST_perc":0.9952626525,"BEV_UNTER15_perc":0.9999133474,"BEV_UEBER65_perc":0.9992938899},"[0.025":{"ALQ_15PLUS_perc":-587840151.997330904,"AST_perc":-46923.4091889186,"BEV_UNTER15_perc":-817370586.6933914423,"BEV_UEBER65_perc":-337024031.0927618742},"0.975]":{"ALQ_15PLUS_perc":587559072.9580082893,"AST_perc":47208.5680159725,"BEV_UNTER15_perc":817280009.5708128214,"BEV_UEBER65_perc":336719817.1244869232}}},"id_min_days_cutoff":{"0":2,"1":3}}'
pd.read_json(df_json).explode('model')

fails with:

KeyError: 0

edit

trying to find an alternative using one of: How to unnest (explode) a column in a pandas DataFrame? choosing 2.1

pd.DataFrame({'model':np.concatenate(df_json.model.values)},
               index=df_json.index.repeat(ddf_jsonf.model.str.len()))

but this fails with:

ValueError: zero-dimensional arrays cannot be concatenated

When instead applying it to the original df, not read from JSON:

Exception: Data must be 1-dimensional

How can I get the unnest/explode to work?

like image 902
Georg Heiler Avatar asked Aug 14 '19 07:08

Georg Heiler


People also ask

How do I fix pandas KeyError?

How to Fix the KeyError? We can simply fix the error by correcting the spelling of the key. If we are not sure about the spelling we can simply print the list of all column names and crosscheck.

How does explode work in pandas?

The explode() method converts each element of the specified column(s) into a row.

What is the opposite of explode in pandas?

explode #45459.


1 Answers

If you do have the results of your statsmodels regression in json / dictionary form, you could try to explode the dataframe "by hand". I tried below using list comprehensions. Does the result you are trying to achieve look something like this:

df_json = '{"model":{"0":{"Coef.":{"ALQ_15PLUS_perc":95489.7866599741,"AST_perc":-272.9213162565,"BEV_UNTER15_perc":6781.448845533,"BEV_UEBER65_perc":-46908.2889142205},"Std.Err.":{"ALQ_15PLUS_perc":1399665.9788843254,"AST_perc":1558.1286516172,"BEV_UNTER15_perc":2027111.8764156068,"BEV_UEBER65_perc":1230965.9812726702},"z":{"ALQ_15PLUS_perc":0.0682232676,"AST_perc":-0.1751596802,"BEV_UNTER15_perc":0.0033453747,"BEV_UEBER65_perc":-0.038106893},"P>|z|":{"ALQ_15PLUS_perc":0.9456079052,"AST_perc":0.8609541651,"BEV_UNTER15_perc":0.9973307821,"BEV_UEBER65_perc":0.9696024555},"[0.025":{"ALQ_15PLUS_perc":-2647805.1223393031,"AST_perc":-3326.7973567063,"BEV_UNTER15_perc":-3966284.8215624653,"BEV_UEBER65_perc":-2459557.2784026605},"0.975]":{"ALQ_15PLUS_perc":2838784.6956592514,"AST_perc":2780.9547241933,"BEV_UNTER15_perc":3979847.7192535317,"BEV_UEBER65_perc":2365740.7005742197}},"1":{"Coef.":{"ALQ_15PLUS_perc":-140539.5196612777,"AST_perc":142.579413527,"BEV_UNTER15_perc":-45288.5612893498,"BEV_UEBER65_perc":-152106.9841374909},"Std.Err.":{"ALQ_15PLUS_perc":299852250.9155113101,"AST_perc":24013.7007484301,"BEV_UNTER15_perc":417010365.7919532657,"BEV_UEBER65_perc":171876588.9403209388},"z":{"ALQ_15PLUS_perc":-0.0004686959,"AST_perc":0.0059374194,"BEV_UNTER15_perc":-0.000108603,"BEV_UEBER65_perc":-0.0008849779},"P>|z|":{"ALQ_15PLUS_perc":0.9996260348,"AST_perc":0.9952626525,"BEV_UNTER15_perc":0.9999133474,"BEV_UEBER65_perc":0.9992938899},"[0.025":{"ALQ_15PLUS_perc":-587840151.997330904,"AST_perc":-46923.4091889186,"BEV_UNTER15_perc":-817370586.6933914423,"BEV_UEBER65_perc":-337024031.0927618742},"0.975]":{"ALQ_15PLUS_perc":587559072.9580082893,"AST_perc":47208.5680159725,"BEV_UNTER15_perc":817280009.5708128214,"BEV_UEBER65_perc":336719817.1244869232}}},"id_min_days_cutoff":{"0":2,"1":3}}'

df = pd.read_json(df_json)

# "Explode" the model column (containing a dict of dicts) using list comprehension:
model_col = [k+':'+kk+':'+str(vv) for i in range(0,len(df.model)) for k,v in df.model.iloc[i].items() for kk,vv in v.items()]

# Generate the second column (assuming each row of the original df "explodes" into the same number of rows):
cutoff_col = np.repeat([df['id_min_days_cutoff'].iloc[i] for i in range(0,len(df.model))], len(model_col)/2)

# Get everything into one dataframe
    exploded_df = pd.DataFrame({'model':model_col, 'id_min_days_cutoff': cutoff_col})
    
    exploded_df
                                               model  id_min_days_cutoff
    0         Coef.:ALQ_15PLUS_perc:95489.7866599741                   2
    1                 Coef.:AST_perc:-272.9213162565                   2
    2          Coef.:BEV_UNTER15_perc:6781.448845533                   2
    3       Coef.:BEV_UEBER65_perc:-46908.2889142205                   2
    4    Std.Err.:ALQ_15PLUS_perc:1399665.9788843254                   2
    5              Std.Err.:AST_perc:1558.1286516172                   2
    6   Std.Err.:BEV_UNTER15_perc:2027111.8764156068                   2
    7   Std.Err.:BEV_UEBER65_perc:1230965.9812726702                   2
    8                 z:ALQ_15PLUS_perc:0.0682232676                   2
    9                       z:AST_perc:-0.1751596802                   2
    10               z:BEV_UNTER15_perc:0.0033453747                   2
    11               z:BEV_UEBER65_perc:-0.038106893                   2
    12            P>|z|:ALQ_15PLUS_perc:0.9456079052                   2
    13                   P>|z|:AST_perc:0.8609541651                   2
    14           P>|z|:BEV_UNTER15_perc:0.9973307821                   2
    15           P>|z|:BEV_UEBER65_perc:0.9696024555                   2
    16     [0.025:ALQ_15PLUS_perc:-2647805.122339303                   2
    17              [0.025:AST_perc:-3326.7973567063                   2
    18   [0.025:BEV_UNTER15_perc:-3966284.8215624653                   2
    19   [0.025:BEV_UEBER65_perc:-2459557.2784026605                   2
    20     0.975]:ALQ_15PLUS_perc:2838784.6956592514                   2
    21               0.975]:AST_perc:2780.9547241933                   2
    22    0.975]:BEV_UNTER15_perc:3979847.7192535317                   2
    23    0.975]:BEV_UEBER65_perc:2365740.7005742197                   2
    24      Coef.:ALQ_15PLUS_perc:-140539.5196612777                   3
    25                  Coef.:AST_perc:142.579413527                   3
    26      Coef.:BEV_UNTER15_perc:-45288.5612893498                   3
    27     Coef.:BEV_UEBER65_perc:-152106.9841374909                   3
    28    Std.Err.:ALQ_15PLUS_perc:299852250.9155113                   3
    29            Std.Err.:AST_perc:24013.7007484301                   3
    30  Std.Err.:BEV_UNTER15_perc:417010365.79195327                   3
    31  Std.Err.:BEV_UEBER65_perc:171876588.94032094                   3
    32               z:ALQ_15PLUS_perc:-0.0004686959                   3
    33                       z:AST_perc:0.0059374194                   3
    34               z:BEV_UNTER15_perc:-0.000108603                   3
    35              z:BEV_UEBER65_perc:-0.0008849779                   3
    36            P>|z|:ALQ_15PLUS_perc:0.9996260348                   3
    37                   P>|z|:AST_perc:0.9952626525                   3
    38           P>|z|:BEV_UNTER15_perc:0.9999133474                   3
    39           P>|z|:BEV_UEBER65_perc:0.9992938899                   3
    40     [0.025:ALQ_15PLUS_perc:-587840151.9973309                   3
    41             [0.025:AST_perc:-46923.4091889186                   3
    42    [0.025:BEV_UNTER15_perc:-817370586.6933914                   3
    43    [0.025:BEV_UEBER65_perc:-337024031.0927619                   3
    44      0.975]:ALQ_15PLUS_perc:587559072.9580083                   3
    45              0.975]:AST_perc:47208.5680159725                   3
    46     0.975]:BEV_UNTER15_perc:817280009.5708128                   3
    47     0.975]:BEV_UEBER65_perc:336719817.1244869                   3
like image 88
k07b470n Avatar answered Oct 27 '22 21:10

k07b470n