pandas explode fails with KeyError: 0

Tags:

How does a pandas.DataFrame.explode work?

In the documentation:

https://pandas.pydata.org/pandas-docs/version/0.25/reference/api/pandas.DataFrame.explode.html
df = pd.DataFrame({'A': [[1, 2, 3], 'foo', [], [3, 4]], 'B': 1})
display(df)
print(df.columns)
print(df.dtypes)
df.explode('A')

works just fine. But for my data it fails with a key exception. My data originally looks like this: enter image description here

with the following types:

print(foo.columns)
print(foo.dtypes)
Index(['model', 'id_min_days_cutoff'], dtype='object')
model                 object
id_min_days_cutoff     int64
dtype: object

where model is obtained using a statsmodels regression using:

model.summary2().tables[1]

When calling: df.explode('model')

it fails with:

KeyError: 0

Trying to reproduce this:

df_json = df.to_json()

# now load it again for SF purposes
df_json = '{"model":{"0":{"Coef.":{"ALQ_15PLUS_perc":95489.7866599741,"AST_perc":-272.9213162565,"BEV_UNTER15_perc":6781.448845533,"BEV_UEBER65_perc":-46908.2889142205},"Std.Err.":{"ALQ_15PLUS_perc":1399665.9788843254,"AST_perc":1558.1286516172,"BEV_UNTER15_perc":2027111.8764156068,"BEV_UEBER65_perc":1230965.9812726702},"z":{"ALQ_15PLUS_perc":0.0682232676,"AST_perc":-0.1751596802,"BEV_UNTER15_perc":0.0033453747,"BEV_UEBER65_perc":-0.038106893},"P>|z|":{"ALQ_15PLUS_perc":0.9456079052,"AST_perc":0.8609541651,"BEV_UNTER15_perc":0.9973307821,"BEV_UEBER65_perc":0.9696024555},"[0.025":{"ALQ_15PLUS_perc":-2647805.1223393031,"AST_perc":-3326.7973567063,"BEV_UNTER15_perc":-3966284.8215624653,"BEV_UEBER65_perc":-2459557.2784026605},"0.975]":{"ALQ_15PLUS_perc":2838784.6956592514,"AST_perc":2780.9547241933,"BEV_UNTER15_perc":3979847.7192535317,"BEV_UEBER65_perc":2365740.7005742197}},"1":{"Coef.":{"ALQ_15PLUS_perc":-140539.5196612777,"AST_perc":142.579413527,"BEV_UNTER15_perc":-45288.5612893498,"BEV_UEBER65_perc":-152106.9841374909},"Std.Err.":{"ALQ_15PLUS_perc":299852250.9155113101,"AST_perc":24013.7007484301,"BEV_UNTER15_perc":417010365.7919532657,"BEV_UEBER65_perc":171876588.9403209388},"z":{"ALQ_15PLUS_perc":-0.0004686959,"AST_perc":0.0059374194,"BEV_UNTER15_perc":-0.000108603,"BEV_UEBER65_perc":-0.0008849779},"P>|z|":{"ALQ_15PLUS_perc":0.9996260348,"AST_perc":0.9952626525,"BEV_UNTER15_perc":0.9999133474,"BEV_UEBER65_perc":0.9992938899},"[0.025":{"ALQ_15PLUS_perc":-587840151.997330904,"AST_perc":-46923.4091889186,"BEV_UNTER15_perc":-817370586.6933914423,"BEV_UEBER65_perc":-337024031.0927618742},"0.975]":{"ALQ_15PLUS_perc":587559072.9580082893,"AST_perc":47208.5680159725,"BEV_UNTER15_perc":817280009.5708128214,"BEV_UEBER65_perc":336719817.1244869232}}},"id_min_days_cutoff":{"0":2,"1":3}}'
pd.read_json(df_json).explode('model')

fails with:

KeyError: 0

edit

trying to find an alternative using one of: How to unnest (explode) a column in a pandas DataFrame? choosing 2.1

pd.DataFrame({'model':np.concatenate(df_json.model.values)},
               index=df_json.index.repeat(ddf_jsonf.model.str.len()))

but this fails with:

ValueError: zero-dimensional arrays cannot be concatenated

When instead applying it to the original df, not read from JSON:

Exception: Data must be 1-dimensional

How can I get the unnest/explode to work?

902

asked Aug 14 '19 07:08

Georg Heiler

1 Answers

If you do have the results of your statsmodels regression in json / dictionary form, you could try to explode the dataframe "by hand". I tried below using list comprehensions. Does the result you are trying to achieve look something like this:

df_json = '{"model":{"0":{"Coef.":{"ALQ_15PLUS_perc":95489.7866599741,"AST_perc":-272.9213162565,"BEV_UNTER15_perc":6781.448845533,"BEV_UEBER65_perc":-46908.2889142205},"Std.Err.":{"ALQ_15PLUS_perc":1399665.9788843254,"AST_perc":1558.1286516172,"BEV_UNTER15_perc":2027111.8764156068,"BEV_UEBER65_perc":1230965.9812726702},"z":{"ALQ_15PLUS_perc":0.0682232676,"AST_perc":-0.1751596802,"BEV_UNTER15_perc":0.0033453747,"BEV_UEBER65_perc":-0.038106893},"P>|z|":{"ALQ_15PLUS_perc":0.9456079052,"AST_perc":0.8609541651,"BEV_UNTER15_perc":0.9973307821,"BEV_UEBER65_perc":0.9696024555},"[0.025":{"ALQ_15PLUS_perc":-2647805.1223393031,"AST_perc":-3326.7973567063,"BEV_UNTER15_perc":-3966284.8215624653,"BEV_UEBER65_perc":-2459557.2784026605},"0.975]":{"ALQ_15PLUS_perc":2838784.6956592514,"AST_perc":2780.9547241933,"BEV_UNTER15_perc":3979847.7192535317,"BEV_UEBER65_perc":2365740.7005742197}},"1":{"Coef.":{"ALQ_15PLUS_perc":-140539.5196612777,"AST_perc":142.579413527,"BEV_UNTER15_perc":-45288.5612893498,"BEV_UEBER65_perc":-152106.9841374909},"Std.Err.":{"ALQ_15PLUS_perc":299852250.9155113101,"AST_perc":24013.7007484301,"BEV_UNTER15_perc":417010365.7919532657,"BEV_UEBER65_perc":171876588.9403209388},"z":{"ALQ_15PLUS_perc":-0.0004686959,"AST_perc":0.0059374194,"BEV_UNTER15_perc":-0.000108603,"BEV_UEBER65_perc":-0.0008849779},"P>|z|":{"ALQ_15PLUS_perc":0.9996260348,"AST_perc":0.9952626525,"BEV_UNTER15_perc":0.9999133474,"BEV_UEBER65_perc":0.9992938899},"[0.025":{"ALQ_15PLUS_perc":-587840151.997330904,"AST_perc":-46923.4091889186,"BEV_UNTER15_perc":-817370586.6933914423,"BEV_UEBER65_perc":-337024031.0927618742},"0.975]":{"ALQ_15PLUS_perc":587559072.9580082893,"AST_perc":47208.5680159725,"BEV_UNTER15_perc":817280009.5708128214,"BEV_UEBER65_perc":336719817.1244869232}}},"id_min_days_cutoff":{"0":2,"1":3}}'

df = pd.read_json(df_json)

# "Explode" the model column (containing a dict of dicts) using list comprehension:
model_col = [k+':'+kk+':'+str(vv) for i in range(0,len(df.model)) for k,v in df.model.iloc[i].items() for kk,vv in v.items()]

# Generate the second column (assuming each row of the original df "explodes" into the same number of rows):
cutoff_col = np.repeat([df['id_min_days_cutoff'].iloc[i] for i in range(0,len(df.model))], len(model_col)/2)

# Get everything into one dataframe
    exploded_df = pd.DataFrame({'model':model_col, 'id_min_days_cutoff': cutoff_col})
    
    exploded_df
                                               model  id_min_days_cutoff
    0         Coef.:ALQ_15PLUS_perc:95489.7866599741                   2
    1                 Coef.:AST_perc:-272.9213162565                   2
    2          Coef.:BEV_UNTER15_perc:6781.448845533                   2
    3       Coef.:BEV_UEBER65_perc:-46908.2889142205                   2
    4    Std.Err.:ALQ_15PLUS_perc:1399665.9788843254                   2
    5              Std.Err.:AST_perc:1558.1286516172                   2
    6   Std.Err.:BEV_UNTER15_perc:2027111.8764156068                   2
    7   Std.Err.:BEV_UEBER65_perc:1230965.9812726702                   2
    8                 z:ALQ_15PLUS_perc:0.0682232676                   2
    9                       z:AST_perc:-0.1751596802                   2
    10               z:BEV_UNTER15_perc:0.0033453747                   2
    11               z:BEV_UEBER65_perc:-0.038106893                   2
    12            P>|z|:ALQ_15PLUS_perc:0.9456079052                   2
    13                   P>|z|:AST_perc:0.8609541651                   2
    14           P>|z|:BEV_UNTER15_perc:0.9973307821                   2
    15           P>|z|:BEV_UEBER65_perc:0.9696024555                   2
    16     [0.025:ALQ_15PLUS_perc:-2647805.122339303                   2
    17              [0.025:AST_perc:-3326.7973567063                   2
    18   [0.025:BEV_UNTER15_perc:-3966284.8215624653                   2
    19   [0.025:BEV_UEBER65_perc:-2459557.2784026605                   2
    20     0.975]:ALQ_15PLUS_perc:2838784.6956592514                   2
    21               0.975]:AST_perc:2780.9547241933                   2
    22    0.975]:BEV_UNTER15_perc:3979847.7192535317                   2
    23    0.975]:BEV_UEBER65_perc:2365740.7005742197                   2
    24      Coef.:ALQ_15PLUS_perc:-140539.5196612777                   3
    25                  Coef.:AST_perc:142.579413527                   3
    26      Coef.:BEV_UNTER15_perc:-45288.5612893498                   3
    27     Coef.:BEV_UEBER65_perc:-152106.9841374909                   3
    28    Std.Err.:ALQ_15PLUS_perc:299852250.9155113                   3
    29            Std.Err.:AST_perc:24013.7007484301                   3
    30  Std.Err.:BEV_UNTER15_perc:417010365.79195327                   3
    31  Std.Err.:BEV_UEBER65_perc:171876588.94032094                   3
    32               z:ALQ_15PLUS_perc:-0.0004686959                   3
    33                       z:AST_perc:0.0059374194                   3
    34               z:BEV_UNTER15_perc:-0.000108603                   3
    35              z:BEV_UEBER65_perc:-0.0008849779                   3
    36            P>|z|:ALQ_15PLUS_perc:0.9996260348                   3
    37                   P>|z|:AST_perc:0.9952626525                   3
    38           P>|z|:BEV_UNTER15_perc:0.9999133474                   3
    39           P>|z|:BEV_UEBER65_perc:0.9992938899                   3
    40     [0.025:ALQ_15PLUS_perc:-587840151.9973309                   3
    41             [0.025:AST_perc:-46923.4091889186                   3
    42    [0.025:BEV_UNTER15_perc:-817370586.6933914                   3
    43    [0.025:BEV_UEBER65_perc:-337024031.0927619                   3
    44      0.975]:ALQ_15PLUS_perc:587559072.9580083                   3
    45              0.975]:AST_perc:47208.5680159725                   3
    46     0.975]:BEV_UNTER15_perc:817280009.5708128                   3
    47     0.975]:BEV_UEBER65_perc:336719817.1244869                   3

answered Oct 27 '22 21:10

k07b470n

Related questions
                            
                                Mayavi how to show the axes grid
                            
                                Django rest framework best way to validate POST request parameters
                            
                                How to add "greater than 0 and sums to 1" constraint to a regression in Python?
                            
                                How to fix " Timeout when reading response headers from daemon process" error when using WSGI with Django and Apache
                            
                                How to Deallocate memory from an object in Jupyter Notebook
                            
                                In Python PDB, how do I list the source code of a file other than the current file?
                            
                                Lambda calling Lambda - how to access the payload in the second?
                            
                                Matplotlib not working for LInux. Cannot load backend 'TkAgg'
                            
                                How to implement Beholder (Tensorboard plugin) for Keras?
                            
                                How to extract charts/tables/graphs from PDF files using Python?
                            
                                Python - Difference between Sonarcube Vs pylint
                            
                                Matplotlib: How to make scatter plot of arbitrary pyplot figures
                            
                                Selenium Wait for user to click on alert dialog box in python
                            
                                Google BigQuery Python library is 2x as fast as the Node JS library for downloading results
                            
                                HTMLCalendar in Django app: missing arguments
                            
                                Keras predict loop memory leak using tf.data.Dataset but not with a numpy array
                            
                                Disable SameFileError exception in shutil.copy
                            
                                show/hide django admin form field based on selection in another field
                            
                                Which cubic spline method does scipy ndimage use for affine_transform?
                            
                                Meson cannot find pykeepass module, I am certain it is installed

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

pandas explode fails with KeyError: 0

Tags:

python

pandas

explode

keyerror

edit

Georg Heiler

People also ask

1 Answers

k07b470n

Recent Activity

Donate For Us