How does pandas.DataFrame.explode work?
In the documentation:
https://pandas.pydata.org/pandas-docs/version/0.25/reference/api/pandas.DataFrame.explode.html
df = pd.DataFrame({'A': [[1, 2, 3], 'foo', [], [3, 4]], 'B': 1})
display(df)
print(df.columns)
print(df.dtypes)
df.explode('A')
works just fine. But for my data it fails with a KeyError. My data has the following columns and types:
print(df.columns)
print(df.dtypes)
Index(['model', 'id_min_days_cutoff'], dtype='object')
model object
id_min_days_cutoff int64
dtype: object
where the model column is obtained from a statsmodels regression via:
model.summary2().tables[1]
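For context, summary2().tables[1] in statsmodels is itself a DataFrame (the coefficient table), so each cell of the model column holds an entire DataFrame rather than a list, which appears to be what explode trips over. A minimal sketch (my own toy regression, purely illustrative) of how a frame like the one above can arise:
import numpy as np
import pandas as pd
import statsmodels.api as sm

# toy data standing in for the real regressions
rng = np.random.default_rng(0)
X = sm.add_constant(rng.normal(size=(50, 2)))
y = rng.normal(size=50)
res = sm.OLS(y, X).fit()

coef_table = res.summary2().tables[1]   # a DataFrame with Coef., Std.Err., ... columns
df = pd.DataFrame({
    'model': [coef_table, coef_table],  # each cell holds a whole DataFrame
    'id_min_days_cutoff': [2, 3],
})
print(type(df['model'].iloc[0]))        # <class 'pandas.core.frame.DataFrame'>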
When calling: df.explode('model')
it fails with:
KeyError: 0
Trying to reproduce this:
df_json = df.to_json()
# load it again below so the issue can be reproduced
df_json = '{"model":{"0":{"Coef.":{"ALQ_15PLUS_perc":95489.7866599741,"AST_perc":-272.9213162565,"BEV_UNTER15_perc":6781.448845533,"BEV_UEBER65_perc":-46908.2889142205},"Std.Err.":{"ALQ_15PLUS_perc":1399665.9788843254,"AST_perc":1558.1286516172,"BEV_UNTER15_perc":2027111.8764156068,"BEV_UEBER65_perc":1230965.9812726702},"z":{"ALQ_15PLUS_perc":0.0682232676,"AST_perc":-0.1751596802,"BEV_UNTER15_perc":0.0033453747,"BEV_UEBER65_perc":-0.038106893},"P>|z|":{"ALQ_15PLUS_perc":0.9456079052,"AST_perc":0.8609541651,"BEV_UNTER15_perc":0.9973307821,"BEV_UEBER65_perc":0.9696024555},"[0.025":{"ALQ_15PLUS_perc":-2647805.1223393031,"AST_perc":-3326.7973567063,"BEV_UNTER15_perc":-3966284.8215624653,"BEV_UEBER65_perc":-2459557.2784026605},"0.975]":{"ALQ_15PLUS_perc":2838784.6956592514,"AST_perc":2780.9547241933,"BEV_UNTER15_perc":3979847.7192535317,"BEV_UEBER65_perc":2365740.7005742197}},"1":{"Coef.":{"ALQ_15PLUS_perc":-140539.5196612777,"AST_perc":142.579413527,"BEV_UNTER15_perc":-45288.5612893498,"BEV_UEBER65_perc":-152106.9841374909},"Std.Err.":{"ALQ_15PLUS_perc":299852250.9155113101,"AST_perc":24013.7007484301,"BEV_UNTER15_perc":417010365.7919532657,"BEV_UEBER65_perc":171876588.9403209388},"z":{"ALQ_15PLUS_perc":-0.0004686959,"AST_perc":0.0059374194,"BEV_UNTER15_perc":-0.000108603,"BEV_UEBER65_perc":-0.0008849779},"P>|z|":{"ALQ_15PLUS_perc":0.9996260348,"AST_perc":0.9952626525,"BEV_UNTER15_perc":0.9999133474,"BEV_UEBER65_perc":0.9992938899},"[0.025":{"ALQ_15PLUS_perc":-587840151.997330904,"AST_perc":-46923.4091889186,"BEV_UNTER15_perc":-817370586.6933914423,"BEV_UEBER65_perc":-337024031.0927618742},"0.975]":{"ALQ_15PLUS_perc":587559072.9580082893,"AST_perc":47208.5680159725,"BEV_UNTER15_perc":817280009.5708128214,"BEV_UEBER65_perc":336719817.1244869232}}},"id_min_days_cutoff":{"0":2,"1":3}}'
pd.read_json(df_json).explode('model')
fails with:
KeyError: 0
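A quick check (my addition, not part of the original post) shows why this reproduction behaves the same way: after the JSON round trip each model cell comes back as a plain nested dict, which explode in pandas 0.25 cannot turn into rows either:
import pandas as pd

df = pd.read_json(df_json)           # df_json is the JSON string shown above
print(type(df['model'].iloc[0]))     # <class 'dict'>: a nested dict per cell, not a list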
Trying to find an alternative using one of the approaches from How to unnest (explode) a column in a pandas DataFrame?, choosing method 2.1:
pd.DataFrame({'model': np.concatenate(df_json.model.values)},
             index=df_json.index.repeat(df_json.model.str.len()))
but this fails with:
ValueError: zero-dimensional arrays cannot be concatenated
When applying it to the original df instead (not the one read from JSON), it fails with:
Exception: Data must be 1-dimensional
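For comparison, that np.concatenate recipe does work when every cell is a flat list, which suggests the failures above stem from the model cells holding DataFrames / nested dicts rather than lists. A small sketch with made-up list data:
import numpy as np
import pandas as pd

lists = pd.DataFrame({'model': [[1, 2], [3]], 'id_min_days_cutoff': [2, 3]})
pd.DataFrame({'model': np.concatenate(lists.model.values)},
             index=lists.index.repeat(lists.model.str.len()))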
How can I get the unnest/explode to work?
If you do have the results of your statsmodels regression in JSON / dictionary form, you could try to explode the dataframe "by hand". I tried this below using list comprehensions. Does the result you are trying to achieve look something like this:
df_json = '{"model":{"0":{"Coef.":{"ALQ_15PLUS_perc":95489.7866599741,"AST_perc":-272.9213162565,"BEV_UNTER15_perc":6781.448845533,"BEV_UEBER65_perc":-46908.2889142205},"Std.Err.":{"ALQ_15PLUS_perc":1399665.9788843254,"AST_perc":1558.1286516172,"BEV_UNTER15_perc":2027111.8764156068,"BEV_UEBER65_perc":1230965.9812726702},"z":{"ALQ_15PLUS_perc":0.0682232676,"AST_perc":-0.1751596802,"BEV_UNTER15_perc":0.0033453747,"BEV_UEBER65_perc":-0.038106893},"P>|z|":{"ALQ_15PLUS_perc":0.9456079052,"AST_perc":0.8609541651,"BEV_UNTER15_perc":0.9973307821,"BEV_UEBER65_perc":0.9696024555},"[0.025":{"ALQ_15PLUS_perc":-2647805.1223393031,"AST_perc":-3326.7973567063,"BEV_UNTER15_perc":-3966284.8215624653,"BEV_UEBER65_perc":-2459557.2784026605},"0.975]":{"ALQ_15PLUS_perc":2838784.6956592514,"AST_perc":2780.9547241933,"BEV_UNTER15_perc":3979847.7192535317,"BEV_UEBER65_perc":2365740.7005742197}},"1":{"Coef.":{"ALQ_15PLUS_perc":-140539.5196612777,"AST_perc":142.579413527,"BEV_UNTER15_perc":-45288.5612893498,"BEV_UEBER65_perc":-152106.9841374909},"Std.Err.":{"ALQ_15PLUS_perc":299852250.9155113101,"AST_perc":24013.7007484301,"BEV_UNTER15_perc":417010365.7919532657,"BEV_UEBER65_perc":171876588.9403209388},"z":{"ALQ_15PLUS_perc":-0.0004686959,"AST_perc":0.0059374194,"BEV_UNTER15_perc":-0.000108603,"BEV_UEBER65_perc":-0.0008849779},"P>|z|":{"ALQ_15PLUS_perc":0.9996260348,"AST_perc":0.9952626525,"BEV_UNTER15_perc":0.9999133474,"BEV_UEBER65_perc":0.9992938899},"[0.025":{"ALQ_15PLUS_perc":-587840151.997330904,"AST_perc":-46923.4091889186,"BEV_UNTER15_perc":-817370586.6933914423,"BEV_UEBER65_perc":-337024031.0927618742},"0.975]":{"ALQ_15PLUS_perc":587559072.9580082893,"AST_perc":47208.5680159725,"BEV_UNTER15_perc":817280009.5708128214,"BEV_UEBER65_perc":336719817.1244869232}}},"id_min_days_cutoff":{"0":2,"1":3}}'
import numpy as np
import pandas as pd

df = pd.read_json(df_json)
# "Explode" the model column (a dict of dicts per row) by hand with a list comprehension,
# building "statistic:variable:value" strings:
model_col = [f"{k}:{kk}:{vv}"
             for i in range(len(df))
             for k, v in df['model'].iloc[i].items()
             for kk, vv in v.items()]
# Build the second column, assuming each row of the original df "explodes"
# into the same number of rows (integer division keeps np.repeat happy):
cutoff_col = np.repeat(df['id_min_days_cutoff'].to_numpy(), len(model_col) // len(df))
# Get everything into one dataframe
exploded_df = pd.DataFrame({'model': model_col, 'id_min_days_cutoff': cutoff_col})
exploded_df
model id_min_days_cutoff
0 Coef.:ALQ_15PLUS_perc:95489.7866599741 2
1 Coef.:AST_perc:-272.9213162565 2
2 Coef.:BEV_UNTER15_perc:6781.448845533 2
3 Coef.:BEV_UEBER65_perc:-46908.2889142205 2
4 Std.Err.:ALQ_15PLUS_perc:1399665.9788843254 2
5 Std.Err.:AST_perc:1558.1286516172 2
6 Std.Err.:BEV_UNTER15_perc:2027111.8764156068 2
7 Std.Err.:BEV_UEBER65_perc:1230965.9812726702 2
8 z:ALQ_15PLUS_perc:0.0682232676 2
9 z:AST_perc:-0.1751596802 2
10 z:BEV_UNTER15_perc:0.0033453747 2
11 z:BEV_UEBER65_perc:-0.038106893 2
12 P>|z|:ALQ_15PLUS_perc:0.9456079052 2
13 P>|z|:AST_perc:0.8609541651 2
14 P>|z|:BEV_UNTER15_perc:0.9973307821 2
15 P>|z|:BEV_UEBER65_perc:0.9696024555 2
16 [0.025:ALQ_15PLUS_perc:-2647805.122339303 2
17 [0.025:AST_perc:-3326.7973567063 2
18 [0.025:BEV_UNTER15_perc:-3966284.8215624653 2
19 [0.025:BEV_UEBER65_perc:-2459557.2784026605 2
20 0.975]:ALQ_15PLUS_perc:2838784.6956592514 2
21 0.975]:AST_perc:2780.9547241933 2
22 0.975]:BEV_UNTER15_perc:3979847.7192535317 2
23 0.975]:BEV_UEBER65_perc:2365740.7005742197 2
24 Coef.:ALQ_15PLUS_perc:-140539.5196612777 3
25 Coef.:AST_perc:142.579413527 3
26 Coef.:BEV_UNTER15_perc:-45288.5612893498 3
27 Coef.:BEV_UEBER65_perc:-152106.9841374909 3
28 Std.Err.:ALQ_15PLUS_perc:299852250.9155113 3
29 Std.Err.:AST_perc:24013.7007484301 3
30 Std.Err.:BEV_UNTER15_perc:417010365.79195327 3
31 Std.Err.:BEV_UEBER65_perc:171876588.94032094 3
32 z:ALQ_15PLUS_perc:-0.0004686959 3
33 z:AST_perc:0.0059374194 3
34 z:BEV_UNTER15_perc:-0.000108603 3
35 z:BEV_UEBER65_perc:-0.0008849779 3
36 P>|z|:ALQ_15PLUS_perc:0.9996260348 3
37 P>|z|:AST_perc:0.9952626525 3
38 P>|z|:BEV_UNTER15_perc:0.9999133474 3
39 P>|z|:BEV_UEBER65_perc:0.9992938899 3
40 [0.025:ALQ_15PLUS_perc:-587840151.9973309 3
41 [0.025:AST_perc:-46923.4091889186 3
42 [0.025:BEV_UNTER15_perc:-817370586.6933914 3
43 [0.025:BEV_UEBER65_perc:-337024031.0927619 3
44 0.975]:ALQ_15PLUS_perc:587559072.9580083 3
45 0.975]:AST_perc:47208.5680159725 3
46 0.975]:BEV_UNTER15_perc:817280009.5708128 3
47 0.975]:BEV_UEBER65_perc:336719817.1244869 3
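If the goal is a tidy frame with one row per coefficient rather than "statistic:variable:value" strings, another option (a sketch, assuming the JSON structure shown above) is to rebuild each model cell into a DataFrame and concatenate with keys:
import pandas as pd

df = pd.read_json(df_json)
tidy = (
    pd.concat({cutoff: pd.DataFrame(cell)   # nested dict -> coefficient table
               for cutoff, cell in zip(df['id_min_days_cutoff'], df['model'])},
              names=['id_min_days_cutoff', 'variable'])
      .reset_index()
)
# tidy then has the columns id_min_days_cutoff, variable, Coef., Std.Err., z, P>|z|, [0.025, 0.975]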