I am having trouble reindexing a pandas dataframe after dropping NaN values.
I am trying to extract dicts in a df column to another df, then join those values back to the original df in the corresponding rows.
import numpy as np
import pandas as pd
df = pd.DataFrame({'col1': [1, 2, 3, 4, 5],
                   'col2': [np.nan, np.nan, {'aa': 11, 'bb': 22}, {'aa': 33, 'bb': 44}, {'aa': 55, 'bb': 66}]})
df
col1 col2
0 1 NaN
1 2 NaN
2 3 {'aa': 11, 'bb': 22}
3 4 {'aa': 33, 'bb': 44}
4 5 {'aa': 55, 'bb': 66}
The desired end result is:
col1 aa bb
0 1 NaN NaN
1 2 NaN NaN
2 3 11 22
3 4 33 44
4 5 55 66
If I call .tolist() on col2 and pass the result to pd.DataFrame, the dicts are not unpacked:
pd.DataFrame(df['col2'].tolist())
0 NaN
1 NaN
2 {'aa': 11, 'bb': 22}
3 {'aa': 33, 'bb': 44}
4 {'aa': 55, 'bb': 66}
If I use dropna() first, the dicts are unpacked but the index is reset:
pd.DataFrame(df['col2'].dropna().tolist())
aa bb
0 11 22
1 33 44
2 55 66
If I then reindex to the index of the original df, the row data end up at the wrong index positions:
pd.DataFrame(df['col2'].dropna().tolist()).reindex(df.index)
aa bb
0 11.0 22.0
1 33.0 44.0
2 55.0 66.0
3 NaN NaN
4 NaN NaN
The data is varied, and there is no way to know how many NaN values will be at any point in the column.
Any help is very much appreciated.
Use Series.to_dict to take the index into account:
df.join(pd.DataFrame(df['col2'].to_dict()).T).drop(columns='col2')
col1 aa bb
0 1 NaN NaN
1 2 NaN NaN
2 3 11.0 22.0
3 4 33.0 44.0
4 5 55.0 66.0
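As a rough sketch of why this works: to_dict keeps each row's index label as a key, whereas tolist throws the labels away. The variant below is one possible alternative, not the answer's own code. It drops the NaN rows first and builds the frame with DataFrame.from_dict(orient='index'), which is assumed here to be acceptable because every remaining value is a dict. The names expanded and result are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'col1': [1, 2, 3, 4],
    'col2': [np.nan, np.nan, {'aa': 11, 'bb': 22}, {'aa': 33, 'bb': 44}],
})

# to_dict keeps the original row labels (2 and 3) as keys;
# tolist would return only the values and lose them.
labelled = df['col2'].dropna().to_dict()

# orient='index' turns the outer keys into row labels and the
# inner dict keys into columns.
expanded = pd.DataFrame.from_dict(labelled, orient='index')

# join aligns on the index, so rows 0 and 1 are filled with NaN.
result = df.drop(columns='col2').join(expanded)
print(result)
```

Because the NaN rows are removed before the frame is built, no transpose is needed, and the joined columns only become float at the join step, where the missing rows are filled in.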
IIUC, you can fix your code by passing the original index after dropna:
s = df.col2.dropna()
df = df.join(pd.DataFrame(s.tolist(), index=s.index))
df
Out[103]:
col1 col2 aa bb
0 1 NaN NaN NaN
1 2 NaN NaN NaN
2 3 {'aa': 11, 'bb': 22} 11.0 22.0
3 4 {'aa': 33, 'bb': 44} 33.0 44.0
4 5 {'aa': 55, 'bb': 66} 55.0 66.0
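The output above still contains col2 and shows floats, while the desired result in the question has neither. A minimal end-to-end sketch of the same approach that also drops col2 could look like the following. The final conversion to the nullable Int64 dtype is an optional extra, assuming the dict values are always whole numbers, and the names expanded, result and result_int are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'col1': [1, 2, 3, 4, 5],
    'col2': [np.nan, np.nan,
             {'aa': 11, 'bb': 22}, {'aa': 33, 'bb': 44}, {'aa': 55, 'bb': 66}],
})

# Keep only the rows that hold a dict; their labels (2, 3, 4) are preserved.
s = df['col2'].dropna()

# Build the expanded frame on those same labels so that join can align rows.
expanded = pd.DataFrame(s.tolist(), index=s.index)

# Drop the dict column and attach the unpacked columns by index.
result = df.drop(columns='col2').join(expanded)

# Optional: nullable integers display 11 rather than 11.0, with <NA> for gaps.
result_int = result.astype({'aa': 'Int64', 'bb': 'Int64'})
print(result_int)
```

Since the alignment relies only on index labels, it does not matter how many NaN values there are or where they fall in the column.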