
Pandas not reindexing properly with NaN

I am having trouble reindexing a pandas dataframe after dropping NaN values.

I am trying to extract dicts in a df column to another df, then join those values back to the original df in the corresponding rows.

import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3, 4, 5],
                   'col2': [np.nan, np.nan, {'aa': 11, 'bb': 22}, {'aa': 33, 'bb': 44}, {'aa': 55, 'bb': 66}]})
df

    col1 col2
0   1    NaN
1   2    NaN
2   3    {'aa': 11, 'bb': 22}
3   4    {'aa': 33, 'bb': 44}
4   5    {'aa': 55, 'bb': 66}

The desired end result is:

    col1    aa      bb
0   1       NaN     NaN
1   2       NaN     NaN
2   3       11      22
3   4       33      44
4   5       55      66

If I convert col2 with .tolist() and pass the result to pd.DataFrame, the dicts are not unpacked:

pd.DataFrame(df['col2'].tolist())

0   NaN
1   NaN
2   {'aa': 11, 'bb': 22}
3   {'aa': 33, 'bb': 44}
4   {'aa': 55, 'bb': 66}

If I use dropna() first, the dicts are unpacked, but the original index is lost:

pd.DataFrame(df['col2'].dropna().tolist())

    aa  bb
0   11  22
1   33  44
2   55  66

If I try to reindex the result to the original df's index, the rows end up at the wrong index positions:

pd.DataFrame(df['col2'].dropna().tolist()).reindex(df.index)

     aa    bb
0  11.0  22.0
1  33.0  44.0
2  55.0  66.0
3   NaN   NaN
4   NaN   NaN

The data varies, and there is no way to know in advance how many NaN values the column contains or where they fall.

Any help is very much appreciated.

Latecomer asked Mar 02 '23 18:03


2 Answers

Use Series.to_dict so that the original index labels are carried along with the values:

df.join(pd.DataFrame(df['col2'].to_dict()).T).drop(columns='col2')
   col1    aa    bb
0     1   NaN   NaN
1     2   NaN   NaN
2     3  11.0  22.0
3     4  33.0  44.0
4     5  55.0  66.0
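
To see why this keeps the rows aligned, the one-liner can be unpacked into steps. This is a minimal sketch on the question's sample data; the intermediate names by_label, expanded, and result are illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'col1': [1, 2, 3, 4, 5],
    'col2': [np.nan, np.nan, {'aa': 11, 'bb': 22},
             {'aa': 33, 'bb': 44}, {'aa': 55, 'bb': 66}],
})

# Step 1: map each index label to its cell value (NaN or a dict).
by_label = df['col2'].to_dict()

# Step 2: each index label becomes a column and the dict keys become rows.
# Transposing puts the original index labels back on the rows.
expanded = pd.DataFrame(by_label).T

# Step 3: join on the index and drop the now-redundant dict column.
result = df.join(expanded).drop(columns='col2')
print(result)
```

This relies on at least one cell holding a dict. If every value in col2 were NaN, I would expect the DataFrame constructor to reject the all-scalar input, so that case is probably best handled separately.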
ansev answered Mar 05 '23 15:03


If I understand correctly, you can fix your code by passing the surviving index to the DataFrame constructor after dropna:

s = df['col2'].dropna()
df = df.join(pd.DataFrame(s.tolist(), index=s.index))
df
Out[103]: 
   col1                  col2    aa    bb
0     1                   NaN   NaN   NaN
1     2                   NaN   NaN   NaN
2     3  {'aa': 11, 'bb': 22}  11.0  22.0
3     4  {'aa': 33, 'bb': 44}  33.0  44.0
4     5  {'aa': 55, 'bb': 66}  55.0  66.0
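
The output above still carries col2, while the desired result in the question does not. A minimal sketch that wraps the same idea and also drops the dict column could look like this; the helper name expand_dict_column and its parameters are hypothetical, not a pandas API.

```python
import numpy as np
import pandas as pd


def expand_dict_column(frame, column):
    """Hypothetical helper: expand a column of dicts into separate columns."""
    # Keep only the rows that actually hold a dict; their index labels survive.
    non_null = frame[column].dropna()
    # Reuse those labels so each unpacked row stays tied to its source row.
    expanded = pd.DataFrame(non_null.tolist(), index=non_null.index)
    # join aligns on the index, so rows without a dict are filled with NaN.
    return frame.drop(columns=column).join(expanded)


df = pd.DataFrame({
    'col1': [1, 2, 3, 4, 5],
    'col2': [np.nan, np.nan, {'aa': 11, 'bb': 22},
             {'aa': 33, 'bb': 44}, {'aa': 55, 'bb': 66}],
})

result = expand_dict_column(df, 'col2')
print(result)
```

Because the alignment comes from the index labels rather than from row positions, the number and location of the NaN values should not matter.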
BENY answered Mar 05 '23 14:03