I need to convert a sparse DataFrame into a long format: for each Id, one row per column whose value is non-zero.
I've tried a for loop with iterrows, but it's far too slow to be usable. Does anyone have a better idea?
For example, the initial df:
import pandas as pd
df = pd.DataFrame({'Id': ['id1', 'id2', 'id3'], 'a': [0, 1, 1], 'b': [1, 0, 1], 'c': [1, 1, 0]})
Id a b c
id1 0 1 1
id2 1 0 1
id3 1 1 0
Expected:
Id columns
id1 b
id1 c
id2 a
id2 c
id3 a
id3 b
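Put differently, the target shape is just the list of (row, column) positions of the non-zero cells. As a hedged sketch using a technique that is not taken from either answer here, numpy.nonzero returns those positions directly, so no Python-level loop is needed. It assumes every column except Id is a numeric value column:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Id': ['id1', 'id2', 'id3'],
                   'a': [0, 1, 1], 'b': [1, 0, 1], 'c': [1, 1, 0]})

m = df.set_index('Id')
# Row and column positions of every non-zero cell, in row-major order
# (all of id1's hits first, then id2's, then id3's).
rows, cols = np.nonzero(m.to_numpy())
# Translate the positions back into Id labels and column names.
out = pd.DataFrame({'Id': m.index[rows], 'columns': m.columns[cols]})
print(out)
```

Because np.nonzero reports positions in row-major order, the rows come out grouped by Id without a separate sort.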
With pandas 0.25.0 or later (Series.explode was added in 0.25.0), here is a way using .dot and explode:
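m = df.set_index('Id')
# Join the names of each row's non-zero columns ('b,c'), then split and explode into one row per name
m.dot(m.columns + ',').str[:-1].str.split(',').explode().reset_index(name='Columns')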
Id Columns
0 id1 b
1 id1 c
2 id2 a
3 id2 c
4 id3 a
5 id3 b
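To see why the .dot chain works, here it is split into steps on the question's sample data (a sketch; the intermediate name joined is only for illustration). Multiplying a string by 0 gives '' and by 1 gives the string itself, so each row's dot product with 'a,', 'b,', 'c,' concatenates the names of that row's non-zero columns:

```python
import pandas as pd

df = pd.DataFrame({'Id': ['id1', 'id2', 'id3'],
                   'a': [0, 1, 1], 'b': [1, 0, 1], 'c': [1, 1, 0]})
m = df.set_index('Id')

# Row-wise sum of value * 'name,': for id1 that is '' + 'b,' + 'c,' -> 'b,c,'
joined = m.dot(m.columns + ',')
print(joined.tolist())

# Strip the trailing comma, split into lists, and give each list item its own row.
out = joined.str[:-1].str.split(',').explode().reset_index(name='Columns')
print(out)
```

Because the trick relies on string repetition, it assumes the cells are 0/1: a value of 2 would repeat the column name, and a row of all zeros yields a single row holding an empty string.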
It appears all you want are the stacked indices, not the values. Might I suggest set_index and stack?
df2 = df.set_index('Id')
s = df2.stack()                # Series indexed by (Id, column name)
(s[s.ne(0)]                    # keep only the non-zero cells
 .index                        # only the (Id, column) labels are needed
 .to_frame(index=False, name=['Id', 'columns']))
Id columns
0 id1 b
1 id1 c
2 id2 a
3 id2 c
4 id3 a
5 id3 b
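Another loop-free option, swapped in here rather than taken from the answers above, is melt: unpivot to one row per (Id, column, value), keep the non-zero rows, then restore the per-Id grouping with a stable sort. A minimal sketch on the question's data:

```python
import pandas as pd

df = pd.DataFrame({'Id': ['id1', 'id2', 'id3'],
                   'a': [0, 1, 1], 'b': [1, 0, 1], 'c': [1, 1, 0]})

out = (df.melt(id_vars='Id', var_name='columns')  # one row per (Id, column, value)
         .query('value != 0')                       # keep the non-zero cells only
         .sort_values('Id', kind='mergesort')       # stable: group by Id, keep a/b/c order
         [['Id', 'columns']]
         .reset_index(drop=True))
print(out)
```

melt emits rows column by column, so without the stable sort the result would be grouped by column (all the a rows first) instead of by Id.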