
Convert non-zero column names to rows in Python

I need to reshape a sparse dataframe so that, for each Id, the names of its non-zero columns are listed as rows.

I've tried a for loop with iterrows, but it is too slow to be usable. Does anyone have a better idea?

For example, the initial df:

import pandas as pd

df = pd.DataFrame({'Id': ['id1', 'id2', 'id3'], 'a': [0, 1, 1], 'b': [1, 0, 1], 'c': [1, 1, 0]})

Id  a b c
id1 0 1 1 
id2 1 0 1 
id3 1 1 0 

Expected:

Id   columns
id1    b 
id1    c 
id2    a 
id2    c 
id3    a 
id3    b
Alex_Y asked Dec 17 '22 16:12

2 Answers

Using pandas 0.25.0 or later, here is a way using .dot and explode:

m=df.set_index('Id')
m.dot(m.columns+',').str[:-1].str.split(',').explode().reset_index(name='Columns')

    Id Columns
0  id1       b
1  id1       c
2  id2       a
3  id2       c
4  id3       a
5  id3       b
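
To make the one-liner easier to follow, here is a rough step-by-step sketch of the same .dot and explode idea on the question's sample data. The intermediate names (joined, labels, result) are only for illustration, and it assumes the value columns contain nothing but 0 and 1.

```python
import pandas as pd

df = pd.DataFrame({'Id': ['id1', 'id2', 'id3'],
                   'a': [0, 1, 1], 'b': [1, 0, 1], 'c': [1, 1, 0]})

m = df.set_index('Id')

# An integer times a string repeats the string, so with 0/1 values the
# dot product concatenates the labels of the non-zero columns per row.
joined = m.dot(m.columns + ',')            # id1 -> 'b,c,'

# Drop the trailing comma and split into a list of labels per Id.
labels = joined.str[:-1].str.split(',')    # id1 -> ['b', 'c']

# explode() gives every list element its own row (pandas 0.25.0+).
result = labels.explode().reset_index(name='Columns')
print(result)
```

If the values can be something other than 0 and 1, one option is to binarize first, for example with m = df.set_index('Id').ne(0).astype(int), since a value of 2 would otherwise repeat the column name twice.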
anky answered Dec 28 '22 09:12

It appears all you want are the stacked indices, not the values. Might I suggest set_index and stack?

df2 = df.set_index('Id')
(df2[df2.astype(bool)]
     .stack()
     .dropna()  # explicit, in case stack() keeps the masked NaN entries
     .index
     .to_frame()
     .reset_index(drop=True)
     .set_axis(['Id', 'columns'], axis=1))

    Id columns
0  id1       b
1  id1       c
2  id2       a
3  id2       c
4  id3       a
5  id3       b
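
As a variation on the set_index and stack idea (a sketch, not necessarily what the answer intended), you could also stack first and filter on the stacked values afterwards. This does not rely on stack() dropping NaN entries, and it treats any non-zero value as a hit. The names stacked and result are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Id': ['id1', 'id2', 'id3'],
                   'a': [0, 1, 1], 'b': [1, 0, 1], 'c': [1, 1, 0]})

# One entry per (Id, column) pair, indexed by a two-level MultiIndex.
stacked = df.set_index('Id').stack()

# Keep only the non-zero entries, then turn the remaining index
# (Id, column name) into an ordinary two-column frame.
result = (stacked[stacked != 0]
          .index
          .to_frame(index=False, name=['Id', 'columns']))
print(result)
```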
cs95 answered Dec 28 '22 09:12