I have a DataFrame like this:
     A    B    C    D    E
0  0.0  1.0  0.0  0.0  1.0
1  0.0  0.0  1.0  0.0  0.0
2  0.0  1.0  1.0  1.0  0.0
3  1.0  0.0  0.0  0.0  1.0
4  0.0  0.0  0.0  1.0  0.0
The goal is to get a list like this:
0 B,E
1 C
2 B,C,D
3 A,E
4 D
Any ideas, thanks in advance.
You can use apply with axis=1 to process the DataFrame by rows, then compare each row with 1 to select the matching index values (with axis=1, each row is converted to a Series whose index is the column names), and join them with a comma:
s1 = df.apply(lambda x: ','.join(x.index[x == 1]), axis=1)
print (s1)
0 B,E
1 C
2 B,C,D
3 A,E
4 D
dtype: object
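For a self-contained run, the frame from the question can be rebuilt by hand. The sketch below assumes that construction (the post does not show how df was created) and also converts the resulting Series to a plain Python list with tolist(), since the question asks for a list.

```python
import pandas as pd

# Rebuild the DataFrame from the question (this construction is an assumption).
df = pd.DataFrame(
    [[0.0, 1.0, 0.0, 0.0, 1.0],
     [0.0, 0.0, 1.0, 0.0, 0.0],
     [0.0, 1.0, 1.0, 1.0, 0.0],
     [1.0, 0.0, 0.0, 0.0, 1.0],
     [0.0, 0.0, 0.0, 1.0, 0.0]],
    columns=list("ABCDE"),
)

# For each row, keep the column labels whose value equals 1 and join them.
s1 = df.apply(lambda row: ",".join(row.index[row == 1]), axis=1)

# Convert the Series to a plain list of strings.
result = s1.tolist()
print(result)  # ['B,E', 'C', 'B,C,D', 'A,E', 'D']
```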
Another solution, which is faster for a larger DataFrame.
First, format the column names as a list of strings:
print (['{}, '.format(x) for x in df.columns])
['A, ', 'B, ', 'C, ', 'D, ', 'E, ']
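Then use np.where to pick the formatted column name wherever the DataFrame value is 1, and an empty string elsewhere:
s = np.where(df == 1, ['{}, '.format(x) for x in df.columns], '')
Because 1 values are cast to True, the explicit comparison can be dropped and the DataFrame itself used as the condition: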
s = np.where(df, ['{}, '.format(x) for x in df.columns], '')
print (s)
[['' 'B, ' '' '' 'E, ']
['' '' 'C, ' '' '']
['' 'B, ' 'C, ' 'D, ' '']
['A, ' '' '' '' 'E, ']
['' '' '' 'D, ' '']]
Finally, join the values of each row and strip the trailing separator:
s1 = pd.Series([''.join(x).strip(', ') for x in s], index=df.index)
print (s1)
0 B, E
1 C
2 B, C, D
3 A, E
4 D
dtype: object
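Note that the output above uses ', ' as the separator, while the question asks for 'B,E' without a space. A minimal variant, assuming the same DataFrame as in the question (rebuilt by hand here) and a plain ',' separator, might look like this:

```python
import numpy as np
import pandas as pd

# Rebuild the DataFrame from the question (this construction is an assumption).
df = pd.DataFrame(
    [[0.0, 1.0, 0.0, 0.0, 1.0],
     [0.0, 0.0, 1.0, 0.0, 0.0],
     [0.0, 1.0, 1.0, 1.0, 0.0],
     [1.0, 0.0, 0.0, 0.0, 1.0],
     [0.0, 0.0, 0.0, 1.0, 0.0]],
    columns=list("ABCDE"),
)

# Column labels with a bare comma appended, no trailing space.
labels = ["{},".format(c) for c in df.columns]

# Where the value is 1 take the label, otherwise an empty string.
# The 1-D label list is broadcast across every row of the mask.
s = np.where(df.eq(1).to_numpy(), labels, "")

# Concatenate each row and drop the trailing comma.
s1 = pd.Series(["".join(row).rstrip(",") for row in s], index=df.index)
print(s1.tolist())  # ['B,E', 'C', 'B,C,D', 'A,E', 'D']
```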
EDIT: Another, better solution than the old answer above:
s1 = df.eq(1).dot(df.columns + ',').str.rstrip(',')
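The one-liner works because multiplying a string by True keeps it and multiplying by False gives an empty string, so the dot product of a boolean row with the labels concatenates exactly the labels that are switched on. The sketch below spells this out; the DataFrame construction and the names mask and labels are assumptions added for illustration.

```python
import pandas as pd

# Rebuild the DataFrame from the question (this construction is an assumption).
df = pd.DataFrame(
    [[0.0, 1.0, 0.0, 0.0, 1.0],
     [0.0, 0.0, 1.0, 0.0, 0.0],
     [0.0, 1.0, 1.0, 1.0, 0.0],
     [1.0, 0.0, 0.0, 0.0, 1.0],
     [0.0, 0.0, 0.0, 1.0, 0.0]],
    columns=list("ABCDE"),
)

# A bool times a string either keeps the string or empties it.
kept = True * "B,"      # 'B,'
dropped = False * "B,"  # ''

mask = df.eq(1)            # boolean DataFrame, True where the value is 1
labels = df.columns + ","  # column labels with a comma appended

# Row-wise dot product: sum of (bool * label), i.e. string concatenation.
s1 = mask.dot(labels).str.rstrip(",")
print(s1.tolist())  # ['B,E', 'C', 'B,C,D', 'A,E', 'D']
```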