I have a CSV file with the columns name, sub_a, sub_b, sub_c, sub_d, segment and gender. I would like to create a new column, classes, listing all the classes (the sub_ columns) that each student takes, separated by commas.
What would be the easiest way to accomplish this?
The result dataframe should look like this:
+------+-------+-------+-------+-------+---------+--------+---------------------+
| name | sub_a | sub_b | sub_c | sub_d | segment | gender | classes |
+------+-------+-------+-------+-------+---------+--------+---------------------+
| john | 1 | 1 | 0 | 1 | 1 | 0 | sub_a, sub_b, sub_d |
+------+-------+-------+-------+-------+---------+--------+---------------------+
| mike | 1 | 0 | 1 | 1 | 0 | 0 | sub_a, sub_c, sub_d |
+------+-------+-------+-------+-------+---------+--------+---------------------+
| mary | 1 | 1 | 0 | 1 | 1 | 1 | sub_a, sub_b, sub_d |
+------+-------+-------+-------+-------+---------+--------+---------------------+
| fred | 1 | 0 | 1 | 0 | 0 | 0 | sub_a, sub_c |
+------+-------+-------+-------+-------+---------+--------+---------------------+
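Let us try dot:
s = df.filter(like='sub')  # keep only the sub_* columns
# use ', ' as the separator to match the desired output, then strip the trailing one
df['classes'] = s.astype(bool).dot(s.columns + ', ').str[:-2]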
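To show why the dot approach works, here is a minimal, self-contained sketch. It assumes the sample data from the question, typed in by hand rather than read from the CSV. Multiplying a boolean by a string keeps the string for True and yields an empty string for False, so the dot product concatenates the names of the flagged columns in each row.

```python
import pandas as pd

# Sample data copied from the question (assumed to match the CSV contents).
df = pd.DataFrame({
    "name": ["john", "mike", "mary", "fred"],
    "sub_a": [1, 1, 1, 1],
    "sub_b": [1, 0, 1, 0],
    "sub_c": [0, 1, 0, 1],
    "sub_d": [1, 1, 1, 0],
    "segment": [1, 0, 1, 0],
    "gender": [0, 0, 1, 0],
})

# Keep only the columns whose name contains 'sub'.
s = df.filter(like="sub")

# True * 'sub_a, ' -> 'sub_a, ' and False * 'sub_a, ' -> '',
# so the dot product joins the names of the flagged columns per row.
joined = s.astype(bool).dot(s.columns + ", ")

# Drop the trailing ', ' left after the last name.
df["classes"] = joined.str[:-2]

print(df[["name", "classes"]])
```

Note that filter(like='sub') matches any column whose name contains 'sub', so it would also pick up unrelated columns with that substring if the real file has any.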
You can use apply with axis=1.
For example, if your dataframe looks like this:
df
A_a A_b B_b B_c
0 1 0 0 1
1 0 1 0 1
2 1 0 1 0
you can do
df['classes'] = df.apply(lambda x: ', '.join(df.columns[x==1]), axis = 1)
df
A_a A_b B_b B_c classes
0 1 0 0 1 A_a, B_c
1 0 1 0 1 A_b, B_c
2 1 0 1 0 A_a, B_b
To apply this to specific columns only, you can filter first using loc:
# for your sample data
df_ = df.loc[:, 'sub_a':'sub_d']  # or df.loc[:, ['sub_a', 'sub_b', 'sub_c', 'sub_d']]
df['classes'] = df_.apply(lambda x: ', '.join(df_.columns[x == 1]), axis=1)
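For an end-to-end version of the apply approach, the sketch below starts from CSV text. The csv_text string stands in for the real file, and add_classes is a hypothetical helper name introduced only for this illustration; neither comes from the original post.

```python
import io

import pandas as pd

# Stand-in for the real CSV file (contents assumed from the question's table).
csv_text = """name,sub_a,sub_b,sub_c,sub_d,segment,gender
john,1,1,0,1,1,0
mike,1,0,1,1,0,0
mary,1,1,0,1,1,1
fred,1,0,1,0,0,0
"""


def add_classes(df, cols):
    """Return a copy of df with a 'classes' column naming the columns in cols that equal 1."""
    subset = df.loc[:, cols]
    out = df.copy()
    # For each row, keep the column labels whose value is 1 and join them.
    out["classes"] = subset.apply(
        lambda row: ", ".join(row.index[(row == 1).to_numpy()]),
        axis=1,
    )
    return out


df = pd.read_csv(io.StringIO(csv_text))
result = add_classes(df, ["sub_a", "sub_b", "sub_c", "sub_d"])
print(result[["name", "classes"]])
```

Passing the column list explicitly keeps segment and gender out of the result, even though they also hold 0/1 values. A row-wise apply is usually slower than the dot approach on large frames, but it is easier to adapt, for example to a different separator or condition.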