
Appending column values into new cell in the same row in Pandas dataframe

I have a CSV file with the columns name, sub_a, sub_b, sub_c, sub_d, segment and gender. I would like to create a new column, classes, that lists all the classes (the sub_* columns) each student takes, separated by commas.

What would be the easiest way to accomplish this?

The result dataframe should look like this:

+------+-------+-------+-------+-------+---------+--------+---------------------+
| name | sub_a | sub_b | sub_c | sub_d | segment | gender | classes             |
+------+-------+-------+-------+-------+---------+--------+---------------------+
| john | 1     | 1     | 0     | 1     | 1       | 0      | sub_a, sub_b, sub_d |
+------+-------+-------+-------+-------+---------+--------+---------------------+
| mike | 1     | 0     | 1     | 1     | 0       | 0      | sub_a, sub_c, sub_d |
+------+-------+-------+-------+-------+---------+--------+---------------------+
| mary | 1     | 1     | 0     | 1     | 1       | 1      | sub_a, sub_b, sub_d |
+------+-------+-------+-------+-------+---------+--------+---------------------+
| fred | 1     | 0     | 1     | 0     | 0       | 0      | sub_a, sub_c        |
+------+-------+-------+-------+-------+---------+--------+---------------------+
taco is delicious asked Oct 16 '22 03:10



2 Answers

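You can use dot: cast the sub_* columns to bool and take the dot product with the column names, which concatenates the names of the columns that are set in each row.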

s = df.filter(like='sub')
# use ', ' as the separator to match the expected output, then strip the trailing ', '
df['classes'] = s.astype(bool).dot(s.columns + ', ').str[:-2]
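
As a rough illustration of why this works, here is a minimal sketch that rebuilds the sample data from the question by hand (the DataFrame literal and the `sep` variable are assumptions made for the example, not part of the original answer). Multiplying a boolean by a string keeps the string for True and yields an empty string for False, so the dot product ends up joining the names of the flagged columns.

```python
import pandas as pd

# Sample data reconstructed from the table in the question (an assumption
# for illustration; in practice this would come from pd.read_csv).
df = pd.DataFrame({
    "name": ["john", "mike", "mary", "fred"],
    "sub_a": [1, 1, 1, 1],
    "sub_b": [1, 0, 1, 0],
    "sub_c": [0, 1, 0, 1],
    "sub_d": [1, 1, 1, 0],
    "segment": [1, 0, 1, 0],
    "gender": [0, 0, 1, 0],
})

# Keep only the columns whose name contains "sub".
s = df.filter(like="sub")

# Hypothetical helper variable so the separator and the trailing slice stay in sync.
sep = ", "

# True * "sub_a, " -> "sub_a, " and False * "sub_a, " -> "", so summing the
# products across a row concatenates the names of the flagged columns.
df["classes"] = s.astype(bool).dot(s.columns + sep).str[:-len(sep)]

print(df[["name", "classes"]])
```

Note that filter(like='sub') matches any column whose name contains "sub", so it is worth checking that no unrelated column is picked up in a real file.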
BENY answered Oct 20 '22 13:10



You can use apply with axis=1.

For example, if your dataframe looks like this:

df
   A_a  A_b  B_b  B_c
0    1    0    0    1
1    0    1    0    1
2    1    0    1    0

then you can do:

df['classes'] = df.apply(lambda x: ', '.join(df.columns[x==1]), axis = 1)
df
   A_a  A_b  B_b  B_c   classes
0    1    0    0    1  A_a, B_c
1    0    1    0    1  A_b, B_c
2    1    0    1    0  A_a, B_b

To apply this to specific columns only, select them first using loc:

# for your sample data
df_ = df.loc[:, 'sub_a':'sub_d']             # or df.loc[:, ['sub_a', 'sub_b', 'sub_c', 'sub_d']]
# assign the result so it becomes the new column
df['classes'] = df_.apply(lambda x: ', '.join(df_.columns[x == 1]), axis=1)
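
For completeness, a self-contained sketch of the apply approach on the question's data might look like the following. The DataFrame literal and the `sub_cols` list are assumptions made for the example; a real script would load the CSV instead.

```python
import pandas as pd

# Sample data reconstructed from the table in the question (assumed here
# for illustration instead of reading the CSV file).
df = pd.DataFrame({
    "name": ["john", "mike", "mary", "fred"],
    "sub_a": [1, 1, 1, 1],
    "sub_b": [1, 0, 1, 0],
    "sub_c": [0, 1, 0, 1],
    "sub_d": [1, 1, 1, 0],
    "segment": [1, 0, 1, 0],
    "gender": [0, 0, 1, 0],
})

# Hypothetical explicit list of the class columns to inspect.
sub_cols = ["sub_a", "sub_b", "sub_c", "sub_d"]
sub = df[sub_cols]

# For each row, keep the column names whose value equals 1 and join them.
df["classes"] = sub.apply(
    lambda row: ", ".join(sub.columns[row.to_numpy() == 1]),
    axis=1,
)

print(df[["name", "classes"]])
```

apply with axis=1 calls the lambda once per row, so it is usually slower than the dot approach on large frames, but it is easier to adapt if the condition is something other than "equals 1".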
Dishin H Goyani answered Oct 20 '22 12:10
