I have a DataFrame in which some rows share the same values in every column except one. Where rows share the same combination of values, I need to concatenate their values from the remaining column and write the result into each of those rows.
Sample Data
| program | subject | course | title |
|:------- |:------- |:------ |:----- |
|music | eng | 101 | 000 |
|music | math | 101 | 123 |
|music | eng | 102 | 000 |
|music | math | 101 | 456 |
|art | span | 201 | 123 |
|art | hst | 101 | 000 |
|art | span | 201 | 456 |
|art | span | 202 | 000 |
Desired Data
| program | subject | course | title |
|:------- |:------- |:------ |:----- |
|music | eng | 101 | 000 |
|music | math | 101 | 123-456 |
|music | eng | 102 | 000 |
|music | math | 101 | 456-123 |
|art | span | 201 | 123-456 |
|art | hst | 101 | 000 |
|art | span | 201 | 456-123 |
|art | span | 202 | 000 |
The first three columns match in the 2nd and 4th rows, and again in the 5th and 7th rows. I want to concatenate the titles so that each row contains the combination of titles from its matching rows.
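Let's try `groupby` with `transform`, which broadcasts one joined string back to every row of its group. This assumes `title` holds strings; every row in a group gets the same string, in original row order:
df['title'] = df.groupby(
    ['program', 'subject', 'course'], sort=False
)['title'].transform('-'.join)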
print(df)
Output:
program subject course title
0 music eng 101 000
1 music math 101 123-456
2 music eng 102 000
3 music math 101 123-456
4 art span 201 123-456
5 art hst 101 000
6 art span 201 123-456
7 art span 202 000
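Note that the desired table puts each row's own title first (`456-123` in the 4th row), whereas the output above repeats `123-456` for every row in the group. If that ordering matters, here is a minimal sketch of one way to get it. The helper name `own_title_first` is made up for illustration, and the sample frame is rebuilt by hand on the assumption that `title` and `course` are stored as strings:

```python
import pandas as pd

# Sample data from the question; all columns are assumed to be strings.
df = pd.DataFrame({
    "program": ["music", "music", "music", "music", "art", "art", "art", "art"],
    "subject": ["eng", "math", "eng", "math", "span", "hst", "span", "span"],
    "course": ["101", "101", "102", "101", "201", "101", "201", "202"],
    "title": ["000", "123", "000", "456", "123", "000", "456", "000"],
})

keys = ["program", "subject", "course"]


def own_title_first(titles: pd.Series) -> pd.Series:
    """Hypothetical helper: for each row of a group, join the row's own
    title first, followed by the other titles in their original order."""
    values = titles.tolist()
    joined = [
        "-".join([own] + [v for j, v in enumerate(values) if j != i])
        for i, own in enumerate(values)
    ]
    # Returning a Series with the group's index lets transform align rows.
    return pd.Series(joined, index=titles.index)


result = df.groupby(keys, sort=False)["title"].transform(own_title_first)
print(df.assign(title=result))
```

If `title` is numeric in your data, convert it first with `df["title"].astype(str)`, since `str.join` only accepts strings.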