Is it possible to do the following using Python pandas?
I have a CSV file that looks like Table A.
TABLE A
------------------------------------------------
Name Email
------------------------------------------------
Hinckley Joel [email protected]
Hinckley Joel [email protected]
Hinckley Joel [email protected]
Joel Hinckley [email protected]
Siegel Allison [email protected]
Nielsen Tami [email protected]
Nielsen Tami [email protected]
...
I want to remove the rows with a duplicated name, and I also want to add a new column, "Secondary Email".
The secondary email should be the first different email found among that person's duplicated rows.
The final table I want to produce is Table B.
TABLE B
-----------------------------------------------------------
Name Email Secondary Email
-----------------------------------------------------------
Hinckley Joel [email protected] [email protected]
Siegel Allison [email protected]
Nielsen Tami [email protected]
As you can see from Tables A and B, I want two rows to count as the same person even if the first and last names are swapped (e.g. "Hinckley Joel" and "Joel Hinckley").
I also want to take that person's second email (e.g. [email protected]) and put it in the new column.
Thank you in advance.
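This is a pivot on two columns, but you need to remove the duplicate rows first:
# Assumes df holds Table A with "Name" and "Email" columns.
(df.drop_duplicates()                                    # drop exact repeated (Name, Email) rows
   .assign(col=lambda x: x.groupby("Name").cumcount())   # number each name's emails 0, 1, ...
   .pivot(index='Name', columns='col', values='Email')   # one column per email position
   .add_prefix('Email_').reset_index()
)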
Output:
col Name Email_0 Email_1
0 Hinckley Joel [email protected] [email protected]
1 Joel Hinckley [email protected] NaN
2 Nielsen Tami [email protected] [email protected]
3 Siegel Allison [email protected] NaN
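Note that in the output above "Joel Hinckley" is still a separate row, because the grouping uses the exact Name string. One possible way to treat swapped names as the same person is to group on an order-insensitive key instead. The following is only a minimal sketch under stated assumptions: the sample addresses are hypothetical placeholders (the real ones are hidden in the question), the helper column names `key` and `n` are made up for illustration, and "same person" is assumed to mean "same set of name words, ignoring case".

```python
import pandas as pd

# Hypothetical sample data; the addresses are placeholders, not the asker's.
df = pd.DataFrame({
    "Name": ["Hinckley Joel", "Hinckley Joel", "Joel Hinckley",
             "Siegel Allison", "Nielsen Tami", "Nielsen Tami"],
    "Email": ["joel@example.com", "joel@example.com", "jh@example.org",
              "allison@example.com", "tami@example.com", "tami@example.com"],
})

# Order-insensitive key: lowercase the name, split it into words, sort them.
# "Joel Hinckley" and "Hinckley Joel" both map to "hinckley joel".
key = df["Name"].str.lower().str.split().apply(lambda parts: " ".join(sorted(parts)))

# Keep one row per distinct (person, email) pair, then number each
# person's emails in order of first appearance (0, 1, ...).
deduped = df.assign(key=key).drop_duplicates(subset=["key", "Email"])
deduped = deduped.assign(n=deduped.groupby("key").cumcount())

# Email 0 becomes "Email", email 1 becomes "Secondary Email";
# any further emails are ignored in this sketch.
first = deduped[deduped["n"] == 0].set_index("key")[["Name", "Email"]]
second = deduped[deduped["n"] == 1].set_index("key")["Email"].rename("Secondary Email")

result = first.join(second).reset_index(drop=True)
print(result)
```

With this sample data, the three "Hinckley" rows collapse into one row that keeps the first spelling of the name, and people with only one distinct email get NaN in "Secondary Email". Sorting the name words is a simple heuristic; it would also merge two different people whose names happen to use the same words.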