I have a dataframe and I need to filter out who is the owner of which books so we can send them notifications. I am having trouble merging the data in the format I need.
Existing dataframe
| Book | Owner |
|---|---|
| The Alchemist | marry |
| To Kill a Mockingbird | john |
| Lord of the Flies | abel |
| Catcher in the Ry | marry |
| Alabama | julia;marry |
| Invisible Man | john |
I need to create new dataframe that lists the owners in column A and all the books they own in Column B. Desired output
| Owners | Books |
|---|---|
| marry | The Alchemist, Catcher in the Ry, Alabama |
| john | To Kill a Mockingbird, Invisible Man |
| abel | Lord of the Flies |
| julia | Alabama |
I tried creating 2 dfs from and then merging but the results are never accurate. Anyone know a more efficient way to do this?
Current code not working:
from pathlib import Path
import pandas as pd
file1 = Path.cwd() / "./bookgrid.xlsx"
df1 = pd.read_excel(file1)
df2 = pd.read_excel(file1)
##Perfrom the Vlookup Merge
merge = pd.merge(df1, df2, how="left")
merge.to_excel("./results.xlsx")
You need to split, explode, groupby.agg:
(df.assign(Owner=lambda d: d['Owner'].str.split(';'))
.explode('Owner')
.groupby('Owner', as_index=False, sort=False).agg(', '.join)
)
NB. if you need the plural form in the column headers, add .add_suffix('s') or .rename(columns={'Book': 'Books', 'Owner': 'Owners'}).
Output:
Owner Book
0 marry The Alchemist, Catcher in the Ry, Alabama
1 john To Kill a Mockingbird, Invisible Man
2 abel Lord of the Flies
3 julia Alabama
Lets try something new
s = df['Owner'].str.get_dummies(';')
(s.T @ df['Book'].add(', ')).str.rstrip(', ')
Result
abel Lord of the Flies
john To Kill a Mockingbird, Invisible Man
julia Alabama
marry The Alchemist, Catcher in the Ry, Alabama
dtype: object
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With