I have two dataframes in Python - df_item and df_bill shown as below:
df_item:
item_id item_name
2 Noodles
3 Vegetables
4 Dairy Products
5 Ice Cream
df_bill:
bill_no item_id
201 3
202 2
203 4
204 3
205 5
The item_id column in df_item sort of acts as a primary key of each row. How do I reference df_item within df_bill in such a way that item_id in df_bill is converted into item_name?
Expected Output:
df_bill:
bill_no item_name
201 Vegetables
202 Noodles
203 Dairy Products
204 Vegetables
205 Ice Cream
Use map by Series but first remove column item_id by drop or pop:
s = df_bill['item_id'].map(df_item.set_index('item_id')['item_name'])
df = df_bill.drop('item_id', 1).assign(item_name = s)
Or:
s = df_bill.pop('item_id').map(df_item.set_index('item_id')['item_name'])
df = df_bill.assign(item_name = s)
print (df)
bill_no item_name
0 201 Vegetables
1 202 Noodles
2 203 Dairy Products
3 204 Vegetables
4 205 Ice Cream
You can join the two dataframes using the item_id column. For this you first need to set the indices correctly, reset it afterwards and remove the superflous column:
df_bill = df_bill.set_index("item_id")
df_item = df_item.set_index("item_id")
df = df_bill.join(df_item).reset_index()
df.drop(columns=["item_id"], inplace=True)
Or, as one chain of actions:
df = (df_bill.set_index("item_id")
.join(df_item.set_index("item_id"))
.reset_index()
.drop(columns=["item_id"]))
Or, probably the easiest, using pandas.DataFrame.merge:
df = df_bill.merge(df_item).drop(columns=["item_id"])
All of them mess up the order of bill_no, though:
bill_no item_name
0 201 Vegetables
1 204 Vegetables
2 202 Noodles
3 203 Dairy Products
4 205 Ice Cream
However, you can always call df.sort_values("bill_no") to sort it again:
bill_no item_name
1 201 Vegetables
0 202 Noodles
3 203 Dairy Products
2 204 Vegetables
4 205 Ice Crea
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With