I have two dataframes in Python - df_item
and df_bill
shown as below:
df_item:
item_id item_name
2 Noodles
3 Vegetables
4 Dairy Products
5 Ice Cream
df_bill:
bill_no item_id
201 3
202 2
203 4
204 3
205 5
The item_id
column in df_item
sort of acts as a primary key of each row. How do I reference df_item
within df_bill
in such a way that item_id
in df_bill
is converted into item_name
?
Expected Output:
df_bill:
bill_no item_name
201 Vegetables
202 Noodles
203 Dairy Products
204 Vegetables
205 Ice Cream
Use map
by Series
but first remove column item_id
by drop
or pop
:
s = df_bill['item_id'].map(df_item.set_index('item_id')['item_name'])
df = df_bill.drop('item_id', 1).assign(item_name = s)
Or:
s = df_bill.pop('item_id').map(df_item.set_index('item_id')['item_name'])
df = df_bill.assign(item_name = s)
print (df)
bill_no item_name
0 201 Vegetables
1 202 Noodles
2 203 Dairy Products
3 204 Vegetables
4 205 Ice Cream
You can join the two dataframes using the item_id
column. For this you first need to set the indices correctly, reset it afterwards and remove the superflous column:
df_bill = df_bill.set_index("item_id")
df_item = df_item.set_index("item_id")
df = df_bill.join(df_item).reset_index()
df.drop(columns=["item_id"], inplace=True)
Or, as one chain of actions:
df = (df_bill.set_index("item_id")
.join(df_item.set_index("item_id"))
.reset_index()
.drop(columns=["item_id"]))
Or, probably the easiest, using pandas.DataFrame.merge
:
df = df_bill.merge(df_item).drop(columns=["item_id"])
All of them mess up the order of bill_no
, though:
bill_no item_name
0 201 Vegetables
1 204 Vegetables
2 202 Noodles
3 203 Dairy Products
4 205 Ice Cream
However, you can always call df.sort_values("bill_no")
to sort it again:
bill_no item_name
1 201 Vegetables
0 202 Noodles
3 203 Dairy Products
2 204 Vegetables
4 205 Ice Crea
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With