How to reference a pandas dataframe from another dataframe in Python?

I have two dataframes in Python, df_item and df_bill, as shown below:

df_item:

item_id   item_name
2         Noodles
3         Vegetables
4         Dairy Products
5         Ice Cream

df_bill:

bill_no   item_id
201       3
202       2
203       4
204       3
205       5

The item_id column in df_item acts as a primary key for each row. How do I reference df_item from df_bill so that item_id in df_bill is replaced by the matching item_name?

Expected Output:

df_bill:

bill_no  item_name
201      Vegetables
202      Noodles
203      Dairy Products
204      Vegetables
205      Ice Cream

K. K. asked Jan 02 '23 04:01

2 Answers

Use Series.map with a lookup Series built from df_item (indexed by item_id), then remove the item_id column with drop or pop:

s = df_bill['item_id'].map(df_item.set_index('item_id')['item_name'])
df = df_bill.drop(columns='item_id').assign(item_name = s)

Or:

s = df_bill.pop('item_id').map(df_item.set_index('item_id')['item_name'])
df = df_bill.assign(item_name = s)

print(df)
   bill_no       item_name
0      201      Vegetables
1      202         Noodles
2      203  Dairy Products
3      204      Vegetables
4      205       Ice Cream
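
For anyone who wants to run this end to end, here is a minimal, self-contained sketch of the map approach. It assumes the two frames look exactly like the ones in the question; the names lookup and result are only illustrative.

```python
import pandas as pd

# Frames rebuilt from the tables in the question.
df_item = pd.DataFrame({
    "item_id": [2, 3, 4, 5],
    "item_name": ["Noodles", "Vegetables", "Dairy Products", "Ice Cream"],
})
df_bill = pd.DataFrame({
    "bill_no": [201, 202, 203, 204, 205],
    "item_id": [3, 2, 4, 3, 5],
})

# Lookup Series: the index is item_id, the values are item_name.
lookup = df_item.set_index("item_id")["item_name"]

# map translates each item_id through the lookup and keeps df_bill's row order.
# An item_id that is missing from df_item would come out as NaN.
result = (
    df_bill.assign(item_name=df_bill["item_id"].map(lookup))
           .drop(columns="item_id")
)
print(result)
```

Unlike the pop variant, this sketch leaves df_bill itself unchanged, which is handy if you still need the item_id column later.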

jezrael answered Jan 04 '23 17:01

You can join the two dataframes on the item_id column. To do this, first set item_id as the index of both dataframes, then reset the index after the join and drop the superfluous item_id column:

df_bill = df_bill.set_index("item_id")
df_item = df_item.set_index("item_id")
df = df_bill.join(df_item).reset_index()
df.drop(columns=["item_id"], inplace=True)

Or, as one chain of actions:

df = (df_bill.set_index("item_id")
             .join(df_item.set_index("item_id"))
             .reset_index()
             .drop(columns=["item_id"]))

Or, probably the easiest, using pandas.DataFrame.merge:

df = df_bill.merge(df_item).drop(columns=["item_id"])

Depending on your pandas version, all of them can change the order of bill_no, though:

   bill_no       item_name
0      201      Vegetables
1      204      Vegetables
2      202         Noodles
3      203  Dairy Products
4      205       Ice Cream

However, you can always call df.sort_values("bill_no") to sort it again:

   bill_no       item_name
1      201      Vegetables
0      202         Noodles
3      203  Dairy Products
2      204      Vegetables
4      205       Ice Cream
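
If you would rather not depend on the row order a merge happens to return, one option is to merge and then sort explicitly. The following is a rough sketch, assuming the frames from the question; how="left", validate="many_to_one" and ignore_index=True are my additions, not part of the code above, and the names merged and ordered are only illustrative.

```python
import pandas as pd

# Frames rebuilt from the tables in the question.
df_item = pd.DataFrame({
    "item_id": [2, 3, 4, 5],
    "item_name": ["Noodles", "Vegetables", "Dairy Products", "Ice Cream"],
})
df_bill = pd.DataFrame({
    "bill_no": [201, 202, 203, 204, 205],
    "item_id": [3, 2, 4, 3, 5],
})

# how="left" keeps every bill, even one whose item_id has no match in df_item.
# validate="many_to_one" raises a MergeError if item_id is duplicated in df_item.
merged = (
    df_bill.merge(df_item, on="item_id", how="left", validate="many_to_one")
           .drop(columns="item_id")
)

# Sort explicitly; ignore_index=True renumbers the rows 0..n-1 after sorting.
ordered = merged.sort_values("bill_no", ignore_index=True)
print(ordered)
```

Note that ignore_index requires pandas 1.0 or later; on older versions, sort_values("bill_no").reset_index(drop=True) does the same thing.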

Graipher answered Jan 04 '23 17:01