
Python Pandas dataframe add "1" in new column if ID exists in other dataframe

I have two dataframes with customer IDs (labeled "C_ID") and with the number of visits for a year.

I want to add a column to my 2010 dataframe that shows whether the customer also shopped in 2009. So I need to create a loop that checks whether each C_ID from 2010 exists in 2009 and adds a 1 if it does, otherwise a 0.

I used this code and it didn't work (no error message, nothing happens):

for row in df_2010.iterrows():
    #check if C_ID exists in the other dataframe
    check = df_2009[(df_2009['C_ID'] == row['C_ID'])]

    if check.empty:
        #ID not exist in 2009 file, add 0 in new column
        row['shopped2009'] = 0

    else:
        #ID exists in 2009 file, add 1 into same column
        row['shopped2009'] = 1
jeangelj asked Oct 16 '25 16:10


1 Answer

You can use Series.isin() to build the flag column without a loop:

import numpy as np
%timeit df_2010['new'] = np.where(df_2010['C_ID'].isin(df_2009['C_ID']), 1, 0)

best of 3: 384 µs per loop
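To see what this produces, here is a minimal sketch on made-up data. Only the C_ID column name comes from the question; the sample values, the visits column, and the shopped2009 column name are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the C_ID column name comes from the question.
df_2009 = pd.DataFrame({"C_ID": [101, 102, 103], "visits": [4, 1, 7]})
df_2010 = pd.DataFrame({"C_ID": [102, 104, 103, 105], "visits": [2, 5, 1, 3]})

# isin() returns a boolean mask: True where the 2010 ID also appears in 2009.
mask = df_2010["C_ID"].isin(df_2009["C_ID"])

# np.where maps True to 1 and False to 0, element by element.
df_2010["shopped2009"] = np.where(mask, 1, 0)

print(df_2010)
```

In this sample, IDs 102 and 103 appear in both years, so those two rows get a 1 and the other two get a 0.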

As @Kris suggested, you can also cast the boolean mask to integers:

%timeit df_2010['new'] = (df_2010['C_ID'].isin(df_2009['C_ID'])).astype(int)

best of 3: 584 µs per loop

Or, if True/False values are acceptable instead of 1/0, skip the conversion:

df_2010['new'] = df_2010['C_ID'].isin(df_2009['C_ID'])
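As for why the loop in the question has no effect: iterrows() yields (index, row) pairs, so the loop variable has to be unpacked into two names, and the row it hands back is generally a copy, so assigning to row['shopped2009'] does not write to df_2010. If you really want a loop, a sketch along these lines should work, though it will be much slower than isin() on large frames. The sample values and the ids_2009 helper are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample data; only the C_ID column name comes from the question.
df_2009 = pd.DataFrame({"C_ID": [101, 102, 103]})
df_2010 = pd.DataFrame({"C_ID": [102, 104, 103, 105]})

# Hypothetical helper: a set makes each membership check cheap.
ids_2009 = set(df_2009["C_ID"])

# Create the column up front so it exists with an integer dtype.
df_2010["shopped2009"] = 0

# iterrows() yields (index, row) pairs, so unpack both.
for idx, row in df_2010.iterrows():
    if row["C_ID"] in ids_2009:
        # Write through .loc on the DataFrame itself; assigning to `row`
        # would only change a temporary copy.
        df_2010.loc[idx, "shopped2009"] = 1

print(df_2010)
```

The result matches the isin() versions above, which remain the better choice because they avoid Python-level iteration.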
Vaishali answered Oct 18 '25 18:10
