I have a DataFrame with customer info and their purchase details. I am trying to add a new column that flags every 3rd purchase made by the same customer.
Given below is the DataFrame:
customer_name,bill_no,date
Mark,101,2018-10-01
Scott,102,2018-10-01
Pete,103,2018-10-02
Mark,104,2018-10-02
Mark,105,2018-10-04
Scott,106,2018-10-21
Julie,107,2018-10-03
Kevin,108,2018-10-07
Steve,109,2018-10-02
Mark,110,2018-10-06
Mark,111,2018-10-02
Mark,112,2018-10-05
Mark,113,2018-10-05
I am trying to flag every 3rd purchase made by the same customer. In this case, I would like to add a flag for the following bills:
Mark,105,2018-10-04
Mark,112,2018-10-05
Basically, every bill whose position in a customer's sequence of purchases is a multiple of 3.
To get the nth row of a pandas DataFrame, we can use the iloc indexer (it is indexed with square brackets, not called like a method). For example, df.iloc[4] returns the 5th row because positions start from 0.
To select every nth row of a DataFrame, we can use slicing. Slicing a pandas DataFrame works like slicing a list or a string in Python: the step goes after the second colon, so df.iloc[::2] returns every 2nd row, starting with the first.
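As a minimal sketch of the two positional lookups described above (the small frame here is hypothetical and only for illustration):

```python
import pandas as pd

# Hypothetical frame used only to illustrate positional access.
df = pd.DataFrame({"bill_no": [101, 102, 103, 104, 105, 106]})

# Positional lookup: position 4 is the 5th row, since positions start at 0.
fifth_row = df.iloc[4]

# start:stop:step slice with step 2 picks positions 0, 2, 4.
every_second = df.iloc[::2]

print(fifth_row["bill_no"])
print(every_second["bill_no"].tolist())
```

Note that this kind of slicing steps through the whole frame by position. It does not count rows separately for each customer, so on its own it cannot pick out every 3rd purchase per customer.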
Using groupby.cumcount:
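n = 3
# Number each customer's purchases 1, 2, 3, ... in row order
df['flag'] = df.groupby('customer_name').cumcount() + 1
# Set the flag to 1 for every nth purchase and 0 otherwise
df['flag'] = ((df['flag'] % n) == 0).astype(int)
print(df)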
    customer_name  bill_no        date  flag
0            Mark      101  2018-10-01     0
1           Scott      102  2018-10-01     0
2            Pete      103  2018-10-02     0
3            Mark      104  2018-10-02     0
4            Mark      105  2018-10-04     1
5           Scott      106  2018-10-21     0
6           Julie      107  2018-10-03     0
7           Kevin      108  2018-10-07     0
8           Steve      109  2018-10-02     0
9            Mark      110  2018-10-06     0
10           Mark      111  2018-10-02     0
11           Mark      112  2018-10-05     1
12           Mark      113  2018-10-05     0
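For reference, here is a self-contained sketch of the same idea that builds the sample data from the question inline, so it can be run as-is. The names purchase_no and flagged are introduced here only for illustration, and dates are kept as plain strings for simplicity.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "customer_name": ["Mark", "Scott", "Pete", "Mark", "Mark", "Scott", "Julie",
                      "Kevin", "Steve", "Mark", "Mark", "Mark", "Mark"],
    "bill_no": list(range(101, 114)),
    "date": ["2018-10-01", "2018-10-01", "2018-10-02", "2018-10-02", "2018-10-04",
             "2018-10-21", "2018-10-03", "2018-10-07", "2018-10-02", "2018-10-06",
             "2018-10-02", "2018-10-05", "2018-10-05"],
})

n = 3

# cumcount() numbers each customer's rows 0, 1, 2, ... in row order;
# adding 1 turns that into a 1-based purchase number.
purchase_no = df.groupby("customer_name").cumcount() + 1

# 1 where the purchase number is a multiple of n, 0 elsewhere.
df["flag"] = (purchase_no % n == 0).astype(int)

# Keep only the flagged bills.
flagged = df.loc[df["flag"] == 1, ["customer_name", "bill_no", "date"]]
print(flagged)
```

With this data, the flagged rows are bills 105 and 112, matching the result asked for. One caveat: cumcount() counts in row order, not by date. If purchases should instead be counted chronologically, the frame would presumably need to be sorted by date before grouping, which would change which bills get flagged for this sample.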