df
   month order customer
0    Jan   yes      020
1    Feb   yes      041
2  April    no      020
3    May    no      020
Is there a way to find the last month in which a customer placed an order, for the rows where order is 'no'? Expected output:
df
   month order customer last_order
0    Jan   yes      020
1    Feb   yes      041
2  April    no      020        Jan
3    May    no      020        Jan
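To make the question reproducible, the sample frame can be rebuilt as below. This is a sketch that assumes `customer` is stored as strings, so the leading zero in `020` survives (a bare `020` integer literal is a syntax error in Python 3).

```python
import pandas as pd

# Sample data from the question; customer kept as strings to preserve leading zeros
df = pd.DataFrame({
    'month': ['Jan', 'Feb', 'April', 'May'],
    'order': ['yes', 'yes', 'no', 'no'],
    'customer': ['020', '041', '020', '020'],
})
print(df)
```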
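You can use df.groupby on customer and, within each group, use pd.Series.eq to check whether order is 'yes', then pd.Series.where to keep only the months with an order, pd.Series.ffill to carry the last such month forward, and finally pd.Series.mask to blank out the rows where order is 'yes'.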
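import pandas as pd

def func(s):
    m = s['order'].eq('yes')         # True where an order was placed
    f = s['month'].where(m).ffill()  # carry the last ordered month forward
    return f.mask(m)                 # NaN on the rows that are orders themselves

df['last_order'] = df.groupby('customer', group_keys=False).apply(func)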
   month order customer last_order
0    Jan   yes      020        NaN
1    Feb   yes      041        NaN
2  April    no      020        Jan
3    May    no      020        Jan
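As an alternative that is not part of the original answer, the same result can likely be computed without `apply`: do the `where` step on the whole frame, then run `ffill` per customer through `SeriesGroupBy.ffill`. A sketch, assuming the sample data from the question and that rows are already in chronological order within each customer (the `apply` version relies on that too):

```python
import pandas as pd

df = pd.DataFrame({
    'month': ['Jan', 'Feb', 'April', 'May'],
    'order': ['yes', 'yes', 'no', 'no'],
    'customer': ['020', '041', '020', '020'],
})

m = df['order'].eq('yes')               # True on rows where an order was placed
df['last_order'] = (
    df['month'].where(m)                # keep months with an order, NaN elsewhere
      .groupby(df['customer'])          # restart the fill for each customer
      .ffill()                          # carry the last ordered month forward
      .mask(m)                          # blank out rows that are orders themselves
)
print(df)
```

Because `ffill` on a grouped Series returns a Series aligned to the original index, the result can be assigned directly and no `group_keys` handling is needed.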
Here is what happens inside each group after groupby. For example, consider a group where customer is 020:
  month order
0   jan   yes
1   apr    no
2   may    no
3   jun   yes
4   jul    no
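To follow the steps below interactively, the example group can be built as a small frame. The data is copied from the table above, and the name `df` matches the snippets that follow.

```python
import pandas as pd

# Example group for customer 020, copied from the table above
df = pd.DataFrame({
    'month': ['jan', 'apr', 'may', 'jun', 'jul'],
    'order': ['yes', 'no', 'no', 'yes', 'no'],
})
print(df)
```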
m = df['order'].eq('yes')  # True where `order` is 'yes'
f = df['month'].where(m)   # .ffill() left off for now
f
0    jan   # kept: `order` was 'yes'
1    NaN
2    NaN
3    jun   # kept: `order` was 'yes'
4    NaN
Name: month, dtype: object
# Chaining `ffill` onto the above fills each NaN with the last valid value above it.
f = df['month'].where(m).ffill()
f
0    jan
1    jan   # filled with the last valid value above, i.e. jan
2    jan   # filled with the last valid value above, i.e. jan
3    jun
4    jun   # filled with the last valid value above, i.e. jun
Name: month, dtype: object
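One consequence worth noting: `ffill` only fills from values above, so if a customer's first row has `order` 'no', there is no earlier order to carry forward and that row stays NaN. A small sketch with hypothetical data (the frame `g` is made up for illustration):

```python
import pandas as pd

# Hypothetical group whose first row is not an order
g = pd.DataFrame({
    'month': ['jan', 'feb', 'mar'],
    'order': ['no', 'yes', 'no'],
})
m = g['order'].eq('yes')
f = g['month'].where(m).ffill()   # nothing above row 0, so it stays NaN
print(f.tolist())                 # [nan, 'feb', 'feb']
```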
f.mask(m)  # the opposite of `pd.Series.where`: sets values to NaN where m is True
0    NaN   # `order` was 'yes', so masked
1    jan
2    jan
3    NaN   # `order` was 'yes', so masked
4    jun
Name: month, dtype: object
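The comment above says `mask` works opposite to `where`; concretely, `s.mask(cond)` gives the same result as `s.where(~cond)`. A quick check, with the `m` and filled `f` values from the walkthrough written out by hand:

```python
import pandas as pd

m = pd.Series([True, False, False, True, False])
f = pd.Series(['jan', 'jan', 'jan', 'jun', 'jun'], name='month')

# mask replaces values where the condition is True; where keeps them
masked = f.mask(m)
print(masked.tolist())             # [nan, 'jan', 'jan', nan, 'jun']
print(masked.equals(f.where(~m)))  # True
```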