I'm trying to accomplish two things in my Pandas dataframe:
Original Dataset
        DateCompleted      TranNumber  Sales
    0   1/1/17 10:15AM     3133         130.31
    1   1/1/17 11:21AM     3531         103.12  
    2   1/1/17 12:31PM     3652         99.23  
    3   1/2/17 9:31AM      3689         83.22
    4   1/2/17 10:31AM     3701         29.93
    5   1/3/17 8:30AM      3709         31.31 
Desired Output
        DateCompleted      TranNumber   Sales    NextTranSales  LastRow
    0   1/1/17 10:15AM     3133         130.31   103.12         No
    1   1/1/17 11:21AM     3531         103.12   99.23          No
    2   1/1/17 12:31PM     3652         99.23    NaN            Yes
    3   1/2/17 9:31AM      3689         83.22    29.93          No 
    4   1/2/17 10:31AM     3701         29.93    NaN            Yes
    5   1/3/17 8:30AM      3709         31.31    ...            No
I can get the NextTranSales based on:
 df['NextTranSales'] = df.Sales.shift(-1)
But I'm having trouble determining the last row in the DateCompleted group and marking NextTranSales as Null if it is the last row.
Thanks for your help!
If your data frame has been sorted by the DateCompleted column, then you might just need groupby.shift:
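import pandas as pd
date = pd.to_datetime(df.DateCompleted).dt.date  # calendar date only, time of day dropped
# shift within each date, so the last row of every day gets NaN
df["NextTranSales"] = df.groupby(date).Sales.shift(-1)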

If you need the LastRow column, find the index of the last row in each date group with groupby, then assign "Yes" to those rows:
# tail(1) keeps the last row of each date group; its index labels are the rows to flag
last_row_index = df.groupby(date).tail(1).index
df["LastRow"] = "No"
df.loc[last_row_index, "LastRow"] = "Yes"
df
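For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the sample data from the question and applies the steps above. The explicit `format` string is my assumption about how the dates are written, and `tail(1).index` is just one way to locate the last row of each date group.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "DateCompleted": ["1/1/17 10:15AM", "1/1/17 11:21AM", "1/1/17 12:31PM",
                      "1/2/17 9:31AM", "1/2/17 10:31AM", "1/3/17 8:30AM"],
    "TranNumber": [3133, 3531, 3652, 3689, 3701, 3709],
    "Sales": [130.31, 103.12, 99.23, 83.22, 29.93, 31.31],
})

# Assumed format: month/day/two-digit year, 12-hour clock with AM/PM.
date = pd.to_datetime(df["DateCompleted"], format="%m/%d/%y %I:%M%p").dt.date

# Next sale within the same date; the last row of each date gets NaN.
df["NextTranSales"] = df.groupby(date)["Sales"].shift(-1)

# Flag the last row of each date group.
last_row_index = df.groupby(date).tail(1).index
df["LastRow"] = "No"
df.loc[last_row_index, "LastRow"] = "Yes"

print(df)
```

Note that the single transaction on 1/3/17 is both the first and the last row of its date, so it is flagged "Yes" and its NextTranSales is NaN.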

NOTE: This depends on Sales being free of NaN. If Sales contains any NaN, some rows will be wrongly marked as the last row, because I'm relying on the fact that the shifted column leaves a NaN in the last position of each group.
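# DateCompleted must already be datetime64, e.g. via pd.to_datetime
d = df.DateCompleted.dt.date
m = {True: 'Yes', False: 'No'}
s = df.groupby(d).Sales.shift(-1)
# a NaN in the shifted column marks the last row of its date
df = df.assign(NextTranSales=s).assign(LastRow=s.isnull().map(m))
print(df)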
        DateCompleted  TranNumber   Sales  NextTranSales LastRow
0 2017-01-01 10:15:00        3133  130.31         103.12      No
1 2017-01-01 11:21:00        3531  103.12          99.23      No
2 2017-01-01 12:31:00        3652   99.23            NaN     Yes
3 2017-01-02 09:31:00        3689   83.22          29.93      No
4 2017-01-02 10:31:00        3701   29.93            NaN     Yes
5 2017-01-03 08:30:00        3709   31.31            NaN     Yes
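To make the NaN caveat concrete, here is a small sketch using made-up data (three rows on one date, with a hypothetical missing Sales value in the middle). It applies the same `isnull()` shortcut and shows the first row being flagged even though it is not the last row of its date.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one date, with a missing Sales value in the middle row.
df = pd.DataFrame({
    "DateCompleted": pd.to_datetime(
        ["2017-01-01 10:15", "2017-01-01 11:21", "2017-01-01 12:31"]
    ),
    "Sales": [130.31, np.nan, 99.23],
})

d = df.DateCompleted.dt.date
s = df.groupby(d).Sales.shift(-1)

# Row 0 receives the NaN from row 1, so isnull() cannot tell it apart
# from a genuine last row.
last_row = s.isnull().map({True: "Yes", False: "No"})
print(last_row.tolist())
```

Only the third row is really the last one for that date, so the "Yes" on the first row is a false positive.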
We can drop the no-NaN restriction by building LastRow from the last row of each group instead:
d = df.DateCompleted.dt.date
m = {True: 'Yes', False: 'No'}
s = df.groupby(d).Sales.shift(-1)
l = pd.Series(
    'Yes', df.groupby(d).tail(1).index
).reindex(df.index, fill_value='No')
df.assign(NextTranSales=s).assign(LastRow=l)
        DateCompleted  TranNumber   Sales  NextTranSales LastRow
0 2017-01-01 10:15:00        3133  130.31         103.12      No
1 2017-01-01 11:21:00        3531  103.12          99.23      No
2 2017-01-01 12:31:00        3652   99.23            NaN     Yes
3 2017-01-02 09:31:00        3689   83.22          29.93      No
4 2017-01-02 10:31:00        3701   29.93            NaN     Yes
5 2017-01-03 08:30:00        3709   31.31            NaN     Yes