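I'm trying to accomplish two things in my pandas DataFrame: add a NextTranSales column holding the Sales of the next transaction on the same date, and flag the last row of each date in a LastRow column.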
Original Dataset
    DateCompleted  TranNumber   Sales
0  1/1/17 10:15AM        3133  130.31
1  1/1/17 11:21AM        3531  103.12
2  1/1/17 12:31PM        3652   99.23
3   1/2/17 9:31AM        3689   83.22
4  1/2/17 10:31AM        3701   29.93
5   1/3/17 8:30AM        3709   31.31
Desired Output
    DateCompleted  TranNumber   Sales  NextTranSales  LastRow
0  1/1/17 10:15AM        3133  130.31         103.12       No
1  1/1/17 11:21AM        3531  103.12          99.23       No
2  1/1/17 12:31PM        3652   99.23            NaN      Yes
3   1/2/17 9:31AM        3689   83.22          29.93       No
4  1/2/17 10:31AM        3701   29.93            NaN      Yes
5   1/3/17 8:30AM        3709   31.31            ...       No
I can get the NextTranSales based on:
df['NextTranSales'] = df.Sales.shift(-1)
But I'm having trouble determining the last row in the DateCompleted group and marking NextTranSales as Null if it is the last row.
Thanks for your help!
Method 1: Using the tail() method. DataFrame.tail(n) returns the last n rows of the DataFrame. It takes one optional argument n (the number of rows to return from the end). By default n = 5, so it returns the last 5 rows if n is not passed.
To get the most recent date: df["my_date"].max() returns, for example, Timestamp('2021-12-27 00:00:00').
iloc: DataFrame.iloc retrieves data by integer position. In Python a negative index counts from the end, so the last row of the DataFrame can be accessed with position -1.
You can use the pandas Series.between() method to select DataFrame rows between two dates. It returns a boolean vector indicating whether each element lies in the specified range.
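A short sketch of the four calls mentioned above, run on a small made-up frame. The column names my_date and value and the dates themselves are placeholders for illustration, not data from the question.

```python
import pandas as pd

# Hypothetical sample data; names and values are illustrative only.
df = pd.DataFrame({
    "my_date": pd.to_datetime(["2021-12-25", "2021-12-26", "2021-12-27"]),
    "value": [10, 20, 30],
})

last_two = df.tail(2)          # last 2 rows (n defaults to 5 when omitted)
latest = df["my_date"].max()   # most recent date, as a Timestamp
last_row = df.iloc[-1]         # last row, selected by position

# between() is inclusive on both ends by default and returns a boolean mask.
in_range = df[df["my_date"].between("2021-12-25", "2021-12-26")]
```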
If your data frame is already sorted by the DateCompleted column, then you might just need groupby.shift:
date = pd.to_datetime(df.DateCompleted).dt.date
df["NextTranSales"] = df.groupby(date).Sales.shift(-1)
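For a self-contained check, the sketch below rebuilds the sample data from the question and applies the same groupby.shift idea. The explicit format string passed to pd.to_datetime is my assumption about how the dates are written; adjust it if your data differs.

```python
import pandas as pd

df = pd.DataFrame({
    "DateCompleted": ["1/1/17 10:15AM", "1/1/17 11:21AM", "1/1/17 12:31PM",
                      "1/2/17 9:31AM", "1/2/17 10:31AM", "1/3/17 8:30AM"],
    "TranNumber": [3133, 3531, 3652, 3689, 3701, 3709],
    "Sales": [130.31, 103.12, 99.23, 83.22, 29.93, 31.31],
})

# Assumed format: month/day/2-digit year, 12-hour clock with AM/PM.
stamps = pd.to_datetime(df["DateCompleted"], format="%m/%d/%y %I:%M%p")
date = stamps.dt.date

# Shift within each calendar date, so the last row of every date gets NaN.
df["NextTranSales"] = df.groupby(date)["Sales"].shift(-1)
print(df)
```

Because the shift happens inside each date group, no value leaks across a date boundary, which is what a plain df.Sales.shift(-1) would do.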
If you need the LastRow column, you can find the index of the last row in each group with groupby and then assign Yes to those rows:
last_row_index = df.groupby(date).tail(1).index  # index label of the last row of each date
df["LastRow"] = "No"
df.loc[last_row_index, "LastRow"] = "Yes"
df
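An alternative sketch (a different technique from the one above, not a replacement for it): if the rows are in chronological order, the last row of each date is simply the final occurrence of that date, which Series.duplicated(keep="last") exposes without any groupby. The data is the question's sample with the dates already parsed.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "DateCompleted": pd.to_datetime([
        "2017-01-01 10:15", "2017-01-01 11:21", "2017-01-01 12:31",
        "2017-01-02 09:31", "2017-01-02 10:31", "2017-01-03 08:30",
    ]),
    "Sales": [130.31, 103.12, 99.23, 83.22, 29.93, 31.31],
})

date = df["DateCompleted"].dt.date

# duplicated(keep="last") is False only for the final occurrence of each date.
is_last = ~date.duplicated(keep="last")
df["LastRow"] = np.where(is_last, "Yes", "No")
print(df)
```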
NOTE: This depends on Sales being free of NaN. If it has any NaN we will get erroneous determinations of last row. This happens because I'm leveraging the convenience that the shifted column leaves a NaN in the last position.
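# Assumes DateCompleted is already a datetime column, e.g. via pd.to_datetime
d = df.DateCompleted.dt.date
m = {True: 'Yes', False: 'No'}
# Shift Sales within each date; the last row of each date gets NaN
s = df.groupby(d).Sales.shift(-1)
# A NaN in the shifted column marks the last row of its date
df = df.assign(NextTranSales=s).assign(LastRow=s.isnull().map(m))
print(df)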
         DateCompleted  TranNumber   Sales  NextTranSales  LastRow
0  2017-01-01 10:15:00        3133  130.31         103.12       No
1  2017-01-01 11:21:00        3531  103.12          99.23       No
2  2017-01-01 12:31:00        3652   99.23            NaN      Yes
3  2017-01-02 09:31:00        3689   83.22          29.93       No
4  2017-01-02 10:31:00        3701   29.93            NaN      Yes
5  2017-01-03 08:30:00        3709   31.31            NaN      Yes
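To illustrate the NaN caveat, here is a small sketch on made-up data with a missing Sales value in the middle of a single date. The isnull-based flag then marks the row before the gap as a last row even though it is not.

```python
import numpy as np
import pandas as pd

# Made-up data: one date, with a NaN in the middle of the group.
df = pd.DataFrame({
    "DateCompleted": pd.to_datetime([
        "2017-01-01 10:15", "2017-01-01 11:21", "2017-01-01 12:31",
    ]),
    "Sales": [130.31, np.nan, 99.23],
})

d = df["DateCompleted"].dt.date
s = df.groupby(d)["Sales"].shift(-1)
flag = s.isnull().map({True: "Yes", False: "No"})

print(flag.tolist())  # ['Yes', 'No', 'Yes']: row 0 is wrongly flagged
```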
We can lift the no-NaN restriction with this:
d = df.DateCompleted.dt.date
s = df.groupby(d).Sales.shift(-1)
# Mark the last row of each date by position rather than by NaN
l = pd.Series(
    'Yes', df.groupby(d).tail(1).index
).reindex(df.index, fill_value='No')
df.assign(NextTranSales=s).assign(LastRow=l)
         DateCompleted  TranNumber   Sales  NextTranSales  LastRow
0  2017-01-01 10:15:00        3133  130.31         103.12       No
1  2017-01-01 11:21:00        3531  103.12          99.23       No
2  2017-01-01 12:31:00        3652   99.23            NaN      Yes
3  2017-01-02 09:31:00        3689   83.22          29.93       No
4  2017-01-02 10:31:00        3701   29.93            NaN      Yes
5  2017-01-03 08:30:00        3709   31.31            NaN      Yes
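As a sanity check, the sketch below applies the tail(1)-based flag to a made-up frame that has a NaN in the middle of a date group. Since the flag comes from row position within the group and not from the shifted values, only the true last row should be marked.

```python
import numpy as np
import pandas as pd

# Made-up data: one date, with a NaN in the middle of the group.
df = pd.DataFrame({
    "DateCompleted": pd.to_datetime([
        "2017-01-01 10:15", "2017-01-01 11:21", "2017-01-01 12:31",
    ]),
    "Sales": [130.31, np.nan, 99.23],
})

d = df["DateCompleted"].dt.date

# 'Yes' at the index of each group's final row, 'No' everywhere else.
last = pd.Series(
    "Yes", index=df.groupby(d).tail(1).index
).reindex(df.index, fill_value="No")

print(last.tolist())  # ['No', 'No', 'Yes']
```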