This is the current dataframe:
> ID Date current
> 2001980 10/30/2017 1
> 2001980 10/29/2017 0
> 2001980 10/28/2017 0
> 2001980 10/27/2017 40
> 2001980 10/26/2017 39
> 2001980 10/25/2017 0
> 2001980 10/24/2017 0
> 2001980 10/23/2017 60
> 2001980 10/22/2017 0
> 2001980 10/21/2017 0
> 2002222 10/21/2017 0
> 2002222 10/20/2017 0
> 2002222 10/19/2017 16
> 2002222 10/18/2017 0
> 2002222 10/17/2017 0
> 2002222 10/16/2017 20
> 2002222 10/15/2017 19
> 2002222 10/14/2017 18
Below is the final data frame. Column expected
is what I am trying to get.
thank you so much.
> ID Date current expected
> 2001980 10/30/2017 1 1
> 2001980 10/29/2017 0 0
> 2001980 10/28/2017 0 0
> 2001980 10/27/2017 40 40
> 2001980 10/26/2017 39 39
> 2001980 10/25/2017 0 38
> 2001980 10/24/2017 0 37
> 2001980 10/23/2017 60 60
> 2001980 10/22/2017 0 59
> 2001980 10/21/2017 0 58
> 2002222 10/21/2017 0 0
> 2002222 10/20/2017 0 0
> 2002222 10/19/2017 16 16
> 2002222 10/18/2017 0 15
> 2002222 10/17/2017 0 14
> 2002222 10/16/2017 20 20
> 2002222 10/15/2017 19 19
> 2002222 10/14/2017 18 18
I am using Excel with the formula below:
= if(this row's ID = last row's ID, max(last row's expected value - 1, this row's current value), this row's current value)
Revised simpler:
df['expected'] = df.groupby(['ID',df.current.ne(0).cumsum()])['current']\
.transform(lambda x: x.eq(0).cumsum().mul(-1).add(x.iloc[0])).clip(0,np.inf)
Let's have a little fun:
df['expected'] = (df.groupby('ID')['current'].transform(lambda x: x.where(x.ne(0)).ffill()) +
df.groupby(['ID',df.current.ne(0).cumsum()])['current'].transform(lambda x: x.eq(0).cumsum()).mul(-1))\
.clip(0,np.inf).fillna(0).astype(int)
print(df)
Output:
ID Date current expected
0 2001980 10/30/2017 1 1
1 2001980 10/29/2017 0 0
2 2001980 10/28/2017 0 0
3 2001980 10/27/2017 40 40
4 2001980 10/26/2017 39 39
5 2001980 10/25/2017 0 38
6 2001980 10/24/2017 0 37
7 2001980 10/23/2017 60 60
8 2001980 10/22/2017 0 59
9 2001980 10/21/2017 0 58
10 2002222 10/21/2017 0 0
11 2002222 10/20/2017 0 0
12 2002222 10/19/2017 16 16
13 2002222 10/18/2017 0 15
14 2002222 10/17/2017 0 14
15 2002222 10/16/2017 20 20
16 2002222 10/15/2017 19 19
17 2002222 10/14/2017 18 18
#Let's calculate two series first a series to fill the zeros in an 'ID' with the previous non-zero value
s1 = df.groupby('ID')['current'].transform(lambda x: x.where(x.ne(0)).ffill())
s1
Output:
0 1.0
1 1.0
2 1.0
3 40.0
4 39.0
5 39.0
6 39.0
7 60.0
8 60.0
9 60.0
10 NaN
11 NaN
12 16.0
13 16.0
14 16.0
15 20.0
16 19.0
17 18.0
Name: current, dtype: float64
#Second series is a cumulative count of zeroes in a group by 'ID'
s2 = df.groupby(['ID',df.current.ne(0).cumsum()])['current'].transform(lambda x: x.eq(0).cumsum()).mul(-1)
s2
Output:
0 0
1 -1
2 -2
3 0
4 0
5 -1
6 -2
7 0
8 -1
9 -2
10 -1
11 -2
12 0
13 -1
14 -2
15 0
16 0
17 0
Name: current, dtype: int32
(s1 + s2).clip(0, np.inf).fillna(0)
Output:
0 1.0
1 0.0
2 0.0
3 40.0
4 39.0
5 38.0
6 37.0
7 60.0
8 59.0
9 58.0
10 0.0
11 0.0
12 16.0
13 15.0
14 14.0
15 20.0
16 19.0
17 18.0
Name: current, dtype: float64
So you can do this used apply
and nested functions
import pandas as pd
ID = [2001980,2001980,2001980,2001980,2001980,2001980,2001980,2001980,2001980,2001980,2002222,2002222,2002222,2002222,2002222,2002222,2002222,2002222,]
Date = ["10/30/2017","10/29/2017","10/28/2017","10/27/2017","10/26/2017","10/25/2017","10/24/2017","10/23/2017","10/22/2017","10/21/2017","10/21/2017","10/20/2017","10/19/2017","10/18/2017","10/17/2017","10/16/2017","10/15/2017","10/14/2017",]
current = [1 ,0 ,0 ,40,39,0 ,0 ,60,0 ,0 ,0 ,0 ,16,0 ,0 ,20,19,18,]
df = pd.DataFrame({"ID": ID, "Date": Date, "current": current})
Then create the function to update the frame
Python 3.X
def update_frame(df):
last_expected = None
def apply_logic(row):
nonlocal last_expected
last_row_id = row.name - 1
if row.name == 0:
last_expected = row["current"]
return last_expected
last_row = df.iloc[[last_row_id]].iloc[0].to_dict()
last_expected = max(last_expected-1,row['current']) if last_row['ID'] == row['ID'] else row['current']
return last_expected
return apply_logic
Python 2.X
def update_frame(df):
sd = {"last_expected": None}
def apply_logic(row):
last_row_id = row.name - 1
if row.name == 0:
sd['last_expected'] = row["current"]
return sd['last_expected']
last_row = df.iloc[[last_row_id]].iloc[0].to_dict()
sd['last_expected'] = max(sd['last_expected'] - 1,row['current']) if last_row['ID'] == row['ID'] else row['current']
return sd['last_expected']
return apply_logic
And run the function like below
df['expected'] = df.apply(update_frame(df), axis=1)
The output is as expected
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With