Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

python - use previous row's value to update the new rows values

This is the current dataframe:

> ID        Date    current
> 2001980   10/30/2017  1   
> 2001980   10/29/2017  0   
> 2001980   10/28/2017  0   
> 2001980   10/27/2017  40  
> 2001980   10/26/2017  39  
> 2001980   10/25/2017  0   
> 2001980   10/24/2017  0   
> 2001980   10/23/2017  60  
> 2001980   10/22/2017  0   
> 2001980   10/21/2017  0   
> 2002222   10/21/2017  0   
> 2002222   10/20/2017  0   
> 2002222   10/19/2017  16  
> 2002222   10/18/2017  0   
> 2002222   10/17/2017  0   
> 2002222   10/16/2017  20  
> 2002222   10/15/2017  19  
> 2002222   10/14/2017  18  

Below is the final data frame. Column expected is what I am trying to get.

  1. One ID might have multiple date/record/rows. (ID+Date) is unique.
  2. this row's expected value = last row's expected - 1
  3. the minimum value is 0.
  4. Based on the formula in 2, if this row's expected value < this row's current value, then use this row's current value. for example, for ID 2001980 on 10/23/2017. Based on rule 2, the value should be 36, but based on rule 4, 36<60, so we use 60.

thank you so much.

> ID        Date    current expected 
> 2001980   10/30/2017  1   1 
> 2001980   10/29/2017  0   0
> 2001980   10/28/2017  0   0 
> 2001980   10/27/2017  40  40
> 2001980   10/26/2017  39  39 
> 2001980   10/25/2017  0   38
> 2001980   10/24/2017  0   37 
> 2001980   10/23/2017  60  60
> 2001980   10/22/2017  0   59 
> 2001980   10/21/2017  0   58
> 2002222   10/21/2017  0   0
> 2002222   10/20/2017  0   0 
> 2002222   10/19/2017  16  16
> 2002222   10/18/2017  0   15 
> 2002222   10/17/2017  0   14
> 2002222   10/16/2017  20  20
> 2002222   10/15/2017  19  19
> 2002222   10/14/2017  18  18

I am using Excel with the formula below:

= if(this row's ID = last row's ID, max(last row's expected value - 1, this row's current value), this row's current value)

like image 372
qqqwww Avatar asked Mar 02 '18 17:03

qqqwww


2 Answers

Revised simpler:

df['expected'] = df.groupby(['ID',df.current.ne(0).cumsum()])['current']\
  .transform(lambda x: x.eq(0).cumsum().mul(-1).add(x.iloc[0])).clip(0,np.inf)

Let's have a little fun:

df['expected'] = (df.groupby('ID')['current'].transform(lambda x: x.where(x.ne(0)).ffill()) +
df.groupby(['ID',df.current.ne(0).cumsum()])['current'].transform(lambda x: x.eq(0).cumsum()).mul(-1))\
.clip(0,np.inf).fillna(0).astype(int)
print(df)

Output:

         ID        Date  current  expected
0   2001980  10/30/2017        1         1
1   2001980  10/29/2017        0         0
2   2001980  10/28/2017        0         0
3   2001980  10/27/2017       40        40
4   2001980  10/26/2017       39        39
5   2001980  10/25/2017        0        38
6   2001980  10/24/2017        0        37
7   2001980  10/23/2017       60        60
8   2001980  10/22/2017        0        59
9   2001980  10/21/2017        0        58
10  2002222  10/21/2017        0         0
11  2002222  10/20/2017        0         0
12  2002222  10/19/2017       16        16
13  2002222  10/18/2017        0        15
14  2002222  10/17/2017        0        14
15  2002222  10/16/2017       20        20
16  2002222  10/15/2017       19        19
17  2002222  10/14/2017       18        18

Details

Basically, creating series, s1 and subtracting series s2 then clipping negative values and filling nan's with zero.

#Let's calculate two series first a series to fill the zeros in an 'ID' with the previous non-zero value 
s1 = df.groupby('ID')['current'].transform(lambda x: x.where(x.ne(0)).ffill())
s1

Output:

0      1.0
1      1.0
2      1.0
3     40.0
4     39.0
5     39.0
6     39.0
7     60.0
8     60.0
9     60.0
10     NaN
11     NaN
12    16.0
13    16.0
14    16.0
15    20.0
16    19.0
17    18.0
Name: current, dtype: float64

#Second series is a cumulative count of zeroes in a group by 'ID'
s2 = df.groupby(['ID',df.current.ne(0).cumsum()])['current'].transform(lambda x: x.eq(0).cumsum()).mul(-1)
s2

Output:

0     0
1    -1
2    -2
3     0
4     0
5    -1
6    -2
7     0
8    -1
9    -2
10   -1
11   -2
12    0
13   -1
14   -2
15    0
16    0
17    0
Name: current, dtype: int32

Add series together clip and fillna.

(s1 + s2).clip(0, np.inf).fillna(0)

Output:

0      1.0
1      0.0
2      0.0
3     40.0
4     39.0
5     38.0
6     37.0
7     60.0
8     59.0
9     58.0
10     0.0
11     0.0
12    16.0
13    15.0
14    14.0
15    20.0
16    19.0
17    18.0
Name: current, dtype: float64
like image 172
Scott Boston Avatar answered Nov 13 '22 18:11

Scott Boston


So you can do this used apply and nested functions

import pandas as pd
ID = [2001980,2001980,2001980,2001980,2001980,2001980,2001980,2001980,2001980,2001980,2002222,2002222,2002222,2002222,2002222,2002222,2002222,2002222,]
Date = ["10/30/2017","10/29/2017","10/28/2017","10/27/2017","10/26/2017","10/25/2017","10/24/2017","10/23/2017","10/22/2017","10/21/2017","10/21/2017","10/20/2017","10/19/2017","10/18/2017","10/17/2017","10/16/2017","10/15/2017","10/14/2017",]
current = [1 ,0 ,0 ,40,39,0 ,0 ,60,0 ,0 ,0 ,0 ,16,0 ,0 ,20,19,18,]

df = pd.DataFrame({"ID": ID, "Date": Date, "current": current})

Then create the function to update the frame

Python 3.X

def update_frame(df):
    last_expected = None
    def apply_logic(row):
        nonlocal last_expected
        last_row_id = row.name - 1
        if row.name == 0:
            last_expected = row["current"]
            return last_expected
        last_row = df.iloc[[last_row_id]].iloc[0].to_dict()
        last_expected = max(last_expected-1,row['current']) if last_row['ID'] == row['ID'] else row['current']
        return last_expected
    return apply_logic

Python 2.X

def update_frame(df):
    sd = {"last_expected": None}
    def apply_logic(row):
        last_row_id = row.name - 1
        if row.name == 0:
            sd['last_expected'] = row["current"]
            return sd['last_expected']
        last_row = df.iloc[[last_row_id]].iloc[0].to_dict()
        sd['last_expected'] = max(sd['last_expected'] - 1,row['current']) if last_row['ID'] == row['ID'] else row['current']
        return sd['last_expected']
    return apply_logic

And run the function like below

df['expected'] = df.apply(update_frame(df), axis=1)

The output is as expected

Output

like image 20
Tarun Lalwani Avatar answered Nov 13 '22 18:11

Tarun Lalwani