Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pandas: Starting from the second row. subtract from previous row and use it as value to the next subtraction

Tags:

python

pandas

Context

I need to code an algorithm using Pandas to, starting from the second row, subtract a column value from the previous row, and use the result to keep subtracting the next row, etc.

Example

INPUT:
ID    VALUE
0       1
1       10
2       30
3       45
4       78

OUTPUT (just the result, not the operation itself):
ID    VALUE
0       1
1       9  #(10-1)
2       21 #(30-9)
3       24 #(45-21)
4       54 #(78-24)

What I Tried

df['VALUE'] = df['VALUE'] - df['VALUE]'.shift() # Doesn't starts with the second row, and use the original dataframe to subtract

df['VALUE'] = df['VALUE'].diff() # Doesn't starts with the second row, and use the original dataframe to subtract
like image 234
Lodi Avatar asked Dec 10 '22 05:12

Lodi


1 Answers

Numpy, cumsum with alternating sign

i = np.arange(len(df))
j = np.arange(2)

a = np.where(
    (i[:, None] + j) % 2 == 0, 1, -1
) * df.VALUE.values[:, None]

b = a.cumsum(0)[i, i % 2]

df.assign(VALUE=b)

   ID  VALUE
0   0      1
1   1      9
2   2     21
3   3     24
4   4     54

Explanation

First thing is to notice that

X0 ->                     X0
X1 ->                X1 - X0
X2 ->           X2 - X1 + X0
X3 ->      X3 - X2 + X1 - X0
X4 -> X4 - X3 + X2 - X1 + X0

So I wanted to multiply every other row by negative one... but I needed to do this twice for the other choice of alternating rows.

I needed to generate a mask that swaps between + and - 1 for both options

i = np.arange(len(df))
j = np.arange(2)

m = np.where(
    (i[:, None] + j) % 2 == 0, 1, -1
)

m

array([[ 1, -1],
       [-1,  1],
       [ 1, -1],
       [-1,  1],
       [ 1, -1]])

Now I need to broadcast multiply this across my df.VALUE

a = m * df.VALUE.values[:, None]

a

array([[  1,  -1],
       [-10,  10],
       [ 30, -30],
       [-45,  45],
       [ 78, -78]])

Notice the pattern. Now I cumsum

a.cumsum(0)

array([[  1,  -1],
       [ -9,   9],
       [ 21, -21],
       [-24,  24],
       [ 54, -54]])

But I need the positive ones... more specifically, I need the alternating ones. So I slice with a modded arange

b = a.cumsum(0)[i, i % 2]
b

array([ 1,  9, 21, 24, 54])

This is what I ended up assigning to the existing column

df.assign(VALUE=b)

   ID  VALUE
0   0      1
1   1      9
2   2     21
3   3     24
4   4     54

This produces a copy of df and overwrites the VALUE column with b.
To persist this answer, make sure to reassign to a new name or df if you want.

df_new = df.assign(VALUE=b)
like image 126
piRSquared Avatar answered Dec 12 '22 19:12

piRSquared