Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to transfer negative value at current row to previous row in a data frame?

I want to transfer the negative values at the current row to the previous row by adding them to the previous row within each group. Following is the sample raw data I have:

raw_data <- data.frame(GROUP = rep(c('A','B','C'),each = 6),
                   YEARMO = rep(c(201801:201806),3),
                   VALUE = c(100,-10,20,70,-50,30,20,60,40,-20,-10,50,0,10,-30,50,100,-100))
> raw_data
  GROUP YEARMO VALUE
1      A 201801   100  
2      A 201802   -10
3      A 201803    20
4      A 201804    70
5      A 201805   -50
6      A 201806    30
7      B 201801    20
8      B 201802    60
9      B 201803    40
10     B 201804   -20
11     B 201805   -10
12     B 201806    50
13     C 201801     0
14     C 201802    10
15     C 201803   -30
16     C 201804    50
17     C 201805   100
18     C 201806  -100

Following is the output that I want:

final_data <- data.frame(GROUP = rep(c('A','B','C'),each = 6),
                   YEARMO = rep(c(201801:201806),3),
                   VALUE = c(90,0,20,20,0,30,20,60,10,0,0,50,-20,0,0,50,0,0))
> final_data
   GROUP YEARMO VALUE
1      A 201801    90
2      A 201802     0
3      A 201803    20
4      A 201804    20
5      A 201805     0
6      A 201806    30
7      B 201801    20
8      B 201802    60
9      B 201803    10
10     B 201804     0
11     B 201805     0
12     B 201806    50
13     C 201801   -20
14     C 201802     0
15     C 201803     0
16     C 201804    50
17     C 201805     0
18     C 201806     0

Following data frames will show how the transformation can be made in each group:

Trans_GRP_A <- data.frame(GROUP = rep('A',each = 6),
                   YEARMO = c(201801:201806),
                   VALUE = c(100,-10,20,70,-50,30),
                   ITER_1 = c(100,-10,20,20,0,30),
                   ITER_2 = c(90,0,20,20,0,30))
> Trans_GRP_A
  GROUP YEARMO VALUE ITER_1 ITER_2
1     A 201801   100    100     90
2     A 201802   -10    -10      0
3     A 201803    20     20     20
4     A 201804    70     20     20
5     A 201805   -50      0      0
6     A 201806    30     30     30

> Trans_GRP_B <- data.frame(GROUP = rep('B',each = 6),
+                           YEARMO = c(201801:201806),
+                           VALUE = c(20,60,40,-20,-10,50),
+                           ITER_1 = c(20,60,40,-30,0,50),
+                           ITER_2 = c(20,60,10,0,0,50))
> Trans_GRP_B
  GROUP YEARMO VALUE ITER_1 ITER_2
1     B 201801    20     20     20
2     B 201802    60     60     60
3     B 201803    40     40     10
4     B 201804   -20    -30      0
5     B 201805   -10      0      0
6     B 201806    50     50     50

> Trans_GRP_C <- data.frame(GROUP = rep('C',each = 6),
+                           YEARMO = c(201801:201806),
+                           VALUE = c(0,10,-30,50,100,-100),
+                           ITER_1 = c(0,10,-30,50,0,0),
+                           ITER_2 = c(0,-20,0,50,0,0),
+                           ITER_3 = c(-20,0,0,50,0,0))
> Trans_GRP_C
  GROUP YEARMO VALUE ITER_1 ITER_2 ITER_3
1     C 201801     0      0      0    -20
2     C 201802    10     10    -20      0
3     C 201803   -30    -30      0      0
4     C 201804    50     50     50     50
5     C 201805   100      0      0      0
6     C 201806  -100      0      0      0

The logic for transfer is as follows:

  1. Replace the negative value with 0.
  2. Add the negative value at current row to the value at previous row.
  3. Transfer the negative value to the previous row until the value becomes positive or 0.
  4. Transfer until the 1st row is encountered within the group if transferring doesn't result in a positive value, here 1st row is YEARMO = 201801 in each group.

Any solution is welcome. I think a solution which is vectorized might perform faster.

like image 863
siddhesh tiwari Avatar asked Aug 26 '18 08:08

siddhesh tiwari


2 Answers

Here is another option to sum the positive part of the vector with the shifted negative part of the vector recursively until there are no more negative values left or it has been executed .N times (where .N is the number of row for each GROUP)

setDT(raw_data)[, OUTPUT := {
        posVal <- replace(VALUE, VALUE<0, 0)
        negVal <- replace(VALUE, VALUE>0, 0)
        n <- 1L
        while (any(negVal < 0) && n < .N) {
            posVal <- replace(posVal, posVal<0, 0) + 
                shift(negVal, 1L, type="lead", fill=0) +
                c(negVal[1L], rep(0, .N-1L))
            negVal <- replace(posVal, posVal>0, 0)
            n <- n + 1L
        }
        posVal
    }, by=.(GROUP)]

output:

    GROUP YEARMO VALUE OUTPUT
 1:     A 201801   100     90
 2:     A 201802   -10      0
 3:     A 201803    20     20
 4:     A 201804    70     20
 5:     A 201805   -50      0
 6:     A 201806    30     30
 7:     B 201801    20     20
 8:     B 201802    60     60
 9:     B 201803    40     10
10:     B 201804   -20      0
11:     B 201805   -10      0
12:     B 201806    50     50
13:     C 201801     0    -20
14:     C 201802    10      0
15:     C 201803   -30      0
16:     C 201804    50     50
17:     C 201805   100      0
18:     C 201806  -100      0
like image 85
chinsoon12 Avatar answered Nov 19 '22 16:11

chinsoon12


That's a tricky one. I have tried to find a vectorized solution but the only approach which worked so far was to loop backwards through the rows within each group:

library(data.table)
DT <- as.data.table(raw_data)
DT$final <- final_data$VALUE
DT[, new := {
  x <- VALUE
  sn <- 0
  for (i in .N:1) {
    if (i > 1) {
      if (x[i] < 0) {
        sn <- sn + x[i]
        x[i] <- 0
      } else {
        tmp <- pmax(x[i] + sn, 0)
        sn <- sn + x[i] - tmp
        x[i] <- tmp
      }
    } else {
      x[i] <- x[i] + sn
    }
  }
  x
}, by = GROUP]
DT[]
    GROUP YEARMO VALUE final new
 1:     A 201801   100    90  90
 2:     A 201802   -10     0   0
 3:     A 201803    20    20  20
 4:     A 201804    70    20  20
 5:     A 201805   -50     0   0
 6:     A 201806    30    30  30
 7:     B 201801    20    20  20
 8:     B 201802    60    60  60
 9:     B 201803    40    10  10
10:     B 201804   -20     0   0
11:     B 201805   -10     0   0
12:     B 201806    50    50  50
13:     C 201801     0   -20 -20
14:     C 201802    10     0   0
15:     C 201803   -30     0   0
16:     C 201804    50    50  50
17:     C 201805   100     0   0
18:     C 201806  -100     0   0

sn stores, i.e., accumulates the negative values which is then "consumed" by subsequent (in reverse order) positive values.

like image 2
Uwe Avatar answered Nov 19 '22 15:11

Uwe