Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to segregate a vector/variable in two states depending upon percent increase of decrease from its previous value

Say I have a vector A (or a variable A in a dataframe say df) with following values

A <- c(90L, 100L, 5L, 15L, 16L, 2L, 20L, 25L, 2L, 40L, 16L, 16L, 32L, 51L, 52L)

A
#>  [1]  90 100   5  15  16   2  20  25   2  40  16  16  32  51  52

df <- data.frame(A = A)

Created on 2021-05-11 by the reprex package (v2.0.0)

Now I want to divide these values into two States say 0 and 1 based on the following criteria

  • By default first value will be 0 state.
  • If the value has dropped by more than 80% of its previous value (e.g. in third row from 100 to 5 i.e. 95% drop), its state changes to 1 from 0.
  • Here the difficult part arrives. All values following this dropped value will remain in 1 state until it rises 50% of that value (100) i.e. 50 again. It rises above 50 in 14th row. So value in 14th row will be state 0 again
  • Since last value is not a decrease by more than 80% of previous value i.e. 51 it will be same state i.e. 0.

So basically I am trying to get an output like this

     A State
1   90     0
2  100     0
3    5     1
4   15     1
5   16     1
6    2     1
7   20     1
8   25     1
9    2     1
10  40     1
11  16     1
12  16     1
13  32     1
14  51     0
15  52     0

BaseR or tidyverse approach will be doing fine for me.

Where I am stuck is actually retrieving the threshold value (100 or 50%) till 14th row as you can see it further drops by 80% again two times, once in row 6th and again row 9th.

One more test case can be

     A State
1   90     0
2  100     0
3    5     1
4   15     1
5   16     1
6    2     1
7   20     1
8   25     1
9    2     1
10  40     1
11  16     1
12  16     1
13  32     1
14  51     0
15  52     0
16  60     0
17  10     1
18  20     1
19   5     1
20  30     1
21  31     0
22  50     0
23 100     0

Explanation

enter image description here


benchmarking both answers on df <- data.frame(A = sample(1:100, 100000, T))

Unit: microseconds
       expr      min        lq       mean    median       uq      max neval
 BlueVoxe() 214547.5 271455.60 298551.084 300763.90 309823.3 499692.5   100
      Ben()      4.2      4.85      7.115      5.35      9.8     11.3   100
like image 672
AnilGoyal Avatar asked Feb 21 '26 15:02

AnilGoyal


2 Answers

Anil, perhaps this might be something to help in moving forward. It isn't elegant. You can create a loop checking for changes in values, while tracking prior thresholds.

A <- c(90, 100, 5, 15, 16, 2, 20, 25, 2, 40, 16, 16, 32, 
       51, 52, 60, 10, 20, 5, 30, 31, 50, 100)

threshold <- NA
State <- 0

for (i in 2:length(A)) {
  if (State[i-1] == 0) {
    if ((A[i-1] - A[i]) > (.8 * A[i-1])) {
      threshold <- A[i-1]
      State <- c(State, 1)
    } else {
      State <- c(State, 0)
    }
  } else {
    if (A[i] > (.5 * threshold)) { 
      State <- c(State, 0)
    } else {
      State <- c(State, 1)
    }
  }
}

data.frame(A, State)

Output

     A State
1   90     0
2  100     0
3    5     1
4   15     1
5   16     1
6    2     1
7   20     1
8   25     1
9    2     1
10  40     1
11  16     1
12  16     1
13  32     1
14  51     0
15  52     0
16  60     0
17  10     1
18  20     1
19   5     1
20  30     1
21  31     0
22  50     0
23 100     0

Data

A <- c(90, 100, 5, 15, 16, 2, 20, 25, 2, 40, 16, 16, 32, 51, 52, 60, 
10, 20, 5, 30, 31, 50, 100)
like image 101
Ben Avatar answered Feb 23 '26 10:02

Ben


Here is a function using purrr::accumulate() that should do the trick:

library(dplyr)
library(purrr)

A <- c(90L, 100L, 5L, 15L, 16L, 2L, 20L, 25L, 2L, 40L, 16L, 16L, 32L, 51L, 52L)
df <- data.frame(A = A)

trans <- function(x, trigger_0 = 0.5, trigger_1 = -0.8) {
  
  # This variable 'remembers' the last value that switched 
  # the indicator to 1
  compare <- 0
  
  out <- purrr::accumulate(seq_along(x), .init = 0, function(prev_result, i) {
    
    # Initial value should be 0 - this will return .init
    if (i == 1) {
      return(0)
    }
    
    # If previous result is 0 we only need to check that change from
    # prev value is less than `trigger_1`
    if (prev_result == 0) {
      
      # Compute the change from previous value
      change1 <- (x[i] - x[i - 1]) / x[i - 1]
      
      if (change1 < trigger_1) {
        
        # Reset 'compare' to be used in next iterations
        compare <<- x[i - 1]
        return(1)
        
      } else return(0)
      
    }
    
    if (prev_result == 1) {
      
      # Compute change from 'compare'
      change2 <- x[i] / compare
      
      # Return 1 or 0 based on the increase/decrease from 'compare'
      if (change2 > trigger_0) {
        return(0)
      } else return(1)
    }
    
  })
  
  # Remove the leading 0 created by using `init`
  tail(out, -1)
   
}

df %>% 
  mutate(Indicator = trans(A))
#>      A Indicator
#> 1   90         0
#> 2  100         0
#> 3    5         1
#> 4   15         1
#> 5   16         1
#> 6    2         1
#> 7   20         1
#> 8   25         1
#> 9    2         1
#> 10  40         1
#> 11  16         1
#> 12  16         1
#> 13  32         1
#> 14  51         0
#> 15  52         0
like image 38
wurli Avatar answered Feb 23 '26 09:02

wurli