Say I have a vector A (or a variable A in a dataframe say df) with following values
A <- c(90L, 100L, 5L, 15L, 16L, 2L, 20L, 25L, 2L, 40L, 16L, 16L, 32L, 51L, 52L)
A
#> [1] 90 100 5 15 16 2 20 25 2 40 16 16 32 51 52
df <- data.frame(A = A)
Created on 2021-05-11 by the reprex package (v2.0.0)
Now I want to divide these values into two States say 0 and 1 based on the following criteria
0 state.80% of its previous value (e.g. in third row from 100 to 5 i.e. 95% drop), its state changes to 1 from 0.1 state until it rises 50% of that value (100) i.e. 50 again. It rises above 50 in 14th row. So value in 14th row will be state 0 again0.So basically I am trying to get an output like this
A State
1 90 0
2 100 0
3 5 1
4 15 1
5 16 1
6 2 1
7 20 1
8 25 1
9 2 1
10 40 1
11 16 1
12 16 1
13 32 1
14 51 0
15 52 0
BaseR or tidyverse approach will be doing fine for me.
Where I am stuck is actually retrieving the threshold value (100 or 50%) till 14th row as you can see it further drops by 80% again two times, once in row 6th and again row 9th.
One more test case can be
A State
1 90 0
2 100 0
3 5 1
4 15 1
5 16 1
6 2 1
7 20 1
8 25 1
9 2 1
10 40 1
11 16 1
12 16 1
13 32 1
14 51 0
15 52 0
16 60 0
17 10 1
18 20 1
19 5 1
20 30 1
21 31 0
22 50 0
23 100 0
Explanation

benchmarking both answers on df <- data.frame(A = sample(1:100, 100000, T))
Unit: microseconds
expr min lq mean median uq max neval
BlueVoxe() 214547.5 271455.60 298551.084 300763.90 309823.3 499692.5 100
Ben() 4.2 4.85 7.115 5.35 9.8 11.3 100
Anil, perhaps this might be something to help in moving forward. It isn't elegant. You can create a loop checking for changes in values, while tracking prior thresholds.
A <- c(90, 100, 5, 15, 16, 2, 20, 25, 2, 40, 16, 16, 32,
51, 52, 60, 10, 20, 5, 30, 31, 50, 100)
threshold <- NA
State <- 0
for (i in 2:length(A)) {
if (State[i-1] == 0) {
if ((A[i-1] - A[i]) > (.8 * A[i-1])) {
threshold <- A[i-1]
State <- c(State, 1)
} else {
State <- c(State, 0)
}
} else {
if (A[i] > (.5 * threshold)) {
State <- c(State, 0)
} else {
State <- c(State, 1)
}
}
}
data.frame(A, State)
Output
A State
1 90 0
2 100 0
3 5 1
4 15 1
5 16 1
6 2 1
7 20 1
8 25 1
9 2 1
10 40 1
11 16 1
12 16 1
13 32 1
14 51 0
15 52 0
16 60 0
17 10 1
18 20 1
19 5 1
20 30 1
21 31 0
22 50 0
23 100 0
Data
A <- c(90, 100, 5, 15, 16, 2, 20, 25, 2, 40, 16, 16, 32, 51, 52, 60,
10, 20, 5, 30, 31, 50, 100)
Here is a function using purrr::accumulate() that should do the trick:
library(dplyr)
library(purrr)
A <- c(90L, 100L, 5L, 15L, 16L, 2L, 20L, 25L, 2L, 40L, 16L, 16L, 32L, 51L, 52L)
df <- data.frame(A = A)
trans <- function(x, trigger_0 = 0.5, trigger_1 = -0.8) {
# This variable 'remembers' the last value that switched
# the indicator to 1
compare <- 0
out <- purrr::accumulate(seq_along(x), .init = 0, function(prev_result, i) {
# Initial value should be 0 - this will return .init
if (i == 1) {
return(0)
}
# If previous result is 0 we only need to check that change from
# prev value is less than `trigger_1`
if (prev_result == 0) {
# Compute the change from previous value
change1 <- (x[i] - x[i - 1]) / x[i - 1]
if (change1 < trigger_1) {
# Reset 'compare' to be used in next iterations
compare <<- x[i - 1]
return(1)
} else return(0)
}
if (prev_result == 1) {
# Compute change from 'compare'
change2 <- x[i] / compare
# Return 1 or 0 based on the increase/decrease from 'compare'
if (change2 > trigger_0) {
return(0)
} else return(1)
}
})
# Remove the leading 0 created by using `init`
tail(out, -1)
}
df %>%
mutate(Indicator = trans(A))
#> A Indicator
#> 1 90 0
#> 2 100 0
#> 3 5 1
#> 4 15 1
#> 5 16 1
#> 6 2 1
#> 7 20 1
#> 8 25 1
#> 9 2 1
#> 10 40 1
#> 11 16 1
#> 12 16 1
#> 13 32 1
#> 14 51 0
#> 15 52 0
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With