
Find previous same value in a vector and apply certain conditions

Tags:

for-loop

r

For each value of myvector1, I would like to know the mycategory value of the immediately preceding occurrence of the same value in myvector1, provided that its mystatus is ON. If it is OFF, I keep looking further back for the same value until I find one that is ON.

Instructions would look like this:

  1. For a given position in "myvector1", take its value.
  2. Find the position of the immediately preceding occurrence of the same value in "myvector1".
  3. Check the associated status. If it is ON, take the associated "mycategory". If it is OFF, go back to step 2.
  4. Assign the obtained "mycategory" to a new vector "mysolution".
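
Read literally, the four steps describe a backward scan for each row. A minimal base R sketch of that reading follows; the helper name `previous_on_category` and the toy data are made up for illustration and are not part of the original dataset.

```r
# Hypothetical helper: for each row, walk backwards through earlier rows
# and return the category of the nearest one with the same value and status ON.
previous_on_category <- function(value, status, category) {
  value    <- as.character(value)
  status   <- as.character(status)
  category <- as.character(category)
  solution <- rep(NA_character_, length(value))
  for (i in seq_along(value)) {
    j <- i - 1
    while (j >= 1) {
      if (value[j] == value[i] && status[j] == "ON") {
        solution[i] <- category[j]
        break
      }
      j <- j - 1
    }
  }
  solution
}

# Toy data (assumed, not the mydf from the question)
toy <- data.frame(
  myvector1  = c("a", "b", "a", "a", "b", "a"),
  mystatus   = c("ON", "OFF", "OFF", "ON", "ON", "OFF"),
  mycategory = c("hi", "bye", "stay", "bye", "hi", "stay")
)
toy$mysolution <- previous_on_category(toy$myvector1, toy$mystatus, toy$mycategory)
toy$mysolution
# NA NA "hi" "hi" NA "bye"
```

Rows with no earlier ON occurrence of the same value stay NA. The nested loop is quadratic in the worst case, so it is only meant to pin down the intended result.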

Given the dataset mydf, what I am looking for is DesiredSolution (which I filled in manually).

mydf <- structure(list(myvector1 = structure(c(1L, 2L, 3L, 4L, 5L, 1L, 
2L, 4L, 5L, 2L, 3L, 4L, 5L, 2L, 3L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 
2L, 4L, 5L, 1L, 1L, 2L, 3L, 4L, 5L, 3L), .Label = c("0", "1", 
"2", "3", "4"), class = "factor"), mystatus = structure(c(2L, 
1L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 
1L, 2L, 2L, 1L, 1L, 2L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 2L), .Label = c("OFF", 
"ON"), class = "factor"), mycategory = structure(c(2L, 2L, 3L, 
1L, 1L, 1L, 1L, 3L, 3L, 1L, 2L, 2L, 3L, 1L, 1L, 1L, 1L, 1L, 2L, 
2L, 3L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L), .Label = c("bye", 
"hi", "stay"), class = "factor"), DesiredSolution = structure(c(3L, 
3L, 3L, 3L, 3L, 2L, 3L, 1L, 3L, 1L, 4L, 4L, 4L, 1L, 2L, 4L, 1L, 
1L, 1L, 2L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 2L), .Label = c("bye", 
"hi", "NA", "stay"), class = "factor")), .Names = c("myvector1", 
"mystatus", "mycategory", "DesiredSolution"), row.names = c(NA, 
-32L), class = "data.frame")
alvaropr asked Dec 24 '22 11:12


2 Answers

With data.table...

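library(data.table)
setDT(mydf)        # convert mydf to a data.table by reference
mydf[, r := .I]    # r = row number
# for each row, join to the ON rows with the same myvector1 and a smaller r; keep the last match
mydf[, v := mydf[mystatus == "ON"][mydf, on=.(r < r, myvector1), mult="last", x.mycategory]]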

which gives

    myvector1 mystatus mycategory DesiredSolution  r    v
 1:         0       ON         hi              NA  1   NA
 2:         1      OFF         hi              NA  2   NA
 3:         2       ON       stay              NA  3   NA
 4:         3       ON        bye              NA  4   NA
 5:         4      OFF        bye              NA  5   NA
 6:         0       ON        bye              hi  6   hi
 7:         1       ON        bye              NA  7   NA
 8:         3       ON       stay             bye  8  bye
 9:         4       ON       stay              NA  9   NA
10:         1      OFF        bye             bye 10  bye
11:         2       ON         hi            stay 11 stay
12:         3       ON         hi            stay 12 stay
13:         4       ON       stay            stay 13 stay
14:         1      OFF        bye             bye 14  bye
15:         2       ON        bye              hi 15   hi
16:         4       ON        bye            stay 16 stay
17:         0       ON        bye             bye 17  bye
18:         1      OFF        bye             bye 18  bye
19:         2       ON         hi             bye 19  bye
20:         3       ON         hi              hi 20   hi
21:         4      OFF       stay             bye 21  bye
22:         0      OFF        bye             bye 22  bye
23:         1       ON        bye             bye 23  bye
24:         3      OFF        bye              hi 24   hi
25:         4       ON        bye             bye 25  bye
26:         0      OFF        bye             bye 26  bye
27:         0      OFF         hi             bye 27  bye
28:         1      OFF         hi             bye 28  bye
29:         2      OFF         hi              hi 29   hi
30:         3      OFF         hi              hi 30   hi
31:         4      OFF       stay             bye 31  bye
32:         2       ON       stay              hi 32   hi
    myvector1 mystatus mycategory DesiredSolution  r    v

How it works: Look up rows in mydf[mystatus == "ON"] where row number r is lower and myvector1 matches. Return mycategory, taking the last matching row if there are multiple matches.

Frank answered Apr 20 '23 00:04


Another possible way is to work with "slices" of your data.frame and zoo::na.locf.

First, record the position of each row so you can restore the original order afterwards:

mydf$pos <- seq_len(nrow(mydf))

Then split your data.frame according to myvector1:

spl_mydf <- split(mydf, mydf$myvector1)

Then apply to each "slice" (which contains only one value of myvector1) a function that keeps the values of mycategory only where mystatus is ON, sets the rest to NA, and replaces each NA with the previous non-NA value. To get the desired lag, the result is shifted down by one row: the first item becomes NA and the last item is dropped.

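library(zoo)  # provides na.locf

my_out <- lapply(spl_mydf,
                 function(sl_df){
                    out <- sl_df$mycategory
                    out[sl_df$mystatus=="OFF"] <- NA  # keep mycategory only where mystatus is ON
                    # carry the last ON value forward, then shift down by one row to get the lag
                    data.frame(pos=sl_df$pos, 
                               out=c(NA, head(na.locf(as.character(out), na.rm=FALSE), -1))) # as.character is to avoid getting the factors levels
 })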
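
To see the three steps (mask, carry forward, lag) on a single slice, here is a small base R sketch. The helper `carry_forward` is a hypothetical stand-in for zoo::na.locf(x, na.rm=FALSE) so the example runs without zoo, and the slice values are made up.

```r
# Hypothetical stand-in for zoo::na.locf(x, na.rm = FALSE):
# replace each NA with the closest non-NA value before it.
carry_forward <- function(x) {
  for (i in seq_along(x)[-1]) {
    if (is.na(x[i])) x[i] <- x[i - 1]
  }
  x
}

# One made-up slice, i.e. rows sharing a single myvector1 value
category <- c("hi", "stay", "bye", "bye")
status   <- c("ON", "OFF", "ON", "OFF")

masked <- ifelse(status == "ON", category, NA)  # "hi" NA   "bye" NA
filled <- carry_forward(masked)                 # "hi" "hi" "bye" "bye"
lagged <- c(NA, head(filled, -1))               # NA   "hi" "hi"  "bye"
```

The lag matters because each row must see only earlier rows: without it, a row that is itself ON would receive its own category.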

Finally, put back everything according to position and suppress the pos column:

out <- do.call(rbind, my_out)
mydf$output <- out$out[order(out$pos)]
mydf$pos <- NULL

head(mydf, 10)
   myvector1 mystatus mycategory DesiredSolution output
1          0       ON         hi              NA   <NA>
2          1      OFF         hi              NA   <NA>
3          2       ON       stay              NA   <NA>
4          3       ON        bye              NA   <NA>
5          4      OFF        bye              NA   <NA>
6          0       ON        bye              hi     hi
7          1       ON        bye              NA   <NA>
8          3       ON       stay             bye    bye
9          4       ON       stay              NA   <NA>
10         1      OFF        bye             bye    bye

Check that everything is all right:

all(mydf$DesiredSolution==mydf$output, na.rm=TRUE) # TRUE
all((as.character(mydf$DesiredSolution)=="NA")==is.na(mydf$output)) # TRUE 

(In your data.frame, DesiredSolution stores "NA" as a regular factor level rather than as a true missing value, hence the second check.)
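
The difference between a literal "NA" level and a real missing value can be seen with two made-up vectors (the names `as_level` and `as_missing` are only for illustration):

```r
# Made-up vectors for illustration
as_level   <- factor(c("bye", "NA", "hi"))  # "NA" is an ordinary level here
as_missing <- c("bye", NA, "hi")            # a real missing value

is.na(as_level)                  # FALSE FALSE FALSE
as.character(as_level) == "NA"   # FALSE  TRUE FALSE
is.na(as_missing)                # FALSE  TRUE FALSE
```

This is why the check compares the string "NA" in DesiredSolution against is.na() of the computed output.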

Cath answered Apr 20 '23 01:04