For each value of myvector1, I would like to know the mycategory value of the immediately preceding occurrence of the same value in myvector1, provided that its mystatus is ON; otherwise I would keep going back to the next earlier occurrence of that value until I find one that is ON.
Instructions would look like this:
Given the dataset mydf below, what I am looking for is the DesiredSolution column (which I filled in manually):
mydf <- structure(list(myvector1 = structure(c(1L, 2L, 3L, 4L, 5L, 1L,
2L, 4L, 5L, 2L, 3L, 4L, 5L, 2L, 3L, 5L, 1L, 2L, 3L, 4L, 5L, 1L,
2L, 4L, 5L, 1L, 1L, 2L, 3L, 4L, 5L, 3L), .Label = c("0", "1",
"2", "3", "4"), class = "factor"), mystatus = structure(c(2L,
1L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 1L, 2L, 2L, 2L,
1L, 2L, 2L, 1L, 1L, 2L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 2L), .Label = c("OFF",
"ON"), class = "factor"), mycategory = structure(c(2L, 2L, 3L,
1L, 1L, 1L, 1L, 3L, 3L, 1L, 2L, 2L, 3L, 1L, 1L, 1L, 1L, 1L, 2L,
2L, 3L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L), .Label = c("bye",
"hi", "stay"), class = "factor"), DesiredSolution = structure(c(3L,
3L, 3L, 3L, 3L, 2L, 3L, 1L, 3L, 1L, 4L, 4L, 4L, 1L, 2L, 4L, 1L,
1L, 1L, 2L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 2L), .Label = c("bye",
"hi", "NA", "stay"), class = "factor")), .Names = c("myvector1",
"mystatus", "mycategory", "DesiredSolution"), row.names = c(NA,
-32L), class = "data.frame")
With data.table...
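library(data.table)
setDT(mydf)        # convert to a data.table by reference
mydf[, r := .I]    # r = original row number
# Non-equi self-join against the ON rows only: same myvector1, strictly earlier row
mydf[, v := mydf[mystatus == "ON"][mydf, on=.(r < r, myvector1), mult="last", x.mycategory]]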
which gives
    myvector1 mystatus mycategory DesiredSolution  r    v
 1:         0       ON         hi              NA  1   NA
 2:         1      OFF         hi              NA  2   NA
 3:         2       ON       stay              NA  3   NA
 4:         3       ON        bye              NA  4   NA
 5:         4      OFF        bye              NA  5   NA
 6:         0       ON        bye              hi  6   hi
 7:         1       ON        bye              NA  7   NA
 8:         3       ON       stay             bye  8  bye
 9:         4       ON       stay              NA  9   NA
10:         1      OFF        bye             bye 10  bye
11:         2       ON         hi            stay 11 stay
12:         3       ON         hi            stay 12 stay
13:         4       ON       stay            stay 13 stay
14:         1      OFF        bye             bye 14  bye
15:         2       ON        bye              hi 15   hi
16:         4       ON        bye            stay 16 stay
17:         0       ON        bye             bye 17  bye
18:         1      OFF        bye             bye 18  bye
19:         2       ON         hi             bye 19  bye
20:         3       ON         hi              hi 20   hi
21:         4      OFF       stay             bye 21  bye
22:         0      OFF        bye             bye 22  bye
23:         1       ON        bye             bye 23  bye
24:         3      OFF        bye              hi 24   hi
25:         4       ON        bye             bye 25  bye
26:         0      OFF        bye             bye 26  bye
27:         0      OFF         hi             bye 27  bye
28:         1      OFF         hi             bye 28  bye
29:         2      OFF         hi              hi 29   hi
30:         3      OFF         hi              hi 30   hi
31:         4      OFF       stay             bye 31  bye
32:         2       ON       stay              hi 32   hi
    myvector1 mystatus mycategory DesiredSolution  r    v
How it works: look up rows in mydf[mystatus == "ON"] where the row number r is lower and myvector1 matches. Return mycategory, taking the last matching row if there are multiple matches.
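To see the join mechanics in isolation, here is a minimal sketch on a made-up five-row table. The table toy and its columns id, status, category, and prev_on are hypothetical names, not part of the question's data. It applies the same pattern: restrict to the ON rows, join on a strictly smaller row number and an equal id, and keep the last match.

```r
library(data.table)

# Hypothetical toy data, not the question's mydf
toy <- data.table(
  id       = c("a", "b", "a", "a", "b"),
  status   = c("ON", "OFF", "OFF", "ON", "ON"),
  category = c("hi", "bye", "stay", "bye", "hi")
)
toy[, r := .I]  # row number, used for the "strictly earlier" condition

# x = ON rows only, i = all rows; match where x.r < i.r and id is equal,
# keep the last (closest earlier) match, and return x's category
toy[, prev_on := toy[status == "ON"][toy, on = .(r < r, id), mult = "last", x.category]]

print(toy$prev_on)
```

Rows 3 and 4 should pick up "hi" from row 1 (row 3 is OFF, so row 4 skips over it), while rows 1, 2, and 5 have no earlier ON row with the same id and stay NA.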
Another possible way is to work on "slices" of your data.frame and use zoo::na.locf.
First, record each row's position so you can restore the original order afterwards:
mydf$pos <- seq_len(nrow(mydf))
Then split your data.frame according to myvector1:
spl_mydf <- split(mydf, mydf$myvector1)
Then apply to each "slice" (containing only one value of myvector1) a function that keeps the values of mycategory only where mystatus is ON, sets the rest to NA, and replaces each NA with the previous non-NA value. To get the desired lag, prepend an NA as the first item and drop the last item:
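my_out <- lapply(spl_mydf,
                 function(sl_df){
                   out <- sl_df$mycategory
                   out[sl_df$mystatus=="OFF"] <- NA
                   # as.character avoids getting the factor level codes;
                   # zoo::na.locf is called with its namespace so zoo need not be attached
                   data.frame(pos=sl_df$pos,
                              out=c(NA, head(zoo::na.locf(as.character(out), na.rm=FALSE), -1)),
                              stringsAsFactors=FALSE) # keep out as character on R < 4.0 too
                 })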
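To illustrate the fill-then-lag step on its own, here is a minimal sketch on a single made-up vector (x, filled, and lagged are hypothetical names). It assumes the zoo package is installed.

```r
# Hypothetical single slice: categories with the OFF rows already set to NA
x <- c("hi", NA, "bye", NA, NA, "stay")

# Carry the last non-NA value forward; na.rm = FALSE keeps the length unchanged
filled <- zoo::na.locf(x, na.rm = FALSE)

# Shift by one position so each row only sees values from strictly earlier rows
lagged <- c(NA, head(filled, -1))

print(filled)
print(lagged)
```

Here filled becomes "hi" "hi" "bye" "bye" "bye" "stay", and lagged becomes NA "hi" "hi" "bye" "bye" "bye", so the first row has nothing to look back to and the last value is never used for its own row.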
Finally, put everything back in the original row order and drop the pos column:
out <- do.call(rbind, my_out)
mydf$output <- out$out[order(out$pos)]
mydf$pos <- NULL
head(mydf, 10)
   myvector1 mystatus mycategory DesiredSolution output
1          0       ON         hi              NA   <NA>
2          1      OFF         hi              NA   <NA>
3          2       ON       stay              NA   <NA>
4          3       ON        bye              NA   <NA>
5          4      OFF        bye              NA   <NA>
6          0       ON        bye              hi     hi
7          1       ON        bye              NA   <NA>
8          3       ON       stay             bye    bye
9          4       ON       stay              NA   <NA>
10         1      OFF        bye             bye    bye
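The reordering in out$out[order(out$pos)] can be checked on a small made-up example, where res is a hypothetical stand-in for the row-bound result. After rbind the rows are grouped by slice, and indexing with order(pos) returns the values in their original row order.

```r
# Hypothetical stand-in for the row-bound slices: grouped by slice, not by row
res <- data.frame(pos = c(1, 4, 2, 5, 3),
                  out = c("a", "d", "b", "e", "c"),
                  stringsAsFactors = FALSE)

# order(res$pos) gives the row indices that sort pos ascending,
# so indexing out with it restores the original row order
restored <- res$out[order(res$pos)]
print(restored)
```

The result is "a" "b" "c" "d" "e", that is, the values lined up with positions 1 to 5.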
Check that everything is alright:
all(mydf$DesiredSolution==mydf$output, na.rm=TRUE) # TRUE
all((as.character(mydf$DesiredSolution)=="NA")==is.na(mydf$output)) # TRUE
(In your data.frame, NA in DesiredSolution is one of the factor levels, i.e. the string "NA" rather than a missing value, hence the second check.)
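Since DesiredSolution stores "NA" as a string level, one way to compare it with a column that holds real missing values is to convert the string to NA first. A minimal sketch on made-up vectors (desired, output, desired_chr, and same are hypothetical names):

```r
# Hypothetical vectors mimicking the two columns
desired <- factor(c("NA", "hi", "bye", "NA"))  # "NA" is an ordinary level here
output  <- c(NA, "hi", "bye", NA)              # real missing values

# Turn the "NA" string into a true NA so both sides use the same encoding
desired_chr <- as.character(desired)
desired_chr[desired_chr == "NA"] <- NA

# identical() treats NA as matching NA, so one call covers values and gaps
same <- identical(desired_chr, output)
print(same)
```

With this conversion a single identical() call replaces the two separate all() checks; whether that is preferable depends on whether you want to keep the "NA" level in the original column.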