I have a data frame (dives) with a number of variables including:
diveNum bottomTime
[,1] 2 FALSE
[,2] 2 FALSE
[,3] 2 TRUE
[,4] 2 TRUE
[,5] 2 FALSE
[,6] 2 TRUE
[,7] 2 FALSE
[,8] 3 FALSE
[,9] 3 TRUE
[,10] 3 FALSE
[,11] 3 TRUE
[,12] 3 TRUE
[,13] 3 FALSE
For each unique diveNum, I would like to select all rows between (and including) the first and last rows where bottomTime is TRUE, giving:
diveNum bottomTime
[,3] 2 TRUE
[,4] 2 TRUE
[,5] 2 FALSE
[,6] 2 TRUE
[,9] 3 TRUE
[,10] 3 FALSE
[,11] 3 TRUE
[,12] 3 TRUE
ddply has been my friend for similar problems, and I can determine the first and last records of "TRUE" in each diveNum by first subsetting the data to only include cases where bottomTime is "TRUE" then running ddply:
dives <- dives[dives$bottomTime == "TRUE",]
bottomTime <- ddply(dives, .(diveNum), function(x) x[c(1, nrow(x)), ])
This gives:
X diveNum bottomTime
[,1] 3 2 TRUE
[,2] 6 2 TRUE
[,3] 9 3 TRUE
[,4] 12 3 TRUE
What I can't manage is to use the row numbers of the first and last "TRUE" records in each dive (stored in X) as indices to subset the original data frame. I've been struggling with this for some time, so any help would be greatly appreciated!
Here is an approach using data.table:
library(data.table)
setDT(dives)
dives[dives[, do.call(seq, as.list(range(.I[bottomTime]))), by = diveNum][['V1']]]
# or
dives[dives[, .I[cummax(bottomTime) & rev(cummax(rev(bottomTime)))], by = diveNum][['V1']]]
# or
dives[, .SD[cummax(bottomTime) & rev(cummax(rev(bottomTime)))], by = diveNum]
# or
dives[dives[(bottomTime), seq(.I[1], .I[.N]), by = diveNum][['V1']]]
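The cummax variants rely on a forward and a backward cumulative mask. A minimal base R sketch of that idea on a single dive follows; the vector is dive 2 from the question, and the names seen_before, seen_after and keep are made up for illustration.

```r
# bottomTime values for dive 2, copied from the question
bottomTime <- c(FALSE, FALSE, TRUE, TRUE, FALSE, TRUE, FALSE)

# TRUE from the first TRUE onwards (cummax coerces logical to 0/1)
seen_before <- cummax(bottomTime) == 1

# TRUE up to and including the last TRUE (same trick on the reversed vector)
seen_after <- rev(cummax(rev(bottomTime))) == 1

# A row is kept when a TRUE occurs at or before it AND at or after it
keep <- seen_before & seen_after
which(keep)
```

Within this dive the mask selects positions 3 to 6, which matches the rows requested in the question.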
An approach using plyr::ddply:
library(plyr)
ddply(dives, .(diveNum), function(x, ind) {
  # rows from the first to the last TRUE in column `ind`
  x[do.call(seq, as.list(range(which(x[[ind]])))), ]
}, ind = 'bottomTime')
Or using dplyr:
library(dplyr)
dives %>% group_by(diveNum) %>%
  filter(cumany(bottomTime) & rev(cumany(rev(bottomTime))))
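The idea described in the question, using the first and last TRUE row numbers as indices into the original data frame, can also be sketched without any packages. This assumes the rows of each dive are contiguous in dives, as in the example data; the names idx, first, last and rows are illustrative only.

```r
# Example data rebuilt from the question
dives <- data.frame(
  diveNum    = rep(c(2, 3), times = c(7, 6)),
  bottomTime = c(FALSE, FALSE, TRUE, TRUE, FALSE, TRUE, FALSE,
                 FALSE, TRUE, FALSE, TRUE, TRUE, FALSE)
)

# Row numbers (in the original data frame) where bottomTime is TRUE
idx <- which(dives$bottomTime)

# First and last TRUE row number per dive
first <- tapply(idx, dives$diveNum[idx], min)
last  <- tapply(idx, dives$diveNum[idx], max)

# Expand each first/last pair into a run of row numbers and subset
rows <- unlist(Map(seq, first, last), use.names = FALSE)
dives[rows, ]
```

Because the indices refer to the original data frame, dives must not be overwritten by the TRUE-only subset before this runs.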
Maybe not the most efficient way, but in base R you could use split() with do.call(rbind, ...):
do.call(rbind, lapply(split(dives, dives$diveNum), function(x) {
  w <- which(x$bottomTime)   # positions of TRUE within this dive
  x[w[1]:tail(w, 1), ]       # rows from the first TRUE to the last TRUE
}))
# diveNum bottomTime
# 2.3 2 TRUE
# 2.4 2 TRUE
# 2.5 2 FALSE
# 2.6 2 TRUE
# 3.9 3 TRUE
# 3.10 3 FALSE
# 3.11 3 TRUE
# 3.12 3 TRUE
As mentioned in the comments, a "smoother" approach would be to use by(), which replaces the two calls lapply(split(...)):
do.call(rbind, by(dives, dives$diveNum, function(x) {
  w <- which(x$bottomTime)   # positions of TRUE within this dive
  x[w[1]:tail(w, 1), ]       # rows from the first TRUE to the last TRUE
}))
I just like to make things more difficult than they actually are sometimes.
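One caveat with w[1]:tail(w, 1): it throws an error for a dive that has no TRUE rows at all, because tail(w, 1) is then empty. A possible variant, sketched here with ave() and a hypothetical helper named between_first_last, builds a logical mask per dive instead and simply drops such dives. It also keeps the original row names rather than the "2.3"-style names that rbind produces.

```r
# Example data rebuilt from the question
dives <- data.frame(
  diveNum    = rep(c(2, 3), times = c(7, 6)),
  bottomTime = c(FALSE, FALSE, TRUE, TRUE, FALSE, TRUE, FALSE,
                 FALSE, TRUE, FALSE, TRUE, TRUE, FALSE)
)

# Hypothetical helper: TRUE for positions between the first and last TRUE
between_first_last <- function(flag) {
  w <- which(flag)
  if (length(w) == 0L) return(rep(FALSE, length(flag)))  # no TRUE in this dive
  pos <- seq_along(flag)
  pos >= w[1] & pos <= w[length(w)]
}

# ave() applies the helper within each diveNum and returns results in row order
keep <- as.logical(ave(dives$bottomTime, dives$diveNum, FUN = between_first_last))
result <- dives[keep, ]
result
```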