
How to select all

I have a data frame (dives) with a number of variables including:

     diveNum bottomTime 
[,1]    2       FALSE
[,2]    2       FALSE
[,3]    2       TRUE
[,4]    2       TRUE
[,5]    2       FALSE
[,6]    2       TRUE
[,7]    2       FALSE
[,8]    3       FALSE
[,9]    3       TRUE
[,10]   3       FALSE
[,11]   3       TRUE
[,12]   3       TRUE
[,13]   3       FALSE
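
For anyone who wants to reproduce the problem, the rows shown above could be rebuilt roughly like this. This is only a sketch: the real data frame has more variables, and only the two columns printed above are assumed here.

```r
# Rebuild the two columns shown in the question (other variables omitted)
dives <- data.frame(
  diveNum    = rep(c(2, 3), times = c(7, 6)),
  bottomTime = c(FALSE, FALSE, TRUE, TRUE, FALSE, TRUE, FALSE,   # dive 2
                 FALSE, TRUE, FALSE, TRUE, TRUE, FALSE)          # dive 3
)

# Row numbers where bottomTime is TRUE
which(dives$bottomTime)
```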

For each unique diveNum, I would like to select all rows between (& including) the first and last time that bottomTime is TRUE, giving:

     diveNum bottomTime
[,3]    2       TRUE
[,4]    2       TRUE
[,5]    2       FALSE
[,6]    2       TRUE
[,9]    3       TRUE
[,10]   3       FALSE
[,11]   3       TRUE
[,12]   3       TRUE

ddply has been my friend for similar problems, and I can determine the first and last records of "TRUE" in each diveNum by first subsetting the data to only include cases where bottomTime is "TRUE" then running ddply:

dives <- dives[dives$bottomTime == "TRUE",]
bottomTime <- ddply(dives, .(diveNum), function(x) x[c(1, nrow(x)), ])

This gives:

      X  diveNum bottomTime
[,1]  3     2      TRUE
[,2]  6     2      TRUE
[,3]  9     3      TRUE
[,4]  12    3      TRUE

What I can't manage is to use the row numbers of the first and last "TRUE" records in each dive (stored in X) as indices to subset the original data frame. I've been struggling with this for some time, so any help would be greatly appreciated!

user3758476 asked Jun 20 '14 02:06


2 Answers

Here is an approach using data.table

library(data.table)
setDT(dives)
dives[dives[, do.call(seq, as.list(range(.I[bottomTime]))), by = diveNum][['V1']]]
# or
dives[dives[, .I[cummax(bottomTime) & rev(cummax(rev(bottomTime)))], by = diveNum][['V1']]]
# or
dives[, .SD[cummax(bottomTime) & rev(cummax(rev(bottomTime)))], by = diveNum]
# or
dives[dives[(bottomTime), seq(.I[1], .I[.N]), by = diveNum][['V1']]]
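
To see why the cummax variants work, it may help to run the idea on a single dive in base R. The sketch below uses the bottomTime values of diveNum 2 from the question; the names b, seen_before, seen_after and keep are made up for illustration.

```r
# bottomTime for diveNum 2, copied from the question
b <- c(FALSE, FALSE, TRUE, TRUE, FALSE, TRUE, FALSE)

# cummax() coerces the logicals to 0/1, so it switches to 1 at the first TRUE
seen_before <- cummax(b) == 1

# The same trick on the reversed vector marks everything up to the last TRUE
seen_after <- rev(cummax(rev(b))) == 1

# A row is kept when a TRUE occurs at or before it and at or after it
keep <- seen_before & seen_after
which(keep)
# 3 4 5 6
```

The dplyr version further down uses cumany() in the same way, just without the coercion to 0/1.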

An approach using plyr::ddply

library(plyr)
# ind names the logical column; rows between its first and last TRUE are kept
ddply(dives, .(diveNum), function(x, ind) {
  x[do.call(seq, as.list(range(which(x[[ind]])))), ]
}, ind = 'bottomTime')

or using dplyr

library(dplyr)
dives %>%
  group_by(diveNum) %>%
  filter(cumany(bottomTime) & rev(cumany(rev(bottomTime))))

mnel answered Sep 19 '22 23:09


Maybe not the most efficient way, but in base R you could use split with do.call(rbind, ...)

> do.call(rbind, lapply(split(dives, dives$diveNum), function(x){
      w <- which(x$bottomTime)
      x[ w[1]:tail(w, 1), ]
      }))
#      diveNum bottomTime
# 2.3        2       TRUE
# 2.4        2       TRUE
# 2.5        2      FALSE
# 2.6        2       TRUE
# 3.9        3       TRUE
# 3.10       3      FALSE
# 3.11       3       TRUE
# 3.12       3       TRUE

As mentioned in the comments, a "smoother" approach would be to use by() and avoid the two calls lapply(split(...))

> do.call(rbind, by(dives, dives$diveNum, function(x) {
      w <- which(x$bottomTime)
      x[ w[1]:tail(w, 1), ]
      }))
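
One thing to watch: if a dive has no TRUE at all, which() returns an empty vector and w[1]:tail(w, 1) throws an error. A possible guard is sketched below, under the assumption that such dives should simply contribute no rows. The helper select_span and the toy data are hypothetical, not part of the question.

```r
# Hypothetical helper: rows between the first and last TRUE, or none at all
select_span <- function(x) {
  w <- which(x$bottomTime)
  if (length(w) == 0L) return(x[0, , drop = FALSE])  # no TRUE in this dive
  x[w[1]:w[length(w)], , drop = FALSE]
}

# Toy data (made up): dive 1 never reaches the bottom, dive 2 does
toy <- data.frame(
  diveNum    = c(1, 1, 2, 2, 2, 2),
  bottomTime = c(FALSE, FALSE, FALSE, TRUE, FALSE, TRUE)
)

result <- do.call(rbind, lapply(split(toy, toy$diveNum), select_span))
result
```

Here dive 1 is dropped entirely and dive 2 keeps its last three rows.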

I just like to make things more difficult than they actually are sometimes.

Rich Scriven answered Sep 18 '22 23:09