How to select all

Question

I have a data frame (dives) with a number of variables including:

     diveNum bottomTime 
[,1]    2       FALSE
[,2]    2       FALSE
[,3]    2       TRUE
[,4]    2       TRUE
[,5]    2       FALSE
[,6]    2       TRUE
[,7]    2       FALSE
[,8]    3       FALSE
[,9]    3       TRUE
[,10]   3       FALSE
[,11]   3       TRUE
[,12]   3       TRUE
[,13]   3       FALSE

For each unique diveNum, I would like to select all rows between (and including) the first and last time that bottomTime is TRUE, giving:

     diveNum bottomTime
[,3]    2       TRUE
[,4]    2       TRUE
[,5]    2       FALSE
[,6]    2       TRUE
[,9]    3       TRUE
[,10]   3       FALSE
[,11]   3       TRUE
[,12]   3       TRUE
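
To make the example reproducible, the sample data can be rebuilt along these lines. This is only a sketch: it assumes diveNum is an integer column and bottomTime a logical column, and `expected_rows` is a name introduced here for illustration.

```r
# Rebuild the sample data shown above (column types are an assumption).
dives <- data.frame(
  diveNum    = rep(c(2L, 3L), times = c(7L, 6L)),
  bottomTime = c(FALSE, FALSE, TRUE, TRUE, FALSE, TRUE, FALSE,
                 FALSE, TRUE, FALSE, TRUE, TRUE, FALSE)
)

# Row numbers of the desired result, read off the expected output above.
expected_rows <- c(3:6, 9:12)
dives[expected_rows, ]
```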

ddply has been my friend for similar problems, and I can determine the first and last records of "TRUE" in each diveNum by first subsetting the data to only include cases where bottomTime is "TRUE" then running ddply:

dives <- dives[dives$bottomTime == "TRUE",]
bottomTime <- ddply(dives, .(diveNum), function(x) x[c(1, nrow(x)), ])

This gives:

      X  diveNum bottomTime
[,1]  3     2      TRUE
[,2]  6     2      TRUE
[,3]  9     3      TRUE
[,4]  12    3      TRUE

What I can't manage is to use the row numbers of the first and last "TRUE" records in each dive (stored in X) as indices to subset the original data frame. I've been struggling with this for some time; any help would be greatly appreciated!

mnel · Accepted Answer

Here is an approach using data.table

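library(data.table)
setDT(dives)  # convert dives to a data.table by reference

# sequence of row indices from the first to the last TRUE in each dive
dives[dives[, do.call(seq, as.list(range(.I[bottomTime]))), by = diveNum][["V1"]]]
# or: keep row indices at or after the first TRUE and at or before the last TRUE
dives[dives[, .I[cummax(bottomTime) & rev(cummax(rev(bottomTime)))], by = diveNum][["V1"]]]
# or: apply the same mask to .SD directly
dives[, .SD[cummax(bottomTime) & rev(cummax(rev(bottomTime)))], by = diveNum]
# or: restrict to the TRUE rows, then span from the first to the last index
dives[dives[(bottomTime), seq(.I[1], .I[.N]), by = diveNum][["V1"]]]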
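
The cummax() variants above rely on two running flags, which may be easier to follow on a single dive. The sketch below uses base R only; `seen_before`, `seen_after` and `keep` are illustrative names, and the vector is dive 2 from the question.

```r
# bottomTime for dive 2 in the question
bottomTime <- c(FALSE, FALSE, TRUE, TRUE, FALSE, TRUE, FALSE)

# cummax() coerces logicals to integers: 1 from the first TRUE onwards
seen_before <- cummax(bottomTime)            # 0 0 1 1 1 1 1

# Reversing, taking cummax(), and reversing back gives 1 up to the last TRUE
seen_after <- rev(cummax(rev(bottomTime)))   # 1 1 1 1 1 1 0

# A row is kept only when it lies between the first and last TRUE
keep <- seen_before & seen_after
which(keep)                                  # 3 4 5 6
```

dplyr's cumany() plays the same role as cummax() here, but returns a logical vector directly.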

An approach using plyr::ddply

library(plyr)
ddply(dives, .(diveNum), function(x, ind) {
  # rows from the first to the last TRUE within this dive
  x[do.call(seq, as.list(range(which(x[[ind]])))), ]
}, ind = "bottomTime")

or using dplyr

library(dplyr)
dives %>%
  group_by(diveNum) %>%
  filter(cumany(bottomTime) & rev(cumany(rev(bottomTime))))

Rich Scriven · Answer

Maybe not the optimal way, but in base R you could use split() with do.call(rbind, ...):

do.call(rbind, lapply(split(dives, dives$diveNum), function(x) {
  w <- which(x$bottomTime)
  x[w[1]:tail(w, 1), ]
}))
#      diveNum bottomTime
# 2.3        2       TRUE
# 2.4        2       TRUE
# 2.5        2      FALSE
# 2.6        2       TRUE
# 3.9        3       TRUE
# 3.10       3      FALSE
# 3.11       3       TRUE
# 3.12       3       TRUE

As mentioned in the comments, a "smoother" approach would be to use by() and avoid the two calls lapply(split(...))

do.call(rbind, by(dives, dives$diveNum, function(x) {
  w <- which(x$bottomTime)
  x[w[1]:tail(w, 1), ]
}))

I just like to make things more difficult than they actually are sometimes.
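
One caveat, offered as a hedged aside: if a dive had no TRUE values at all, w would be empty and w[1]:tail(w, 1) would raise an error. A possible guard is sketched below; `select_span` and the extra dive 4 are hypothetical additions for illustration and are not part of the original data.

```r
# Hypothetical data: dives 2 and 3 from the question, plus an invented
# dive 4 in which bottomTime is never TRUE.
dives <- data.frame(
  diveNum    = rep(c(2L, 3L, 4L), times = c(7L, 6L, 2L)),
  bottomTime = c(FALSE, FALSE, TRUE, TRUE, FALSE, TRUE, FALSE,
                 FALSE, TRUE, FALSE, TRUE, TRUE, FALSE,
                 FALSE, FALSE)
)

# Return the rows from the first to the last TRUE, or zero rows if none.
select_span <- function(x) {
  w <- which(x$bottomTime)
  if (length(w) == 0L) {
    return(x[0, , drop = FALSE])
  }
  x[w[1]:w[length(w)], , drop = FALSE]
}

result <- do.call(rbind, lapply(split(dives, dives$diveNum), select_span))
result
```

Here dive 4 simply contributes no rows, so the result contains the same eight rows as in the question.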

Tags: dataframe, r, subset, plyr

user3758476
