Subset data frame based on vector sequence of minimum 5 consecutive values

I have a vector that looks like this:

out1[1:200]
  [1] NA NA NA NA  0  1  2 NA NA NA  1 NA  0 NA  0  1 NA NA  0 NA  0  1  2  2  2 NA  0  1  2  3  4  4  5  6  7  8  9  9  9  9
 [41] 10 11 NA  0  0 NA  1 NA  0  1 NA  0 NA  0  1  2 NA  1 NA  0  0  0  1  2 NA NA NA  0  0 NA  0  0  0  1  2 NA  1  2 NA  0
 [81]  1  2  3  4  5  6  7  8 NA  0  1  2  3  4 NA  0  1  2  2  3  4  5 NA  0  1  2  3  3  4  5  5  6  7 NA  1  2 NA  1  2 NA
[121]  0  1  2 NA  1  2  3  3  3  3  4 NA  0  0  0  1  2  3  4  5 NA NA  0  1 NA NA NA  1  2  2  3 NA  1  2  2  2 NA NA  0  1
[161] NA  1 NA  1  2 NA  0  0 NA NA  0  1 NA NA NA NA  1  2  3 NA NA  1  2  3  4  5  6 NA  1  2  3  4  5  6  6  7  8 NA  0  1

I now want to subset a data frame (with the same number of rows) by this vector, keeping only the runs whose values cover a range of at least 5 consecutive numbers, e.g. 0:4 or 1:5 (and, of course, anything longer than this). NAs should therefore be FALSE as well.

E.g.

out1: NA NA 0 1 2 2 NA 0 0 1 2 3 3 4 NA 

Then the result should be

out2: F F F F F F F T T T T T T T F
Pat asked Apr 16 '15 08:04

1 Answer

The following gives the desired result (x is the sample vector listed under "data" below):

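library(data.table) # v >= 1.9.5 (devel version - install from GitHub)
data.table(x)[, id := rleid(!is.na(x))   # label each run of NA / non-NA values
   ][, aa := (.N >= 5), by = id           # flag runs with at least 5 elements
      ][, aaa := 4 %in% cumsum(diff(unique(sort(x)))), by = .(id, aa)
         # aaa: the run's distinct values climb at least 4 steps (e.g. 0:4)
         ]$aaa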

## [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE
## [15]  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
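
To see what the last step of the chain is testing, it may help to run the same expression on single runs by hand. This is only a sketch: the helper name spans_five is made up for illustration and is not part of data.table.

```r
# Hypothetical helper wrapping the expression used for `aaa` above:
# TRUE when the cumulative steps between the run's distinct, sorted
# values reach exactly 4 (as they do for 0:4, 1:5, or anything longer).
spans_five <- function(run) 4 %in% cumsum(diff(unique(sort(run))))

spans_five(c(0, 1, 2, 3, 4, 4, 5))    # distinct values 0..5, steps sum to 1..5
spans_five(c(1, 2, 3, 3, 3, 3, 4))    # distinct values 1..4, steps sum to 1..3
spans_five(c(NA_real_, NA_real_))     # sort() drops the NAs, nothing to test
```

The first call returns TRUE and the other two FALSE. Note that the test relies on the values within a run climbing in steps of 1, as they do in the question's data; a run such as 0, 5 would not hit a cumulative step of exactly 4.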

data

x <- c(NA, NA, NA, NA, NA, 0, 1, 2, NA, 0, 1, 2, 3, 4, 4, 5, NA, 1, 2, 3, 3, 3, 3, 4, NA)
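
If installing data.table is not an option, roughly the same mask can be sketched in base R with rle(), and then used to subset the data frame as the question asks. This is an assumption-laden sketch rather than a drop-in equivalent: it swaps the cumsum test for a plain range check (max - min >= 4), which agrees with the code above only while values within a run climb in steps of 1, and the data frame df and its columns are invented for the example.

```r
# Sample vector from the answer
x <- c(NA, NA, NA, NA, NA, 0, 1, 2, NA, 0, 1, 2, 3, 4, 4, 5,
       NA, 1, 2, 3, 3, 3, 3, 4, NA)

# Label each maximal run of NA / non-NA values (base R stand-in for rleid)
r  <- rle(!is.na(x))
id <- rep(seq_along(r$lengths), r$lengths)

# One flag per run: TRUE when its non-NA values span at least 4,
# i.e. cover 5 consecutive numbers such as 0:4 or 1:5
ok <- vapply(split(x, id), function(v) {
  v <- v[!is.na(v)]
  length(v) > 0 && diff(range(v)) >= 4
}, logical(1))

# Expand the per-run flags back to one flag per element
keep <- unname(ok[id])

# Hypothetical data frame with one row per element of x
df <- data.frame(row = seq_along(x), value = x)
df[keep, ]
```

For the sample vector, keep is TRUE only for elements 10 to 16, so df[keep, ] returns those seven rows.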
Khashaa answered Sep 22 '22 00:09