
Find position of first value greater than X in a vector

I have a vector and want to find the position of the first value that is greater than 100.

curbholes asked Apr 01 '15 10:04


6 Answers

# Randomly generate a suitable vector
set.seed(0)
v <- sample(50:150, size = 50, replace = TRUE)

min(which(v > 100))
Phil answered Oct 19 '22 19:10


Most answers based on which and max are slow (especially for long vectors) as they iterate through the entire vector:

  1. x>100 evaluates every value in the vector to see if it matches the condition
  2. which and max/min then scan all the indices returned by step 1 to find the maximum/minimum

Position will only evaluate the condition until it encounters the first TRUE value and immediately return the corresponding index, without continuing through the rest of the vector.

# Randomly generate a suitable vector
v <- sample(50:150, size = 50, replace = TRUE)

Position(function(x) x > 100, v)
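
To see the early exit in action, here is a minimal sketch that counts how often the predicate is called. The counter calls and the small example vector are made up for illustration; they are not part of the answer above.

```r
# Hypothetical counter, used only to show how many elements Position inspects
calls <- 0L
v <- c(10, 20, 150, 30, 200)

first <- Position(function(x) {
  calls <<- calls + 1L  # record each evaluation of the predicate
  x > 100
}, v)

first  # 3: index of the first value greater than 100
calls  # 3: the predicate ran only for elements 1 to 3
```

Stopping early saves predicate evaluations, but each evaluation is an R-level function call, so this does not by itself guarantee a faster result than a vectorised comparison.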
Ant answered Oct 19 '22 21:10


Check out which.max. Applied to a logical vector, it returns the index of the first TRUE:

x <- seq(1, 150, 3)
which.max(x > 100)
# [1] 35
x[35]
# [1] 103
lukeA answered Oct 19 '22 19:10


Just to mention, Hadley Wickham has implemented a function, detect_index, to do exactly this task in his purrr package for functional programming.

I recently used detect_index myself and would recommend it to anyone else with the same problem.

Documentation for detect_index can be found here: https://rdrr.io/cran/purrr/man/detect.html
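
A minimal usage sketch, assuming the purrr package is installed (the example vector is made up for illustration). As far as I know, detect_index returns 0 rather than NA when nothing matches, so check for that case:

```r
# Guarded so the snippet is skipped when purrr is not installed
if (requireNamespace("purrr", quietly = TRUE)) {
  v <- c(90, 95, 101, 99, 120)

  # Index of the first element greater than 100
  idx <- purrr::detect_index(v, function(x) x > 100)

  # No element satisfies the predicate here
  none <- purrr::detect_index(v, function(x) x > 1000)

  print(idx)
  print(none)
}
```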

Chill2Macht answered Oct 19 '22 20:10


As I need to perform a similar calculation many times within a loop, I was interested in which of the many answers provided in this thread would be most efficient.

TLDR: Whether the first value appears early or late in a vector, which.max(v > 100) is the fastest solution to this problem.

Note, however, that if no entry in v exceeds 100, it will return 1; thus there may be cause for a wrapper that checks for this case:

SafeWhichMax <- function (v) {
  first <- which.max(v > 100)
  if (first == 1L && v[1] <= 100) NA else first
}
SafeWhichMax(100) # NA
SafeWhichMax(101) # 1

If a vector is very long and is not guaranteed to contain any TRUE results, match(TRUE, v > 100) may be quicker than which.max() with checks.
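
To make the no-match behaviour concrete, here is a small sketch comparing the base R approaches from this thread on a vector with no value above 100. The vector and the r_* variable names are made up for illustration.

```r
v <- c(10, 50, 100)  # no element exceeds 100

r_which_max <- which.max(v > 100)                 # 1, indistinguishable from a real match at position 1
r_match     <- match(TRUE, v > 100)               # NA
r_which     <- which(v > 100)[1]                  # NA
r_position  <- Position(function(x) x > 100, v)   # NA

# min() of an empty index vector gives Inf and raises a warning
r_min_which <- suppressWarnings(min(which(v > 100)))  # Inf
```

Only which.max returns a plausible-looking index here, which is why it needs the extra check.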

library(microbenchmark)

# Short vector:
v <- 0:105

microbenchmark(
  which.max(v > 100),
  SafeWhichMax(v),
  match(TRUE, v > 100),
  min(which(v > 100)),
  which(v > 100)[1],
  Position(function(x) x, v > 100),
  Position(function(x) x > 100, v),
  purrr::detect_index(v, function (x) x > 100)
)
Unit: microseconds
                                       mean    median
which.max(v > 100)                   24.112     23.80
SafeWhichMax(v)                      24.889     24.25
match(TRUE, v > 100)                 34.752     33.20
min(which(v > 100))                  25.506     25.20
which(v > 100)[1]                    25.320     24.90
Position(function(x) x, v > 100)   3231.783   3043.50
Position(function(x) x > 100, v)   3487.805   3314.75
purrr::detect_index               16436.579  16064.90

# Long vector, with late first occurrence of v > 100
v <- -10000:105

Unit: microseconds
                                       mean    median
which.max(v > 100)                   24.958     24.30
SafeWhichMax(v)                      25.456     24.90
match(TRUE, v > 100)                 37.680     37.85
min(which(v > 100))                  26.439     26.00
which(v > 100)[1]                    25.724     25.55
Position(function(x) x, v > 100)   3224.240   3036.50
Position(function(x) x > 100, v)   3389.538   3287.05
purrr::detect_index               17344.706  15283.35
Martin Smith answered Oct 19 '22 20:10


There are many solutions; another is:

x <- 90:110
which(x > 100)[1]
Jeff answered Oct 19 '22 21:10