Vectorized equality testing

Tags: equality, r

I'd be surprised if this isn't a dup, but I couldn't find a solution.

I understand the limitations of == for testing the equality of floating-point numbers. One should use all.equal() instead:

0.1 + 0.2 == 0.3
# FALSE
all.equal(0.1 + 0.2, 0.3)
# TRUE
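To see why == fails here, it helps to print the sum with 17 significant digits. The stored value differs from 0.3 by roughly 5.6e-17, which is far below the default tolerance of all.equal(), sqrt(.Machine$double.eps) (about 1.5e-8). A base-R check:

```r
x <- 0.1 + 0.2

# 17 significant digits are enough to show the stored binary value
sprintf("%.17g", x)         # "0.30000000000000004"

# The representation error is tiny ...
x - 0.3                     # about 5.6e-17

# ... and much smaller than all.equal()'s default tolerance
sqrt(.Machine$double.eps)   # about 1.5e-08
abs(x - 0.3) < sqrt(.Machine$double.eps)   # TRUE
```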

But == has the advantage of being vectorized:

set.seed(1)
Df <- data.frame(x = sample(seq(-1, 1, by = 0.1), size = 100, replace = TRUE),
                 y = 0.1)
Df[Df$x > 0 & Df$x < 0.2,]
##      x   y
## 44 0.1 0.1
## 45 0.1 0.1

# yet
sum(Df$x == Df$y)
# [1] 0
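The zero count comes from how the grid is built. seq() computes its elements arithmetically (as -1 plus multiples of 0.1), so the element that prints as 0.1 has picked up rounding error and is not the same double as the literal 0.1 in Df$y. A short base-R check that makes this visible:

```r
s <- seq(-1, 1, by = 0.1)

# Grid point closest to 0.1
p <- s[which.min(abs(s - 0.1))]

p                           # prints as 0.1 at the default 7 significant digits
sprintf("%.17g", p)         # more digits expose the accumulated rounding error
p == 0.1                    # FALSE, so every "0.1" sampled into Df$x fails ==
isTRUE(all.equal(p, 0.1))   # TRUE
```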

I can write a (bad) function myself:

All.Equal <- function(x, y){
  stopifnot(length(x) == length(y))
  out <- logical(length(x))
  for (i in seq_along(x)){
    out[i] <- isTRUE(all.equal(x[i], y[i]))
  }
  out
}

sum(All.Equal(Df$x, Df$y))

which gives the correct answer but is far slower than == (roughly 340 times slower by median time):

microbenchmark::microbenchmark(All.Equal(Df$x, Df$y), Df$x == Df$y)
Unit: microseconds
                  expr      min        lq        mean     median        uq        max neval cld
 All.Equal(Df$x, Df$y) 9954.986 10298.127 20382.24436 10511.5360 10798.841 915182.911   100   b
          Df$x == Df$y   16.857    19.265    29.06261    30.8535    38.529     45.151   100  a

Another option might be:

All.equal.abs <- function(x,y){
  tol <- .Machine$double.eps ^ 0.5
  abs(x - y) < tol
}

which performs comparably to ==.
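One caveat with All.equal.abs(): a fixed absolute tolerance ignores the magnitude of the numbers, whereas all.equal() compares a (mean) relative difference by default and only falls back to an absolute difference when the target is close to zero. Below is a minimal sketch of an element-wise relative comparison in the same spirit. The name all_equal_rel() is hypothetical, and applying the relative/absolute switch per element is an assumption of this sketch, not something base R provides.

```r
# Hypothetical helper, not part of base R: element-wise comparison using
# a relative tolerance, in the spirit of all.equal()
all_equal_rel <- function(x, y, tol = sqrt(.Machine$double.eps)) {
  # Recycle the shorter argument, similar to ==
  n <- max(length(x), length(y))
  x <- rep_len(x, n)
  y <- rep_len(y, n)

  diff  <- abs(x - y)
  scale <- abs(x)

  # Relative difference where x is not tiny; absolute difference otherwise
  rel <- ifelse(scale > tol, diff / scale, diff)
  rel <= tol
}

x <- c(0.1 + 0.2, 1e10, 1)
y <- c(0.3, 1e10 + 1e-5, 1.1)

abs(x - y) < sqrt(.Machine$double.eps)  # TRUE FALSE FALSE: absolute tolerance
all_equal_rel(x, y)                     # TRUE TRUE FALSE: relative tolerance
```

The second pair differs only around the 15th significant digit, so a relative test treats it as equal, while an absolute tolerance of about 1.5e-8 does not.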

Is there an existing function that performs this task?


asked Jan 30 '16 03:01

Hugh

1 Answer

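Vectorize() turns out to be a slow option. As @fishtank suggests in the comments, the best solution is to check whether the absolute difference is smaller than some tolerance value, i.e. is_equal_tol() below. By median time it is only about three times slower than ==, and several hundred times faster than the Vectorize() approach.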

set.seed(123)
a <- sample(1:10, size = 50, replace = TRUE)
b <- sample(a)

is_equal_tol <- function(x, y, tol = .Machine$double.eps ^ 0.5) {
  abs(x - y) < tol
}

is_equal_vec <- Vectorize(all.equal, c("target", "current"))

is_equal_eq <- function(x, y) x == y

microbenchmark::microbenchmark(is_equal_eq(a, b),
                               is_equal_tol(a, b), 
                               isTRUE(is_equal_vec(a, b)),
                               times = 1000L)

Unit: nanoseconds
                       expr     min      lq        mean  median      uq      max neval
          is_equal_eq(a, b)       0     856    1545.797    1284    2139    14113  1000
         is_equal_tol(a, b)    1711    2567    4991.377    4278    6843    27370  1000
 isTRUE(is_equal_vec(a, b)) 2858445 3008552 3258916.503 3082964 3204204 46130260  1000
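Note that isTRUE(is_equal_vec(a, b)) collapses the result to a single FALSE rather than returning one value per element: the Vectorize()d all.equal() returns, for each pair, either TRUE or a character description of the difference, and isTRUE() of any vector longer than one is FALSE. If an element-wise logical result from all.equal() is wanted, one option is to apply isTRUE() inside the vectorized function. This is a sketch; the name is_equal_vec2 is hypothetical.

```r
set.seed(123)
a <- sample(1:10, size = 50, replace = TRUE)
b <- sample(a)

# Hypothetical helper: isTRUE() is applied per pair, so each element
# yields TRUE or FALSE instead of TRUE or a character message
is_equal_vec2 <- Vectorize(function(x, y) isTRUE(all.equal(x, y)))

res <- is_equal_vec2(a, b)
length(res)                                  # 50, one result per pair
identical(res, a == b)                       # TRUE for these integer inputs
is_equal_vec2(c(0.1 + 0.2, 1), c(0.3, 1.1))  # TRUE FALSE
```

This still calls all.equal() once per element, so it should be expected to run at roughly the speed of the Vectorize() row above rather than that of is_equal_tol().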


answered Nov 12 '22 06:11

Johan Larsson
