Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why is the R match function so slow?

Tags:

r

The following should find the location of the first instance of the integer 1:

array <- rep(1,10000000)
system.time(match(1,array))

This returns

   user  system elapsed
  0.720   1.243   1.964

If I run the same task using an array of size 100 I get this:

   user  system elapsed
      0       0       0

Since all it should be doing is looking at the first value in the array and returning a match, the time taken should be that of a lookup and a comparison, regardless of the size of the array. If I wrote this in lower-level language it would cost in the order of a handful of clock cycles (a microsecond or less?) regardless of the array size. Why does it take a second in R? It seems to be iterating through he whole array...

Is there a way for it to abort once it has found its match, rather than continuing to iterate unnecessarily?

like image 432
quant Avatar asked Jul 24 '14 00:07

quant


People also ask

Why is my R code taking so long?

There is a lot of overhead in the processing because R needs to check the type of a variable nearly every time it looks at it. This makes it easy to change types and reuse variable names, but slows down computation for very repetitive tasks, like performing an action in a loop.

What does match function do in R?

match() function in R Language is used to return the positions of the first match of the elements of the first vector in the second vector. If the element is not found, it returns NA.

How do you use matches in R?

The match() function in R is used to: Return the index position of the first element present in a vector with a specific value. Return the index position of the first matching elements of the first vector in the second vector.


1 Answers

The reason is that R is not actually doing linear search, but it sets up a hash table. This is effective if you are searching for several items, but not very effective if you are searching for one number only. Here is the profiling of the function:

enter image description here

A "better" implementation could use a linear search, if you are searching for a single integer in an array. I guess that would be faster.

like image 114
Gabor Csardi Avatar answered Nov 15 '22 00:11

Gabor Csardi