The following should find the location of the first instance of the integer 1
:
array <- rep(1,10000000)
system.time(match(1,array))
This returns
user system elapsed
0.720 1.243 1.964
If I run the same task using an array of size 100
I get this:
user system elapsed
0 0 0
Since all it should be doing is looking at the first value in the array and returning a match, the time taken should be that of a lookup and a comparison, regardless of the size of the array. If I wrote this in lower-level language it would cost in the order of a handful of clock cycles (a microsecond or less?) regardless of the array size. Why does it take a second in R? It seems to be iterating through he whole array...
Is there a way for it to abort once it has found its match, rather than continuing to iterate unnecessarily?
There is a lot of overhead in the processing because R needs to check the type of a variable nearly every time it looks at it. This makes it easy to change types and reuse variable names, but slows down computation for very repetitive tasks, like performing an action in a loop.
match() function in R Language is used to return the positions of the first match of the elements of the first vector in the second vector. If the element is not found, it returns NA.
The match() function in R is used to: Return the index position of the first element present in a vector with a specific value. Return the index position of the first matching elements of the first vector in the second vector.
The reason is that R is not actually doing linear search, but it sets up a hash table. This is effective if you are searching for several items, but not very effective if you are searching for one number only. Here is the profiling of the function:
A "better" implementation could use a linear search, if you are searching for a single integer in an array. I guess that would be faster.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With