Why is the R match function so slow?

Tags: r

The following should find the position of the first occurrence of the value 1:

array <- rep(1,10000000)
system.time(match(1,array))

This returns

   user  system elapsed
  0.720   1.243   1.964

If I run the same task using an array of size 100 I get this:

   user  system elapsed
      0       0       0

Since all it should be doing is looking at the first value in the array and returning a match, the time taken should be that of a lookup and a comparison, regardless of the size of the array. If I wrote this in a lower-level language it would cost on the order of a handful of clock cycles (a microsecond or less?) regardless of the array size. Why does it take a second in R? It seems to be iterating through the whole array...

Is there a way for it to abort once it has found its match, rather than continuing to iterate unnecessarily?

432

asked Jul 24 '14 00:07

quant

1 Answer

The reason is that R is not actually doing a linear search here; match first sets up a hash table from the array. That is effective if you are searching for several items, but not if you are searching for a single number. Here is the profiling of the function:

[Image: profiling output for the match call]

A "better" implementation could fall back to a linear search when you are searching for a single value in an array. I guess that would be faster.
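As a rough sketch of that linear-search idea, assuming you only ever look up one value: first_match below is a hypothetical helper, not part of base R, while Position() is a base R function that also returns as soon as its predicate is true. Both run an interpreted loop, so they are likely to help only when the match sits near the front of the vector.

```r
# Hypothetical helper: scan `table` from the front and stop at the first hit.
first_match <- function(x, table) {
  for (i in seq_along(table)) {
    # isTRUE() treats an NA comparison as "no match" instead of raising an error
    if (isTRUE(table[[i]] == x)) return(i)
  }
  NA_integer_  # same "no match" value that match() returns by default
}

array <- rep(1, 10000000)
first_match(1, array)                  # stops after inspecting the first element
Position(function(v) v == 1, array)    # base R alternative that also exits early
```

If the value is absent or far from the front, the loop visits every element in interpreted code, and match() may well be the faster choice.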

114

answered Nov 15 '22 00:11

Gabor Csardi
