<code>sort</code> has the argument <code>index.return</code>, which is <code>FALSE</code> by default. If you set it to <code>TRUE</code>, you also get the ordering index, which is basically what <code>order</code> returns. My question: are there cases where it makes sense to use <code>sort</code> with <code>index.return = TRUE</code> instead of <code>order</code>?



Is there a good reason to use `sort` with `index.return = TRUE` instead of `order`?

2 Answers

order returns only the indices, whereas sort returns the sorted values (and, with index.return = TRUE, a list holding both the values and the indices):

x <- runif(10, 0, 100)
order(x)
# [1]  2  7  1  9  6  5  8 10  4  3
sort(x, index.return = TRUE)
# $`x`
# [1] 0.08140348 0.18272011 0.23575252 0.51493537 0.64281259 0.92121388 0.93759670 0.96221375 0.97646916 0.97863369
# 
# $ix
# [1]  2  7  1  9  6  5  8 10  4  3
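To make the correspondence explicit, here is a minimal sketch that checks both pieces of the list against what order gives. The seed and the variable names res and idx are arbitrary choices for illustration, not part of the original answer.

```r
set.seed(42)                         # arbitrary seed, only for reproducibility
x <- runif(10, 0, 100)

res <- sort(x, index.return = TRUE)  # list with components x (values) and ix (indices)
idx <- order(x)                      # integer vector of indices only

# Compare the index vectors and the sorted values
same_index  <- isTRUE(all.equal(res$ix, idx))
same_values <- isTRUE(all.equal(res$x, x[idx]))
print(c(same_index = same_index, same_values = same_values))
```

For a numeric vector without ties or missing values, as here, both checks should come out TRUE, which is why x[order(x)] can stand in for the list in the benchmark.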

order appears to be a little faster on long vectors:

x <- runif(10000000, 0, 100)

microbenchmark::microbenchmark(
  sort  = {sort(x, index.return = TRUE)},
  order = {x[order(x)]},
  times = 100
)
# Unit: milliseconds
#   expr      min       lq     mean   median       uq      max neval
#   sort 63.48221 67.79530 78.33724 70.74215 74.10109 173.1129   100
#  order 56.46055 57.18649 60.88239 58.29462 62.13086 155.5481   100

So you should probably choose sort with index.return = TRUE only if you need a list containing both the sorted values and the indices. I could not find a case where it is preferable to order.
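As an illustration of the "list" case, the sketch below reorders a companion vector using either approach. The vectors scores and labels are made-up example data, not from the original post.

```r
# Made-up example data: four scores with matching labels
scores <- c(72.5, 88.0, 64.0, 91.5)
labels <- c("a", "b", "c", "d")

# Variant 1: sort() hands back values and indices together
res <- sort(scores, index.return = TRUE)
labels_by_sort <- labels[res$ix]

# Variant 2: order() gives the indices, values are obtained by subsetting
idx <- order(scores)
labels_by_order <- labels[idx]

print(res$x)            # 64.0 72.5 88.0 91.5
print(labels_by_sort)   # "c" "a" "b" "d"
print(labels_by_order)  # "c" "a" "b" "d"
```

Both variants give the same reordering; the only practical difference in this sketch is whether the sorted values arrive bundled in a list or are produced by subsetting with the index.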


answered Nov 15 '22 03:11

RLave

My suggestions are based on RLave's answer.

You could pass the method argument, as in sort(x, method = "quick", index.return = TRUE), which may be a little faster than the default. If you want a faster alternative for large vectors, you can use this function:

sort_order <- function(x) {
    indices <- order(x)  # order() also accepts a method argument; the default is kept here
    list(x = x[indices], ix = indices)
}
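The comment in the function mentions that a method could be chosen as well. A minimal sketch of what that might look like follows; sort_order_with and its extra arguments are hypothetical names introduced here for illustration and simply forward to order's own decreasing and method arguments.

```r
# Hypothetical variant of sort_order(): forwards two of order()'s arguments.
sort_order_with <- function(x, decreasing = FALSE,
                            method = c("auto", "shell", "radix")) {
  method <- match.arg(method)   # validate against the methods order() accepts
  indices <- order(x, decreasing = decreasing, method = method)
  list(x = x[indices], ix = indices)
}

v <- c(3.2, 1.5, 4.8, 2.1)
res_desc <- sort_order_with(v, decreasing = TRUE, method = "radix")
print(res_desc$x)   # 4.8 3.2 2.1 1.5
print(res_desc$ix)  # 3 1 4 2
```

Whether a non-default method is actually faster depends on the data and the R version, so it is worth benchmarking on your own vectors before committing to one.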

Here are some benchmarks.

microbenchmark::microbenchmark(
     sort = s <- sort(x, index.return = TRUE),
     "quick sort" = sq <- sort(x, method = "quick", index.return = TRUE),
     "order sort" = so <- sort_order(x),
     times = 10
)

Unit: seconds
         expr      min       lq     mean   median       uq      max neval
         sort 1.493714 1.662791 1.737854 1.708502 1.887993 1.960912    10
   quick sort 1.366938 1.374874 1.451778 1.444342 1.480122 1.668693    10
   order sort 1.181974 1.344398 1.359209 1.369108 1.424569 1.461862    10

all.equal(so,sq)
[1] TRUE
all.equal(s,so)
[1] TRUE
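If microbenchmark is not installed, a rough comparison can be sketched with base R's system.time. The vector length of 1e6 and the seed are assumptions chosen to keep the run short; they are not the sizes used in the benchmarks above, and a single timing is much noisier than repeated runs.

```r
set.seed(1)                 # arbitrary seed, only for reproducibility
x <- runif(1e6, 0, 100)     # assumed size, smaller than in the benchmarks above

# Same helper as in the answer, repeated so the block is self-contained
sort_order <- function(x) {
  indices <- order(x)
  list(x = x[indices], ix = indices)
}

# Time one run of each approach and keep the results for comparison
t_sort  <- system.time(s  <- sort(x, index.return = TRUE))
t_order <- system.time(so <- sort_order(x))

# Elapsed wall-clock seconds for each approach
print(c(sort = t_sort[["elapsed"]], order = t_order[["elapsed"]]))

# Check that both approaches produce the same list
results_match <- isTRUE(all.equal(s, so))
print(results_match)
```

The absolute numbers will vary by machine and R version, so treat them as indicative only.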

answered Nov 15 '22 04:11

Manos Papadakis


Tags:

sorting

r

vonjd
