Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is there a good reason to use `sort` with `index.return = TRUE` instead of `order`?

Tags:

sorting

r

sort has the argument index.return which is by default FALSE. If you set it to TRUE you get the ordering index... basically the same as when you use order.

My question
Are there cases where it makes sense to use sort with index.return = TRUE instead of order?

like image 842
vonjd Avatar asked Nov 16 '18 12:11

vonjd


People also ask

Why is indexing better than sorting?

Indexing makes applications run more efficiently. Use sorting only when you want to create another table with a different natural order of rows. Indexing orders rows in a specific sequence, usually in ascending or descending order on one field.

Do indexes help with sorting?

Yes, index will help you, when using ORDER BY. Because INDEX is a sorted data structure, so the request will be executed faster.

Does MongoDB sort use index?

If the sort keys correspond to the index keys or an index prefix, MongoDB can use the index to sort the query results. A prefix of a compound index is a subset that consists of one or more keys at the start of the index key pattern. The following query and sort operations use the index prefixes to sort the results.


2 Answers

order simply gives the indexes, instead sort gives also the values (and with index.return=T a list):

x <- runif(10, 0, 100)
order(x)
# [1]  2  7  1  9  6  5  8 10  4  3
sort(x, index.return=T)
# $`x`
# [1] 0.08140348 0.18272011 0.23575252 0.51493537 0.64281259 0.92121388 0.93759670 0.96221375 0.97646916 0.97863369
# 
# $ix
# [1]  2  7  1  9  6  5  8 10  4  3

It seems that order is a little faster with big numbers (longer vector size):

x <- runif(10000000, 0, 100)

microbenchmark::microbenchmark(
  sort = {sort(x, index.return=T)},
  order = {x[order(x)]},
  times = 100
)
# Unit: milliseconds
# expr      min       lq     mean   median       uq      max neval
# sort 63.48221 67.79530 78.33724 70.74215 74.10109 173.1129   100
# order 56.46055 57.18649 60.88239 58.29462 62.13086 155.5481   100

So probably you should pick sort with index.return = TRUE only if you need a list object to be returned. I can't find an example where sort is better than the other.

like image 111
RLave Avatar answered Nov 15 '22 03:11

RLave


My suggestions are based on RLave's answer.

You could use the argument method, sort(x,method="quick",index.return=TRUE), and the function might be a little faster than the default. Also if you want a faster (for large vectors) alternative method of this, you can use this function:

sort_order <- function(x){
    indices <- order(x) #you can choose a method also but leave default.
    list("x"=x[indices],"ix"=indices)
}

Here are some benchmarks.

microbenchmark::microbenchmark(
     sort=s<-sort(x,index.return=T),
     "quick sort"=sq<-sort(x,method="quick",index.return=T),
     "order sort"=so<-sort_order(x),times = 10
     times=10
)

Unit: seconds
         expr      min       lq     mean   median       uq      max neval
         sort 1.493714 1.662791 1.737854 1.708502 1.887993 1.960912    10
   quick sort 1.366938 1.374874 1.451778 1.444342 1.480122 1.668693    10
   order sort 1.181974 1.344398 1.359209 1.369108 1.424569 1.461862    10

all.equal(so,sq)
[1] TRUE
all.equal(s,so)
[1] TRUE
like image 27
Manos Papadakis Avatar answered Nov 15 '22 04:11

Manos Papadakis