Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Argument "partial" of the sort function in R

Tags:

sorting

r

?sort states that the partial argument may be NULL or a vector of indices for partial sorting.

I tried:

x <- c(1,3,5,2,4,6,7,9,8,10)
sort(x)
## [1]  1  2  3  4  5  6  7  8  9 10
sort(x, partial=5)
## [1]  1  3  4  2  5  6  7  9  8 10
sort(x, partial=2)
## [1]  1  2  5  3  4  6  7  9  8 10
sort(x, partial=4)
## [1]  1  2  3  4  5  6  7  9  8 10

I am not sure what partial means when sorting a vector.

like image 369
useR Avatar asked May 10 '14 08:05

useR


1 Answers

As ?sort states,

If partial is not NULL, it is taken to contain indices of elements of the result which are to be placed in their correct positions in the sorted array by partial sorting.

In other words, the following assertion is always true:

 stopifnot(sort(x, partial=pt_idx)[pt_idx] == sort(x)[pt_idx])

for any x and pt_idx, e.g.

 x <- sample(100) # input vector
 pt_idx <- sample(1:100, 5) # indices for partial arg

This behavior is different from the one defined in the Wikipedia article on partial sorting. In R sort()'s case we are not necessarily computing k smallest elements.

For example, if

print(x)
##  [1]  91  85  63  80  71  69  20  39  78  67  32  56  27  79   9  66  88  23  61  75  68  81  21  90  36  84  11   3  42  43
##  [31]  17  97  57  76  55  62  24  82  28  72  25  60  14  93   2 100  98  51  29   5  59  87  44  37  16  34  48   4  49  77
##  [61]  13  95  31  15  70  18  52  58  73   1  45  40   8  30  89  99  41   7  94  47  96  12  35  19  38   6  74  50  86  65
##  [91]  54  46  33  22  26  92  53  10  64  83

and

pt_idx
## [1]  5 54 58 95  8

then

sort(x, partial=pt_idx)
##  [1]   1   3   2   4   5   6   7   8  11  12   9  10  13  15  14  16  17  18  23  30  31  27  21  32  36  34  35  19  20  37
## [31]  38  33  29  22  26  25  24  28  39  41  40  42  43  48  46  44  45  47  51  50  52  49  53  54  57  56  55  58  59  60
## [61]  62  64  63  61  65  66  70  72  73  69  68  71  67  79  78  82  75  81  80  77  76  74  89  85  88  87  83  84  86  90
## [91]  92  93  91  94  95  96  97  99 100  98

Here x[5], x[54], ..., x[8] are placed in their correct positions - and we cannot say anything else about the remaining elements. HTH.

EDIT: Partial sorting may reduce the sorting time, of course if you are interested in e.g. finding only some of the order statistics.

require(microbenchmark)
x <- rnorm(100000)
microbenchmark(sort(x, partial=1:10)[1:10], sort(x)[1:10])
## Unit: milliseconds
##                           expr       min        lq    median        uq      max neval
##  sort(x, partial = 1:10)[1:10]  2.342806  2.366383  2.393426  3.631734 44.00128   100
##                  sort(x)[1:10] 16.556525 16.645339 16.745489 17.911789 18.13621   100
like image 84
gagolews Avatar answered Sep 20 '22 02:09

gagolews