
Does R `unique` always return values in the same order?

Tags:

r

unique

Stupid example:

df <- data.frame(group=rep(LETTERS, each=2), value=1:52)
res <- unlist(lapply(unique(df$group), function(x) mean(subset(df, group==x)$value)))
names(res) <- unique(df$group)

Will res always be the following?

   A    B    C    D    E    F    G    H    I    J    K    L    M    N    O    P 
 1.5  3.5  5.5  7.5  9.5 11.5 13.5 15.5 17.5 19.5 21.5 23.5 25.5 27.5 29.5 31.5 
   Q    R    S    T    U    V    W    X    Y    Z 
33.5 35.5 37.5 39.5 41.5 43.5 45.5 47.5 49.5 51.5 

Or could it ever happen that the means calculated on line 2 don't match up with the names assigned on line 3? I guess it depends on how unique is implemented in base R, but I'm not sure where to find that.

fanli asked Apr 04 '16 21:04


2 Answers

According to ?unique:

‘unique’ returns a vector, data frame or array like ‘x’ but with duplicate elements/rows removed.

This gives a complete description of the ordering: the result keeps the elements in the order in which each one first appears. (I guess I don't see the wiggle room that @joran sees for a different ordering.) For example,

unique(c("B","B","A","C","C","C","B","A"))

will result in

[1] "B" "A" "C"

I believe unique(x) will in general be identical to (but more efficient than)

x[!duplicated(x)]
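A quick way to check this is to compare the two expressions on a few small vectors of different types. This is a sketch: the input vectors are arbitrary examples, and they are deliberately plain, unnamed atomic vectors so that only the values and their order are being compared.

```r
# Arbitrary, unnamed test vectors of several atomic types (illustrative only)
inputs <- list(
  chr = c("B", "B", "A", "C", "C", "C", "B", "A"),
  int = c(3L, 1L, 3L, 2L, 1L),
  dbl = c(2.5, NA, 2.5, -1, NA),
  lgl = c(TRUE, TRUE, FALSE, NA, FALSE)
)

for (nm in names(inputs)) {
  x <- inputs[[nm]]
  # unique() keeps the first occurrence of each value, in order of appearance,
  # which is exactly what dropping the duplicated() positions does
  stopifnot(identical(unique(x), x[!duplicated(x)]))
}

unique(inputs$chr)
# [1] "B" "A" "C"
```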

If you want to look at the internal code, see here: the moving parts are something like

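k = 0;
switch (TYPEOF(x)) {
case LGLSXP:
case INTSXP:
    /* dup[i] is TRUE if x[i] already occurred at an earlier index */
    for (i = 0; i < n; i++)
        if (LOGICAL(dup)[i] == 0)
            /* copy each first occurrence into ans, preserving order */
            INTEGER(ans)[k++] = INTEGER(x)[i];
    break;
/* ... the other vector types are handled the same way ... */
}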

That is, the internal implementation does exactly what I described: it goes through the vector sequentially and fills in the non-duplicated elements. Since the ordering isn't explicitly guaranteed in the documentation, it is theoretically possible that this implementation could change in the future, but that is vanishingly unlikely.
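To make the loop concrete, here is a pure-R transcription of it. `unique_sketch` is a hypothetical name used only for illustration; base R does this in C, and the sketch only mirrors the logic for atomic vectors, not the performance.

```r
# Hypothetical helper: a pure-R mirror of the C loop (illustration only)
unique_sketch <- function(x) {
  dup <- duplicated(x)               # TRUE where x[i] already appeared earlier
  ans <- vector(typeof(x), sum(!dup))
  k <- 0L
  for (i in seq_along(x)) {
    if (!dup[i]) {
      k <- k + 1L
      ans[k] <- x[i]                 # first occurrences, in their original order
    }
  }
  ans
}

x <- c("B", "B", "A", "C", "C", "C", "B", "A")
unique_sketch(x)
# [1] "B" "A" "C"
stopifnot(identical(unique_sketch(x), unique(x)))
```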

For what you're trying to do, there are simpler R idioms:

df <- data.frame(group=rep(LETTERS, each=2), value=1:52)
a1 <- aggregate(df$value,list(df$group),mean)

This returns a two-column data frame, so you can use

setNames(a1[,2],a1[,1])

to convert it to your format. Or

library(plyr)
unlist(daply(df,"group",summarise,val=mean(value)))
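Another base-R option (not part of the answer above) sidesteps the ordering question entirely: `split()` groups the values, `sapply()` takes each group's mean, and the names come from the same grouping object as the values, so they cannot get out of step. Note that `split()` orders the groups by the sorted factor levels, not by first appearance; in this example the two orders coincide, which is why the result is identical to the question's `res`.

```r
df <- data.frame(group = rep(LETTERS, each = 2), value = 1:52)

# The question's approach: values and names both derived from unique(df$group)
res <- unlist(lapply(unique(df$group), function(x) mean(subset(df, group == x)$value)))
names(res) <- unique(df$group)

# split() + sapply(): each mean is named by the group it was computed from
res2 <- sapply(split(df$value, df$group), mean)

stopifnot(identical(res, res2))
```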
Ben Bolker answered Oct 14 '22 04:10


unique returns a sorted vector when it is called on a RasterLayer object (from the raster package).

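library(raster)

# 100 x 100 raster filled with 10,000 draws (with replacement) from 1:100
x <- 1:100
example <- raster(xmn = 0, xmx = 100, ymn = 0, ymx = 100, nrow = 100, ncol = 100)
example[] <- sample(x, 10000, replace = TRUE)

plot(example)

# The first 100 cell values, in cell order: almost certainly not equal to 1:100
vals <- values(example)[x]
isTRUE(all.equal(vals, x))

# unique() on a RasterLayer returns the distinct values in sorted order
# (all.equal ignores integer vs double storage)
uniques <- unique(example)
isTRUE(all.equal(uniques, x))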

The first 100 cell values will almost certainly not match the ordered vector 1:100, but the unique values will, as long as every value in 1:100 was drawn at least once (near certain with 10,000 draws).

For ordinary vectors, though, the previous answer is correct: unique returns the elements in the order in which the non-duplicates first appeared.
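If you want the sorted behaviour for an ordinary vector, wrapping the call in sort() gives it. A small illustration with an arbitrary integer vector:

```r
v <- c(5L, 2L, 5L, 9L, 2L, 1L)

unique(v)        # order of first appearance
# [1] 5 2 9 1

sort(unique(v))  # distinct values in increasing order
# [1] 1 2 5 9
```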

Andrew answered Oct 14 '22 04:10