Sort a matrix (or data.frame) by the number of unique values per column

How can I reorder the columns of a data.frame by the number of unique values in each column? As an example:

var1 var2 var3
  1    1   1
  0    2   2
  1    3   3
  0    4   1
  1    5   2

Is there a way to reorder this as var2, var3, var1 automatically (because the numbers of unique values are 5, 3, and 2 respectively), or in the opposite order (2, 3, 5)?

In this small example it is easy to do by hand, but my real data has many columns, so I need the sorting to happen automatically.

Also, I'd prefer a solution that works on a matrix as well as a data.frame, regardless of whether there are column names.

PascalVKooten asked Dec 16 '22 13:12


1 Answer

Something like this?

df[names(sort(sapply(df, function(x) length(unique(x))), decreasing = TRUE))]

#   var2 var3 var1
# 1    1    1    1
# 2    2    2    0
# 3    3    3    1
# 4    4    1    0
# 5    5    2    1
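
To see why this works, it can help to rebuild the example and look at the intermediate counts. This is a minimal sketch, assuming the data frame from the question is called `df`; the name `counts` is introduced here only for illustration.

```r
# Rebuild the example data from the question
df <- data.frame(var1 = c(1, 0, 1, 0, 1),
                 var2 = 1:5,
                 var3 = c(1, 2, 3, 1, 2))

# Number of distinct values per column, returned as a named integer vector
counts <- sapply(df, function(x) length(unique(x)))
counts
# var1 var2 var3 
#    2    5    3 

# sort() carries the names along, and the sorted names then index the columns
names(sort(counts, decreasing = TRUE))
# [1] "var2" "var3" "var1"
```

Dropping `decreasing = TRUE` gives the opposite order (var1, var3, var2).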

If your input is a matrix, then:

m[, names(sort(apply(m, 2, function(x) 
       length(unique(x))), decreasing = TRUE))] 

should work.

#      var2 var3 var1
# [1,]    1    1    1
# [2,]    2    2    0
# [3,]    3    3    1
# [4,]    4    1    0
# [5,]    5    2    1

Edit: the example in your post has column names, but the one you gave in the comments doesn't. Please make sure the example you post matches your actual data.

X <- cbind(1, rnorm(10), 1:10)

Since you can't rely on column names, you'll have to work with column indices instead. Try this (it works whether or not you have column names):

X[, sort(apply(X, 2, function(x) 
         length(unique(x))), decreasing = TRUE, index.return = TRUE)$ix]
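
The same index-based idea can also be written with `order()`, which returns the column positions directly. The helper below is only a sketch: `sort_cols_by_unique` is a hypothetical name, not an existing function, and it assumes a base matrix or data.frame (not a tibble) as input.

```r
# Hypothetical helper: reorder columns by their number of unique values.
# Assumes x is a base matrix or data.frame; column names are not required.
sort_cols_by_unique <- function(x, decreasing = TRUE) {
  # Count distinct values per column by position, so names are never needed
  n_unique <- vapply(seq_len(ncol(x)),
                     function(j) length(unique(x[, j])),
                     integer(1))
  # order() gives column positions; drop = FALSE keeps the two-dimensional shape
  x[, order(n_unique, decreasing = decreasing), drop = FALSE]
}

X <- cbind(1, rnorm(10), 1:10)
sort_cols_by_unique(X)   # the constant column (1 unique value) ends up last
```

Because it indexes by position, the same call should work on the `df` example as well, and `decreasing = FALSE` reverses the order.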
Arun answered Jan 31 '23 10:01