
Set display order of data.table `(group, -value.1)` while preserving key `id`

Is it possible to store the order of rows in a data.table while preserving its keys?

Let's say I have the following dummy table:

library(data.table)
dt <- data.table(id=letters[1:6],
                 group=sample(c("red", "blue"), replace=TRUE),
                 value.1=rnorm(6),
                 value.2=runif(6))
setkey(dt, id)
dt
   id group    value.1    value.2
1:  a  blue  1.4557851 0.73249612
2:  b   red -0.6443284 0.49924102
3:  c  blue -1.5531374 0.72977197
4:  d   red -1.5977095 0.08033604
5:  e  blue  1.8050975 0.43553048
6:  f   red -0.4816474 0.23658045

I would like to store this table so that its rows are ordered by group and then by value.1, both in decreasing order, i.e.:

> dt[order(group, value.1, decreasing=TRUE),]
   id group    value.1    value.2
1:  f   red -0.4816474 0.23658045
2:  b   red -0.6443284 0.49924102
3:  d   red -1.5977095 0.08033604
4:  e  blue  1.8050975 0.43553048
5:  a  blue  1.4557851 0.73249612
6:  c  blue -1.5531374 0.72977197

Obviously I can save this as a new variable, but I also want to keep the id column as my primary key.

Arun's answer to "What is the purpose of setting a key in data.table?" suggests that this can be achieved with clever use of setkey, since setkey sorts the data.table by its key columns (although there is no option to sort a key in decreasing order):

> setkey(dt, group, value.1, id)
> dt
   id group    value.1    value.2
1:  c  blue -1.5531374 0.72977197
2:  a  blue  1.4557851 0.73249612
3:  e  blue  1.8050975 0.43553048
4:  d   red -1.5977095 0.08033604
5:  b   red -0.6443284 0.49924102
6:  f   red -0.4816474 0.23658045

However, I then lose the ability to use id as my primary key, because group is now the first key column, so a lookup like dt["a"] is matched against group:

> dt["a"]
   group id value.1 value.2
1:     a NA      NA      NA
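The trade-off can be reproduced with a short sketch. It uses the values printed above as fixed data (instead of rnorm()/runif()) so the result is deterministic. setorder() can sort in decreasing order by prefixing a column with `-`, but a key asserts that the rows are stored sorted by the key columns, so data.table drops the id key once the rows are reordered by something else. Calling setkey(dt, id) again restores the key by sorting the rows back by id.

```r
library(data.table)

# Fixed copy of the question's data (values as printed above)
dt <- data.table(
  id      = letters[1:6],
  group   = c("blue", "red", "blue", "red", "blue", "red"),
  value.1 = c(1.4557851, -0.6443284, -1.5531374, -1.5977095, 1.8050975, -0.4816474),
  value.2 = c(0.73249612, 0.49924102, 0.72977197, 0.08033604, 0.43553048, 0.23658045)
)
setkey(dt, id)

# Reorder by reference: group descending, then value.1 descending
setorder(dt, -group, -value.1)
key_after_setorder   <- key(dt)  # NULL: the id key no longer describes the row order
order_after_setorder <- dt$id    # "f" "b" "d" "e" "a" "c", as in the desired output

# Re-keying on id sorts the rows back by id, losing the display order again
setkey(dt, id)
```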
asked Oct 21 '22 08:10 by Scott Ritchie


1 Answer

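A key requires the rows to be stored sorted by the key columns, so you cannot store a different row order and keep id as the key. If what you need is only the displayed order, it sounds like you simply want to override print.data.table: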

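print.data.table = function(x, ...) {
  # put whatever condition identifies your tables here;
  # this one checks that both sorting columns exist
  if ("group" %in% names(x) && "value.1" %in% names(x)) {
    # sort a copy for display only; the stored rows and key of x are unchanged
    data.table:::print.data.table(x[order(group, value.1, decreasing = TRUE)], ...)
  } else {
    # fall back to data.table's own print method
    data.table:::print.data.table(x, ...)
  }
}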

set.seed(2)
dt = data.table(id=letters[1:6], 
               group=sample(c("red", "blue"), replace=TRUE), 
               value.1=rnorm(6), 
               value.2=runif(6))
setkey(dt, id)
dt
#   id group     value.1    value.2
#1:  a   red  0.18484918 0.40528218
#2:  e   red  0.13242028 0.44480923
#3:  c   red -1.13037567 0.97639849
#4:  b  blue  1.58784533 0.85354845
#5:  f  blue  0.70795473 0.07497942
#6:  d  blue -0.08025176 0.22582546

dt["c"]
#   id group   value.1   value.2
#1:  c   red -1.130376 0.9763985
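If the rows should actually be stored in the display order rather than only printed that way, one possible alternative (not part of the answer above) is to give up the key and use a secondary index on id instead. A secondary index records the sort order of id in an attribute without reordering the rows, and joins written with on= can use it. This is a minimal sketch, assuming a reasonably recent data.table that provides setindex() and indices(), and again using the question's printed values as fixed data.

```r
library(data.table)

# Fixed copy of the question's data (values as printed in the question)
dt <- data.table(
  id      = letters[1:6],
  group   = c("blue", "red", "blue", "red", "blue", "red"),
  value.1 = c(1.4557851, -0.6443284, -1.5531374, -1.5977095, 1.8050975, -0.4816474),
  value.2 = c(0.73249612, 0.49924102, 0.72977197, 0.08033604, 0.43553048, 0.23658045)
)

# Store the rows physically in the desired order (this leaves the table unkeyed)
setorder(dt, -group, -value.1)

# Add a secondary index on id; the row order is not changed
setindex(dt, id)

# Look up by id with an explicit on= join instead of relying on the key
row_a <- dt[.("a"), on = "id"]
row_a
```

Unlike the print.data.table override, this changes the stored order; the cost is that data.table no longer treats the table as keyed, so every lookup has to name the column with on=.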
answered Oct 27 '22 21:10 by eddi