Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is there a way to not return the `by` column as the first column in a data.table grouping

Tags:

r

data.table

If I group-by using the by keyword in data.table it always returns the by column as the first column. Is there a flag/option to tell it not to do this? Or a clever way of getting rid of it?

In particular I want to group and then rbindlist to my original table, so infact the issue could also be said as - "how to stop it reordering the columns"

For example:

DT = data.table(I = as.numeric(1:6), N = rnorm(6), L = rep(c("a", "b", "c"), 2))
DT[, list(I = mean(I), N = mean(N)), by= L]
DT

gives:

> DT[, list(I = mean(I), N = mean(N)), by= L]
   L   I          N
1: a 2.5  0.4291802
2: b 3.5  0.6669517
3: c 4.5 -0.6471886
> DT
   I          N L
1: 1  1.8460998 a
2: 2  0.7093438 b
3: 3 -1.7991193 c
4: 4 -0.9877394 a
5: 5  0.6245596 b
6: 6  0.5047421 c

As far as the rbindlist request goes, it would be nice to be able to do this:

DT = rbindlist(list(DT, DT[, list(I = mean(I), N = mean(N)), by= L]))

or perhaps

DT = rbindlist(list(DT, DT[, list(I = mean(I), N = mean(N), L), by= L]))

or something similar (neither of which work)

like image 771
Corvus Avatar asked Feb 23 '13 10:02

Corvus


1 Answers

I don't particularly like this auto-column-reordering as well. The "trick" I usually do is to use setcolorder after obtaining the output as follows:

DT <- data.table(I = 1:6, N = rnorm(6), L = rep(c("a", "b", "c"), 2))
DT.out <- DT[, list(I = mean(I), N = mean(N)), by= L]

Here, setcolorder as:

setcolorder(DT.out, names(DT))

#      I            N L
# 1: 2.5  0.772719306 a
# 2: 3.5 -0.008921738 b
# 3: 4.5 -0.770807996 c

Of course this works if the names of DT are the same as DT.out. Else, you'll have to specify the column order explicitly as:

setcolorder(DT.out, c("I", "N", "L"))

Edit: Since you'd like to bind them by rows immediately, yes, it would be nice to not have this as an intermediate result. And since rbindlist seems to bind by position, you could use rbind which binds by column names and data.table tells this as a warning and suggests to use use.names=F if you want to instead bind by position. You can safely ignore this warning.

dt1 <- data.table(x=1:5, y=6:10)
dt2 <- data.table(y=1:5, x=6:10)

rbind(dt1, dt2) # or do.call(rbind, list(dt1, dt2))

#      x  y
#  1:  1  6
#  2:  2  7
#  3:  3  8
#  4:  4  9
#  5:  5 10
#  6:  6  1
#  7:  7  2
#  8:  8  3
#  9:  9  4
# 10: 10  5
# Warning message:
# In .rbind.data.table(...) :
#   Argument 2 has names in a different order. Columns will be bound by name for 
#   consistency with base. Alternatively, you can drop names (by using an unnamed 
#   list) and the columns will then be joined by position. Or, set use.names=FALSE.
like image 155
Arun Avatar answered Oct 15 '22 13:10

Arun