I've encountered an application where I need to sort a data.frame by column numbers, and none of the usual solutions seem to allow that.
The context is creating an as.data.frame.by
method. Since a by
object will have its last column as the value column and the first ncol-1 columns as the index columns. melt
returns it sorted backwards--index 3, then index 2, then index 1. For compatibility with latex.table.by
I'd like to sort it forwards. But I'm having trouble doing that in a sufficiently generic way. The commented-out line in the function below is my best attempt so far.
as.data.frame.by <- function( x, colnames=paste("IDX",seq(length(dim(x))),sep="" ), ... ) {
num.by.vars <- length(dim(x))
res <- melt(unclass(x))
res <- na.omit(res)
colnames(res)[seq(num.by.vars)] <- colnames
#res <- res[ order(res[ , seq(num.by.vars)] ) , ] # Sort the results by the by vars in the heirarchy given
res
}
dat <- transform( ChickWeight, Time=cut(Time,3), Chick=cut(as.numeric(Chick),3) )
my.by <- by( dat, with(dat,list(Time,Chick,Diet)), function(x) sum(x$weight) )
> as.data.frame(my.by)
IDX1 IDX2 IDX3 value
1 (-0.021,6.99] (0.951,17.3] 1 3475
2 (6.99,14] (0.951,17.3] 1 5969
3 (14,21] (0.951,17.3] 1 8002
4 (-0.021,6.99] (17.3,33.7] 1 640
5 (6.99,14] (17.3,33.7] 1 1596
6 (14,21] (17.3,33.7] 1 2900
13 (-0.021,6.99] (17.3,33.7] 2 2253
14 (6.99,14] (17.3,33.7] 2 4734
15 (14,21] (17.3,33.7] 2 7727
22 (-0.021,6.99] (17.3,33.7] 3 666
23 (6.99,14] (17.3,33.7] 3 1391
24 (14,21] (17.3,33.7] 3 2109
25 (-0.021,6.99] (33.7,50] 3 1647
26 (6.99,14] (33.7,50] 3 3853
27 (14,21] (33.7,50] 3 7488
34 (-0.021,6.99] (33.7,50] 4 2412
35 (6.99,14] (33.7,50] 4 5448
36 (14,21] (33.7,50] 4 8101
With the line uncommented, it returns gibberish (it just treats the whole data.frame as a vector, with disastrous results).
I've even tried clever stuff like res <- res[ order( ...=list(res[,1],res[,2]) ) , ]
but to no avail.
I suspect there's a simple way to do this, but I'm not seeing it.
Edit for clarification: I want to not have to specify column names. Instead, I want to be able to sort it by a numerical vector (e.g. sort by columns 1:4).
mydf <- as.data.frame(my.by)
mydf[order(mydf$IDX3, mydf$IDX2, mydf$IDX1) , ]
IDX1 IDX2 IDX3 value
1 (-0.021,6.99] (0.951,17.3] 1 3475
3 (14,21] (0.951,17.3] 1 8002
2 (6.99,14] (0.951,17.3] 1 5969
4 (-0.021,6.99] (17.3,33.7] 1 640
6 (14,21] (17.3,33.7] 1 2900
5 (6.99,14] (17.3,33.7] 1 1596
13 (-0.021,6.99] (17.3,33.7] 2 2253
15 (14,21] (17.3,33.7] 2 7727
14 (6.99,14] (17.3,33.7] 2 4734
22 (-0.021,6.99] (17.3,33.7] 3 666
24 (14,21] (17.3,33.7] 3 2109
23 (6.99,14] (17.3,33.7] 3 1391
25 (-0.021,6.99] (33.7,50] 3 1647
27 (14,21] (33.7,50] 3 7488
26 (6.99,14] (33.7,50] 3 3853
34 (-0.021,6.99] (33.7,50] 4 2412
36 (14,21] (33.7,50] 4 8101
35 (6.99,14] (33.7,50] 4 5448
Or ;
my.by <- by( dat, with(dat,list(Diet,Chick, Time)), function(x) sum(x$weight) )
mydf <- as.data.frame(my.by)
EDIT: Or this produces same output as up top using numeric column indices:
mydf <- as.data.frame(my.by)
mydf[ do.call(order, mydf[, 3:1] ) , ]
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With