Sorting Data.Table Based on Multiple Columns

Consider a data.table below:

library(data.table)
DT <- data.table(a=c(1,2,4,3,5), b=c(3:5,NA,2), c=c(2,1,NA,NA,3))
DT
   a  b  c
1: 1  3  2
2: 2  4  1
3: 4  5 NA
4: 3 NA NA
5: 5  2  3

I want to sort the rows by the 3rd column and then by the 1st column. I can do that with:

DT[order(DT[,3],DT[,1])]

   a  b  c
1: 2  4  1
2: 1  3  2
3: 5  2  3
4: 3 NA NA
5: 4  5 NA

But if DT has many columns and, let's say, I want to sort by the 1st through i-th columns, it isn't practical to write it out as:

DT[order(DT[,1], DT[,2], DT[,3], ... DT[,i])]

Instead, I'd like to provide the column indices as a vector (see below):

DT[order(DT[,c(1:i)])]

But it doesn't work the way I expect. For example, with columns 3 and 1 the output is:

DT[order(DT[,c(3,1)])]

     a  b  c
 1:  2  4  1
 2: NA NA NA
 3:  1  3  2
 4: NA NA NA
 5:  5  2  3
 6: NA NA NA
 7: NA NA NA
 8: NA NA NA
 9:  4  5 NA
10:  3 NA NA

Any advice on how I can fix that? Thanks!

Mahmoud asked Oct 26 '18 17:10

1 Answer

We can use do.call with order after specifying the columns in .SDcols, so that each selected column is passed to order as a separate argument:

DT[DT[,do.call(order, .SD), .SDcols = c(3, 1)]]
#   a  b  c
#1: 2  4  1
#2: 1  3  2
#3: 5  2  3
#4: 3 NA NA
#5: 4  5 NA
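
The 10-row result in the question is consistent with order() receiving DT[,c(3,1)] as a single argument and ranking all 10 of its values together, so indices 6 to 10 point past the last row and come back as NA rows. Below is a minimal sketch of two ways around that, assuming a data.table version in which setorderv() accepts na.last. The names idx, ord, sorted_copy and sorted_ref are only illustrative.

```r
library(data.table)

DT <- data.table(a = c(1, 2, 4, 3, 5), b = c(3:5, NA, 2), c = c(2, 1, NA, NA, 3))

# Column positions to sort by, in priority order (3rd column, then 1st).
idx <- c(3, 1)

# Option 1: hand each selected column to order() as its own argument,
# then use the resulting row permutation to subset DT (returns a copy).
ord <- do.call(order, DT[, idx, with = FALSE])
sorted_copy <- DT[ord]

# Option 2: sort by reference with setorderv(), which takes column names.
# na.last = TRUE puts NA rows at the bottom, like order() does by default.
sorted_ref <- copy(DT)
setorderv(sorted_ref, names(sorted_ref)[idx], na.last = TRUE)

print(sorted_copy)
print(sorted_ref)
```

For the first i columns, idx would be seq_len(i). Note that setorderv() modifies its input in place, which is why the sketch works on a copy.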

akrun answered Oct 11 '22 09:10