I have the following data frame:
a = data.frame(a=c(1,2,3,4,5,6,7),b=c(1,2,3,10,12,21,4),c=c(1,2,10,11,"X","Y",3))
> a
a b c
1 1 1 1
2 2 2 2
3 3 3 10
4 4 10 11
5 5 12 X
6 6 21 Y
7 7 4 3
I want to sort the whole data frame in natural (numeric-aware) order rather than plain lexicographical order, so that, for example, column "c" comes out like this:
> a[,"c"]
[1] 1 2 3 10 11 X Y
I tried the following, but I get a different result:
indata <- a[do.call(order,a[,c("c","a","b")]),]
> indata[,"c"]
[1] 1 10 11 2 3 X Y
Levels: 1 10 11 2 3 X Y
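The difference comes from how R compares strings. order() on a character column (or on a factor, whose levels data.frame() sorted as strings before R 4.0.0) compares character by character, so "10" sorts before "2". A minimal illustration, using a hypothetical vector v holding the same values as column c (output shown as in typical C or UTF-8 locales):

```r
# Same values as column c, stored as character strings
v <- c("1", "2", "10", "11", "X", "Y", "3")

"10" < "2"  # TRUE: strings compare character by character, and "1" < "2"
10 < 2      # FALSE: numbers compare by value

# Plain string order, as produced by the do.call(order, ...) attempt
v[order(v)]  # "1" "10" "11" "2" "3" "X" "Y"
```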
I tried mixedorder() from the gtools package, and it works fine on one column:
> a[mixedorder(a$c),]
a b c
1 1 1 1
2 2 2 2
7 7 4 3
3 3 3 10
4 4 10 11
5 5 12 X
6 6 21 Y
but it doesn't work if I include multiple columns:
> a[with(a,order(mixedorder(c),mixedorder(b),mixedorder(a))),]
a b c
1 1 1 1
2 2 2 2
4 4 10 11
5 5 12 X
6 6 21 Y
7 7 4 3
3 3 3 10
whereas I expect:
a b c
1 1 1 1
2 2 2 2
7 7 4 3
3 3 3 10
4 4 10 11
5 5 12 X
6 6 21 Y
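The multi-column attempt goes wrong because mixedorder(), like base order(), returns a permutation (the row indices in sorted order), not the rank of each element, and a permutation passed back into order() as a sort key sorts by the wrong thing. A base-R sketch of the difference, with a made-up vector x:

```r
x <- c(30, 10, 20)

p <- order(x)  # permutation: index of the smallest, then the next, ... -> 2 3 1
r <- rank(x)   # rank of each element, in place                     -> 3 1 2

x[order(p)]    # using the permutation as a key: 20 30 10 (not sorted)
x[order(r)]    # using the ranks as a key:       10 20 30 (sorted)
```

Since order() applied to a permutation inverts it, order(mixedorder(c)) would turn mixedorder()'s result into usable ranks. Equal values in c would still get distinct ranks by position, though, so the later keys b and a would never be consulted.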
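To sort a data frame in R, use the order() function. Sorting is ascending by default. Prepend a minus sign to a numeric sorting variable to sort in descending order; the minus sign does not work on character vectors, so for those use decreasing = TRUE or -xtfrm(x).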
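For example, with a small made-up data frame df that has a numeric column x and a character column y:

```r
df <- data.frame(x = c(3, 1, 2), y = c("b", "c", "a"), stringsAsFactors = FALSE)

df[order(df$x), ]                     # ascending by x: rows 2, 3, 1
df[order(-df$x), ]                    # descending by x: rows 1, 3, 2
df[order(df$y, decreasing = TRUE), ]  # descending by y: rows 2, 1, 3
df[order(-xtfrm(df$y)), ]             # same; xtfrm() maps strings to sortable numbers
```

Writing -df$y directly fails with "invalid argument to unary operator"; xtfrm() is useful when a descending character key has to be mixed with ascending keys in one order() call, because decreasing = TRUE reverses every key.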
One option is to use mixedorder() from the gtools package.
library(gtools)
a[mixedorder(a$c),]
# a b c
# 1 1 1 1
# 2 2 2 2
# 7 7 4 3
# 3 3 3 10
# 4 4 10 11
# 5 5 12 X
# 6 6 21 Y
Sticking with base R, you could write a function yourself:
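a = data.frame(a=c(1,2,3,4,5,6,7),b=c(1,2,3,10,12,21,4),c=c(1,2,10,11,"X","Y",3))
# Sort one vector: numbers first (numerically), then letters (alphabetically).
# gsub() with replacement NA sets every element containing a match to NA,
# and na.omit() then drops those elements.
SORTER_DEVICE <- function(x) {
  c(sort(as.numeric(na.omit(gsub("[a-zA-Z]", NA, x)))),
    sort(na.omit(gsub("[0-9]", NA, x))))
}
# Note: apply() sorts each column independently, so rows are not kept together
data.frame(apply(a, 2, SORTER_DEVICE))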
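Because apply() treats each column on its own, the function above does not keep the values of a row together. A row-preserving base-R sketch (not part of the original answer) builds the same "numbers first, then letters" key from column c and uses order() to index the whole data frame, with b and a as tie-breakers. The names num_val and is_letter are illustrative:

```r
a <- data.frame(a = c(1, 2, 3, 4, 5, 6, 7), b = c(1, 2, 3, 10, 12, 21, 4),
                c = c(1, 2, 10, 11, "X", "Y", 3), stringsAsFactors = FALSE)

cc <- as.character(a$c)                      # safe even if c is a factor
num_val <- suppressWarnings(as.numeric(cc))  # NA for non-numeric values like "X"
is_letter <- is.na(num_val)                  # FALSE sorts before TRUE

# Keys: numbers before letters, numeric value, the string itself, then b and a
sorted <- a[order(is_letter, num_val, cc, a$b, a$a), ]
sorted
```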
Unfortunately, mixedsort()/mixedorder() do not (yet) support sorting by multiple columns, so you need to implement it yourself, for example like this:
a[order(sub("[0-9]+", "", a$c),
as.numeric(sub("[[:alpha:]]*([[:digit:]]*)", '\\1', a$c)),
as.numeric(a$b),
as.numeric(a$a)), ]
This first sorts the data frame alphanumerically by a$c and, to break ties (which do not actually occur in your data frame a), uses a$b and then a$a.
The output is:
a b c
1 1 1 1
2 2 2 2
7 7 4 3
3 3 3 10
4 4 10 11
5 5 12 X
6 6 21 Y
PS: This was written by David Winsemius in this post as a reply to a similar question.
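Since column c in the question has no ties, the later keys are never used there. A small made-up data frame d with repeated values in c shows the tie-breaking at work, using the same order() expression:

```r
d <- data.frame(a = c(1, 2, 3, 4), b = c(5, 2, 9, 1),
                c = c("X", "10", "X", "10"), stringsAsFactors = FALSE)

ord <- order(sub("[0-9]+", "", d$c),                                    # letter part ("" for numbers)
             as.numeric(sub("[[:alpha:]]*([[:digit:]]*)", "\\1", d$c)), # number part (NA for "X")
             as.numeric(d$b),                                           # first tie-breaker
             as.numeric(d$a))                                           # second tie-breaker
d[ord, ]  # rows 4, 2, 1, 3: the two "10" rows ordered by b, then the two "X" rows by b
```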
Assuming these are human chromosome names (chr1...chr22, chrX, chrY), we can convert them to numbers and then use order():
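# convert to numeric; as.character() guards against c being a factor
# (the data.frame() default before R 4.0.0), in which case ifelse()
# ends up using the factor's integer codes instead of its labels
cc <- as.character(a$c)
a$chromN <- as.integer(ifelse(cc == "X", "23", ifelse(cc == "Y", "24", cc)))
# now sort as usual:
a[ order(a$chromN), ]
# a b c chromN
# 1 1 1 1 1
# 2 2 2 2 2
# 7 7 4 3 3
# 3 3 3 10 10
# 4 4 10 11 11
# 5 5 12 X 23
# 6 6 21 Y 24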
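An alternative that avoids the nested ifelse() is to list the labels in the desired order and look each value up with match(). The vector chrom_levels is an assumption about which labels can occur (1 to 22, X and Y, without a "chr" prefix, as in the question's data):

```r
a <- data.frame(a = c(1, 2, 3, 4, 5, 6, 7), b = c(1, 2, 3, 10, 12, 21, 4),
                c = c(1, 2, 10, 11, "X", "Y", 3), stringsAsFactors = FALSE)

# Desired label order; c() coerces 1:22 to "1", ..., "22"
chrom_levels <- c(1:22, "X", "Y")

# Position of each label in chrom_levels (NA for unknown labels, which order() puts last)
a$chromN <- match(as.character(a$c), chrom_levels)
a[order(a$chromN), ]
```

For prefixed names, chrom_levels would become paste0("chr", c(1:22, "X", "Y")).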