How to order my data frame lexicographically

Tags: sorting, r, r-faq

I have the following data frame:

a = data.frame(a=c(1,2,3,4,5,6,7),b=c(1,2,3,10,12,21,4),c=c(1,2,10,11,"X","Y",3))
> a
  a  b  c
1 1  1  1
2 2  2  2
3 3  3 10
4 4 10 11
5 5 12  X
6 6 21  Y
7 7  4  3

I want to sort the whole data frame in lexicographical order, so that the output (for example, column "c") looks like this:

> a[,"c"]
[1] 1  2  3 10 11  X  Y

I tried the following, but I am getting a different answer:

indata <- a[do.call(order,a[,c("c","a","b")]),]
> indata[,"c"]
[1] 1  10 11 2  3  X  Y
Levels: 1 10 11 2 3 X Y

I tried mixedorder() from the gtools package, and it worked fine on one column:

> a[mixedorder(a$c),]
  a  b  c
1 1  1  1
2 2  2  2
3 3  3 10
4 4 10 11
5 5 12  X
6 6 21  Y
7 7  4  3

but it doesn't work if I include multiple columns:

> a[with(a,order(mixedorder(c),mixedorder(b),mixedorder(a))),]
  a  b  c
1 1  1  1
2 2  2  2
4 4 10 11
5 5 12  X
6 6 21  Y
7 7  4  3
3 3  3 10

though I am expecting:

  a  b  c
1 1  1  1
2 2  2  2
4 7  4  3
5 3  3 10
6 4 10 11
7 5 12  X
3 6 21  Y

user1631306 asked Oct 09 '12 18:10


4 Answers

One option is to use mixedorder() from the gtools package.

library(gtools)
a[mixedorder(a$c),]
#   a  b  c
# 1 1  1  1
# 2 2  2  2
# 7 7  4  3
# 3 3  3 10
# 4 4 10 11
# 5 5 12  X
# 6 6 21  Y
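
A note on extending this to several columns: mixedorder() returns a permutation (row positions), not a sort key, so passing its result directly to order() as a key yields the scrambled rows shown in the question. Below is a minimal sketch of the difference. The permutation `p` is copied from the row order in the output above, the helper name `key` is made up for illustration, and the last part only runs if gtools is installed.

```r
# order() returns a permutation: the positions that would sort x.
x <- c(30, 10, 20)
perm  <- order(x)      # positions of the smallest, next smallest, ... element
ranks <- order(perm)   # inverse permutation = rank of each element (no ties)

# Row order produced by mixedorder(a$c) in the output above.
p <- c(1, 2, 7, 3, 4, 5, 6)
wrong <- order(p)         # treats row positions as if they were sort keys
right <- order(order(p))  # converts positions to ranks first, recovering p

# The same idea on the data frame (assumes gtools is available).
if (requireNamespace("gtools", quietly = TRUE)) {
  a <- data.frame(a = c(1, 2, 3, 4, 5, 6, 7), b = c(1, 2, 3, 10, 12, 21, 4),
                  c = c(1, 2, 10, 11, "X", "Y", 3))
  # 'key' is a hypothetical helper: permutation -> rank
  key <- function(v) order(gtools::mixedorder(as.character(v)))
  print(a[order(key(a$c), key(a$b), key(a$a)), ])
}
```

One caveat with this approach: ranks built this way never tie, so the second and third keys only matter if the first column has no duplicates to break. For real tie-breaking, equal values need equal ranks.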

Josh O'Brien answered Oct 13 '22 00:10


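Sticking with base R, you could write a function yourself. Note that apply() runs it on each column separately, so every column is sorted independently and the original rows are not kept together: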

a = data.frame(a=c(1,2,3,4,5,6,7),b=c(1,2,3,10,12,21,4),c=c(1,2,10,11,"X","Y",3))

SORTER_DEVICE <- function(x) {
    c(sort(as.numeric(na.omit(gsub("[a-zA-Z]", NA, x)))),
        sort(na.omit(gsub("[0-9]", NA, x))))
}
data.frame(apply(a, 2, SORTER_DEVICE))

Tyler Rinker answered Oct 13 '22 00:10


Unfortunately mixedsort does not (yet) support multiple column sorting. So, you need to implement it yourself, for example like this:

a[order(sub("[0-9]+", "", a$c),
        as.numeric(sub("[[:alpha:]]*([[:digit:]]*)", '\\1', a$c)),
        as.numeric(a$b),
        as.numeric(a$a)), ]

This sorts the data frame alphanumerically by a$c first; ties (there are none in your data frame 'a') are broken by a$b and then by a$a.

Output is:

  a  b  c
1 1  1  1
2 2  2  2
7 7  4  3
3 3  3 10
4 4 10 11
5 5 12  X
6 6 21  Y

PS: This was written by David Winsemius in this post as a reply to a similar question.
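
If the same ordering is needed for several columns, the regular expressions can be wrapped in a small helper. This is only a sketch in base R: `mixed_rank` is a made-up name, and it assumes each value is either a non-negative integer or a plain string (no mixed values such as "chr10"). Equal values receive equal ranks, so later columns can break ties.

```r
# Hypothetical helper: rank values so that pure integers come first (by value),
# followed by all other strings in alphabetical order.
mixed_rank <- function(x) {
  x <- as.character(x)                    # also handles factors
  u <- unique(x)
  is_num <- grepl("^[0-9]+$", u)          # assumption: non-negative integers only
  num <- rep(NA_real_, length(u))
  num[is_num] <- as.numeric(u[is_num])
  u_sorted <- u[order(!is_num, num, u)]   # numbers first, then strings
  match(x, u_sorted)                      # equal values share a rank
}

a <- data.frame(a = c(1, 2, 3, 4, 5, 6, 7), b = c(1, 2, 3, 10, 12, 21, 4),
                c = c(1, 2, 10, 11, "X", "Y", 3))

# Sort by c, then b, then a, each in mixed order.
sorted <- a[order(mixed_rank(a$c), mixed_rank(a$b), mixed_rank(a$a)), ]
print(sorted)
```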

gkcn answered Oct 13 '22 00:10


Assuming these are human chromosome names (chr1 ... chr22, chrX, chrY), we can convert them to numeric and then use order:

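# convert to numeric; as.character() guards against a$c being a factor,
# in which case ifelse() would return the factor's integer codes instead
chrom <- as.character(a$c)
a$chromN <- as.integer(ifelse(chrom == "X", "23", ifelse(chrom == "Y", "24", chrom)))

# now sort as usual:
a[ order(a$chromN), ]

#   a  b  c chromN
# 1 1  1  1      1
# 2 2  2  2      2
# 7 7  4  3      3
# 3 3  3 10     10
# 4 4 10 11     11
# 5 5 12  X     23
# 6 6 21  Y     24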
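
An alternative sketch under the same chromosome assumption is to encode the desired order as factor levels, which avoids the nested ifelse(). The names `chrom_levels` and `chrom` are illustrative; any value outside 1 to 22, X, Y would become NA and sort last.

```r
a <- data.frame(a = c(1, 2, 3, 4, 5, 6, 7), b = c(1, 2, 3, 10, 12, 21, 4),
                c = c(1, 2, 10, 11, "X", "Y", 3))

# Assumed chromosome set, listed in the order it should sort.
chrom_levels <- c(1:22, "X", "Y")

# order() on a factor sorts by level position, not alphabetically.
a$chrom <- factor(as.character(a$c), levels = chrom_levels)
sorted <- a[order(a$chrom), ]
print(sorted)
```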

zx8754 answered Oct 12 '22 23:10