
Sort files/objects with alphanumeric names (numbers and letters)

My files are:

CT.BP.50.txt
CT.BP.200.txt
CT.BP.500.txt 
GP.BP.50.txt
GP.BP.200.txt 
GP.BP.500.txt 

files <- c("CT.BP.50.txt", "CT.BP.200.txt", "CT.BP.500.txt", "GP.BP.50.txt", "GP.BP.200.txt", "GP.BP.500.txt")

I want to perform a specific operation on each of them, which I can do like this:

for (i in seq_along(files)) {
    # Read each file and plot the frequency of the values in its first column
    foo <- read.table(files[i])
    barplot(table(foo$V1), main = files[i])
}

But R plots them in this order:

"CT.BP.200.txt" "CT.BP.500.txt" "CT.BP.50.txt" "GP.BP.200.txt" "GP.BP.500.txt" "GP.BP.50.txt"

And I want them to be plotted in sorted order:

"CT.BP.50.txt" "CT.BP.200.txt" "CT.BP.500.txt" "GP.BP.50.txt" "GP.BP.200.txt" "GP.BP.500.txt"

How can I sort objects with alphanumeric names?

pogibas asked Dec 10 '22 01:12

1 Answer

The problem is that list.files() returns the file names in standard lexical order: names are compared character by character, so the digits are compared position by position rather than as a whole number. Sorting your file names explicitly shows the same effect:

files <- sort(c("Gen.Var_CT.BP.200.txt", "Gen.Var_CT.BP.500.txt", 
                "Gen.Var_CT.BP.50.txt", "Gen.Var_GP.BP.200.txt",
                "Gen.Var_GP.BP.500.txt", "Gen.Var_GP.BP.50.txt"))

On my system, this gives:

> files
[1] "Gen.Var_CT.BP.200.txt" "Gen.Var_CT.BP.50.txt"  "Gen.Var_CT.BP.500.txt"
[4] "Gen.Var_GP.BP.200.txt" "Gen.Var_GP.BP.50.txt"  "Gen.Var_GP.BP.500.txt"
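
The effect can be isolated by looking at the numeric parts alone. The following is a minimal base R sketch; the variable names are illustrative, and method = "radix" is used only so that the character comparison does not depend on the locale.

```r
# The numeric parts of the file names, stored as character strings
nums <- c("50", "200", "500")

# Sorted as text: "2" comes before "5", so "200" is placed first
as_text <- sort(nums, method = "radix")

# Sorted as numbers: the values are compared by magnitude
as_number <- sort(as.numeric(nums))

print(as_text)    # "200" "50" "500"
print(as_number)  # 50 200 500
```

Any approach that fixes the ordering has to make the comparison use the second form for the digit runs inside each name.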

The function gtools::mixedsort will, in general, sort the way you want: each run of digits in a string is treated as a number for sorting purposes. There is a snag with your example, though: mixedsort treats a period as part of a number, so it sees .200. as a candidate number that cannot actually be parsed as one. Since your file names contain no real decimal points, you can work around this by replacing the periods with spaces in the sort key only, and using mixedorder to reorder the original names:

library(gtools)  # provides mixedorder(); "." becomes " " in the sort key only
files <- files[mixedorder(gsub("\\.", " ", files))]

So files is now sorted as:

> files
[1] "Gen.Var_CT.BP.50.txt"  "Gen.Var_CT.BP.200.txt" "Gen.Var_CT.BP.500.txt"
[4] "Gen.Var_GP.BP.50.txt"  "Gen.Var_GP.BP.200.txt" "Gen.Var_GP.BP.500.txt"
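
If installing gtools is not an option, a similar ordering can be sketched in base R. This is an illustration under a stated assumption, not the approach used above: it assumes every name has the form <prefix>.<integer>.txt, as in the question, and the names num, prefix and sorted are made up for the example.

```r
files <- c("CT.BP.200.txt", "CT.BP.500.txt", "CT.BP.50.txt",
           "GP.BP.200.txt", "GP.BP.500.txt", "GP.BP.50.txt")

# Assumption: each name ends in ".<integer>.txt"
# Pull out the integer just before ".txt" and convert it to a number
num <- as.numeric(sub("^.*\\.([0-9]+)\\.txt$", "\\1", files))

# Everything before that integer is the text prefix, e.g. "CT.BP"
prefix <- sub("\\.[0-9]+\\.txt$", "", files)

# Order by prefix first, then by numeric value within each prefix
sorted <- files[order(prefix, num)]
print(sorted)
```

Names that do not match the assumed pattern would produce NA in num, so this sketch is only suitable when the naming scheme is known in advance.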
Brian Diggs answered Jan 31 '23 11:01