How to sort some elements in a list while leaving the rest in place?

Tags: list, sorting, r

I have two slightly different types of lists that I need to sort; however, I only need to sort portions of the list while keeping some elements in place (i.e., their index should stay the same).

First, let's say that I have a list of numbers:

x <- c(4, 8, 1, 7, 3, 0, 5, 2, 6, 9)

I know that if I only wanted to sort the first 5 elements, then I could do something like this:

x[1:5] <- sort(x[1:5])
x

# [1] 1 3 4 7 8 0 5 2 6 9

Second, if I wanted to sort a list, but keep NAs in place, then I could do something like this (though I'm sure there's a better way to do this):

y <- c(4, 8, 1, NA, NA, 7, 3, 0, 5, 2, NA, 6, NA, 9)

y[which(is.na(y)==FALSE)] <- sort(y[which(is.na(y)==FALSE)])
y

# [1]  0  1  2 NA NA  3  4  5  6  7 NA  8 NA  9
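
As a possible simplification of the NA-preserving sort above, a logical index from !is.na() can be stored once and used on both sides of the assignment. This is only a sketch; the name keep is made up for illustration.

```r
y <- c(4, 8, 1, NA, NA, 7, 3, 0, 5, 2, NA, 6, NA, 9)

keep <- !is.na(y)          # TRUE where a value is present (hypothetical name)
y[keep] <- sort(y[keep])   # sort only those positions; NAs are not touched
y

# [1]  0  1  2 NA NA  3  4  5  6  7 NA  8 NA  9
```

Logical indexing makes which() unnecessary here, and computing the index once avoids repeating the is.na() call.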

Question: How do I sort a list with alphanumeric characters by group? I want to sort first by a predefined letter order (i.e., c(C, A, B)), then numerically within each group, while leaving NAs at their original index positions.

z <- c('B' , 'B1', 'B11', 'B2', 'A', 'C50', 'B21', NA, 'A5', 
       'B22', 'C', NA, 'C1', 'C11', NA, NA, 'C2', NA)

Expected Output

c('C', 'C1', 'C2', 'C11', 'C50', 'A', 'A5', NA, 'B', 'B1', 'B2', NA, 'B11', 'B21', NA, NA, 'B22', NA)

#  [1] "C"   "C1"  "C2"  "C11" "C50" "A"   "A5"  NA    "B"   "B1"  "B2"  NA    "B11" "B21" NA    NA    "B22" NA   

I know that if I just wanted to sort alphabetically, then I could use the same code as above. However, the numeric parts are then compared as text rather than as numbers (e.g., B11 comes before B2).

z[which(is.na(z)==FALSE)] <- sort(z[which(is.na(z)==FALSE)])
z

# [1] "A"   "A5"  "B"   "B1"  "B11" "B2"  "B21" NA    "B22" "C"   "C1"  NA    "C11" "C2"  NA    NA    "C50" NA   

However, I'm not sure how to change the order of the letters to c(C, A, B), since the values are alphanumeric, or how to sort the numeric parts correctly. I know that I could use order and match:

f <- sort(z[which(is.na(z)==FALSE)])
z[which(is.na(z)==FALSE)] <- f[order(match(f, c("C","A","B")))]

# [1] "C"   "A"   "B"   "A5"  "B1"  "B11" "B2"  NA    "B21" "B22" "C1"  NA    "C11" "C2"  NA    NA    "C50" NA  

But that only reorders elements that match exactly (hence only C, A, and B move to the beginning of the list and the groups are then lost), and it would not be practical to have to give the complete alphanumeric list to match. I'm sure there's an easy way to do this (e.g., grepl), but I am unsure how to implement it.
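
To illustrate the exact-match limitation on a small, made-up vector: match returns NA for anything that is not exactly one of the target letters, whereas matching on the letter part alone (digits stripped with sub) gives every element a group position. This is a rough sketch, and the names f_small and letter_part are hypothetical.

```r
f_small <- c("C", "C1", "A5", "B")   # hypothetical sample values

# Exact matching: only the bare letters are found
match(f_small, c("C", "A", "B"))
# [1]  1 NA NA  3

# Strip the digits first, then match on the letter alone
letter_part <- sub("\\d+", "", f_small)
match(letter_part, c("C", "A", "B"))
# [1] 1 1 2 3
```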

AndrewGB asked Dec 30 '21 19:12


1 Answer

The function below creates a logical index of the non-NA elements ('i1'), extracts the letters from that subset of the vector and converts them to a factor with levels in the custom order, extracts the digits as numbers, orders the non-NA elements by the two extracted vectors, assigns them back, and returns the updated vector.

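f1 <- function(vec) {
   i1 <- !is.na(vec)                      # positions of non-NA elements
   # letter part as a factor whose levels give the custom order
   v1 <- factor(sub("\\d+", "", vec[i1]), levels = c("C", "A", "B"))
   v2 <- sub("\\D+", "", vec[i1])         # digit part, "" when there is none
   v2[!nzchar(v2)] <- 0                   # bare letters sort first in their group
   v2 <- as.numeric(v2)
   vec[i1] <- vec[i1][order(v1, v2)]      # reorder only the non-NA positions
   vec
}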

Testing:

f1(z)
# [1] "C"   "C1"  "C2"  "C11" "C50" "A"   "A5"  NA    "B"   "B1"  "B2"  NA    "B11" "B21" NA    NA    "B22" NA   
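
If the letter order needs to vary, one option is to pass it in as an argument instead of hard-coding it. The following is a sketch of that variation on the same approach, not part of the original answer; the function name sort_keep_na and the argument group_order are made up for illustration.

```r
# Hypothetical helper: same idea as f1, with the letter order as an argument
sort_keep_na <- function(vec, group_order) {
  keep <- !is.na(vec)                          # non-NA positions
  grp <- factor(sub("\\d+", "", vec[keep]), levels = group_order)
  num <- sub("\\D+", "", vec[keep])            # digit part as text
  num[!nzchar(num)] <- "0"                     # bare letters come first in a group
  vec[keep] <- vec[keep][order(grp, as.numeric(num))]
  vec
}

z <- c('B', 'B1', 'B11', 'B2', 'A', 'C50', 'B21', NA, 'A5',
       'B22', 'C', NA, 'C1', 'C11', NA, NA, 'C2', NA)

res_cab <- sort_keep_na(z, c("C", "A", "B"))   # order from the question
res_abc <- sort_keep_na(z, c("A", "B", "C"))   # a different order
res_cab
```

Letters that are missing from group_order become NA in the factor, so order() places those elements after all listed groups.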
akrun answered Sep 23 '22 02:09