
Order while splitting (e.g. "TA" should be split into two columns, "A" in the first and "T" in the second) in R

I have the following issue, which I could only partly solve:

set.seed(1234)
mydf <- data.frame(var1a = sample(c("TA", "AA", "TT"), 5, replace = TRUE),
                   varb2 = sample(c("GA", "AA", "GG"), 5, replace = TRUE),
                   varAB = sample(c("AC", "AA", "CC"), 5, replace = TRUE)
                   )
mydf

  var1a varb2 varAB
1    TA    AA    CC
2    AA    GA    AA
3    AA    GA    AC
4    AA    AA    CC
5    TT    AA    AC

I want to split each two-letter value into separate columns, and then order the letters alphabetically.

Edit: Ordering can be done either before the split, so that the var1a value "TA" becomes "AT", or after the split, so that var1aa is "A" and var1ab is "T" (instead of "T", "A"). In both cases the sorting is within each cell.
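
The "sort before split" variant described in the edit can be sketched in base R as follows. This is only an illustration: `sort_cell` is a hypothetical helper name, and the input vector is typed in by hand rather than taken from `mydf`.

```r
# Sort the letters inside each cell, e.g. "TA" -> "AT".
# sort_cell is a hypothetical helper, not part of any package.
sort_cell <- function(x) {
  # as.character() also covers factor columns
  chars <- strsplit(as.character(x), "")
  # sort the letters of each cell and glue them back together
  vapply(chars, function(ch) paste(sort(ch), collapse = ""), character(1))
}

sorted_cells <- sort_cell(c("TA", "AA", "GA", "CC"))
sorted_cells
# [1] "AT" "AA" "AG" "CC"
```

Applied to a whole data frame this would be something like `mydf[] <- lapply(mydf, sort_cell)`, after which any split keeps the alphabetical order.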

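# Split one two-letter column into two columns named <col>a and <col>b
# (colsplit comes from the reshape package, loaded below)
split_col <- function(.col, data){
    colsplit(data[[.col]], split = "", names = paste0(.col, letters[1:2]))
}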

Split each column and combine:

    require(reshape)
    splitdf <- do.call(cbind, lapply(names(mydf), split_col, data = mydf))

  var1aa var1ab varb2a varb2b varABa varABb
1      T      A      A      A      C      C
2      A      A      G      A      A      A
3      A      A      G      A      A      C
4      A      A      A      A      C      C
5      T      T      A      A      A      C

The unsolved part is that I want to order each pair of columns so that, within each row, the values in columnname"a" and columnname"b" are in alphabetical order. The expected output is:

  var1aa var1ab varb2a varb2b varABa varABb
1      A      T      A      A      C      C
2      A      A      A      G      A      A
3      A      A      A      G      A      C
4      A      A      A      A      C      C
5      T      T      A      A      A      C

How can I order (sort within each pair of variables)?

shNIL asked Jul 27 '12 23:07


1 Answer

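mylist <- as.list(mydf)

# split every column into its two letters
splits <- lapply(mylist, reshape::colsplit, names=c("a", "b"))
# sort the two letters within each row; apply() returns rows as columns, so transpose back
rowsort <- lapply(splits, function(x) t(apply(x, 1, sort)))
# bind the sorted pairs into one data frame
comb <- do.call(data.frame, rowsort)
comb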

  var1a.1 var1a.2 varb2.1 varb2.2 varAB.a varAB.b
1       A       T       A       A       C       C
2       A       A       A       G       A       A
3       A       A       A       G       A       C
4       A       A       A       A       C       C
5       T       T       A       A       A       C

EDIT: If names are important, you can replace them:

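replaceNums <- function(x){
  stopifnot(length(x) %% 2 == 0) # check step: columns must come in pairs
  # keep the part of each name before the dot, e.g. "var1a.1" -> "var1a"
  .which <- regmatches(x, regexpr("[[:alnum:]]*(?=\\.)", x, perl=TRUE))
  paste0(.which, c("a", "b"))
}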

names(comb) <- replaceNums(names(comb))
comb
  var1aa var1ab varb2a varb2b varABa varABb
1      A      T      A      A      C      C
2      A      A      A      G      A      A
3      A      A      A      G      A      C
4      A      A      A      A      C      C
5      T      T      A      A      A      C
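
For comparison, the same result can probably be reached in base R without reshape by taking the element-wise minimum and maximum of the two letters. This is a minimal sketch, not the answer's method: `split_sorted` is a hypothetical helper name, every cell is assumed to hold exactly two characters, and the data frame is typed in from the printed table in the question so the result does not depend on the random number generator.

```r
# Data as printed in the question
mydf <- data.frame(var1a = c("TA", "AA", "AA", "AA", "TT"),
                   varb2 = c("AA", "GA", "GA", "AA", "AA"),
                   varAB = c("CC", "AA", "AC", "CC", "AC"),
                   stringsAsFactors = FALSE)

# split_sorted is a hypothetical helper; it assumes two characters per cell
split_sorted <- function(x) {
  x <- as.character(x)
  first  <- substr(x, 1, 1)
  second <- substr(x, 2, 2)
  # pmin()/pmax() compare element-wise, so each row's pair ends up ordered
  data.frame(a = pmin(first, second),
             b = pmax(first, second),
             stringsAsFactors = FALSE)
}

sorted <- do.call(cbind, lapply(mydf, split_sorted))
# name the columns <original name>a and <original name>b directly
names(sorted) <- paste0(rep(names(mydf), each = 2), c("a", "b"))
sorted
```

Because the column names are built from `names(mydf)`, no renaming step is needed afterwards.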
sebastian-c answered Sep 22 '22 01:09