Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

merge (opposite of split) pair of rows in r

Tags:

merge

r

collapse

I have the column like the following. Each column has two pairs each with suffix "a" and "b" - for example col1a, col1b, colNa, colNb and so on till end of the file (> 50000).

mydataf <- data.frame (Ind = 1:5, col1a = sample (c(1:3), 5, replace = T), 
   col1b = sample (c(1:3), 5, replace = T),  colNa = sample (c(1:3), 5, replace = T),
   colNb = sample (c(1:3),5, replace = T),
     K_a = sample (c("A", "B"),5, replace = T),  
    K_b = sample (c("A", "B"),5, replace = T))

mydataf 
   Ind col1a col1b colNa colNb K_a K_b
1   1     1     1     2     3   B   A
2   2     1     3     2     2   B   B
3   3     2     1     1     1   B   B
4   4     3     1     1     3   A   B
5   5     1     1     3     2   B   A

Except First column (Ind), I want to collapse the pair of rows to make the dataframe look like the following, at the sametime the suffix "a" and "b" be removed. Also merged characters or number be ordered 1 first that 2, A first than B

   Ind col1   colN  K_
    1   11     23   AB   
    2   13     22   BB
    3   12     11   BB
    4   13     13   AB
    5   11     23   AB   

Edit: The grep function (probably) in the answer has problem if the name of columns are similar.

mydataf <- data.frame (col_1_a = sample (c(1:3), 5, replace = T),
   col_1_b = sample (c(1:3), 5, replace = T),  col_1_Na = sample (c(1:3), 5, replace = T),
   col_1_Nb = sample (c(1:3),5, replace = T),
     K_a = sample (c("A", "B"),5, replace = T),
    K_b = sample (c("A", "B"),5, replace = T))
n <- names(mydataf)
nm <- c(unique(substr(n, 1, nchar(n)-1)))
df <- data.frame(sapply(nm, function(x){
                             idx <- grep(x, n)
                             cols <- mydataf[idx]
                             x <- apply(cols, 1,
                                       function(z) paste(sort(z), collapse = ""))
                             return(x)
                            }))
names(df) <- nm
df

 col_1_ col_1_N K_
1   2233      23 BB
2   2233      22 BB
3   1123      13 AB
4   1223      12 AB
5   2333      33 AB
like image 484
shNIL Avatar asked Jul 27 '12 20:07

shNIL


1 Answers

mydataf
  Ind col1a col1b colNa colNb K_a K_b
1   1     2     1     1     1   A   A
2   2     1     2     1     3   B   A
3   3     1     2     3     2   A   A
4   4     1     2     3     1   A   B
5   5     1     2     2     1   A   A
n <- names(mydataf)
nm <- c("Ind", unique(substr(n, 1, nchar(n)-1)[-1]))
df <- data.frame(sapply(nm, function(x){
                             idx <- grep(paste0(x, "[ab]?$"), n)
                             cols <- mydataf[idx]
                             x <- apply(cols, 1, 
                                       function(z) paste(sort(z), collapse = ""))
                             return(x)
                            }))
names(df) <- nm
df
  Ind col1 colN K_
1   1   12   11 AA
2   2   12   13 AB
3   3   12   23 AA
4   4   12   13 AB
5   5   12   12 AA
like image 72
Julius Vainora Avatar answered Oct 01 '22 09:10

Julius Vainora