
Paste together each pair of columns in a data frame in R?

Tags:

r

I have a data frame of amino acid sites, and want to create a new data frame of each pairwise combination of these sites.

The original data will look something like this:

df <- cbind(letters[1:5], letters[6:10], letters[11:15])
df
     [,1] [,2] [,3]
[1,] "a"  "f"  "k"
[2,] "b"  "g"  "l"
[3,] "c"  "h"  "m"
[4,] "d"  "i"  "n"
[5,] "e"  "j"  "o"

And what I would like is this:

newdf <- cbind(paste(df[, 1], df[, 2], sep = ""),
               paste(df[, 1], df[, 3], sep = ""),
               paste(df[, 2], df[, 3], sep = ""))
newdf
     [,1] [,2] [,3]
[1,] "af" "ak" "fk"
[2,] "bg" "bl" "gl"
[3,] "ch" "cm" "hm"
[4,] "di" "dn" "in"
[5,] "ej" "eo" "jo"

The actual data may have hundreds of rows and/or columns, so I need a less manual way of doing this. Any help is much appreciated. I am but a humble biologist, and my skill set in this area is rather limited.

Jill Hollenbach asked Jul 30 '12 23:07



3 Answers

A combination of combn() and apply() will get you all of the unordered pairwise combos:

df <- cbind(letters[1:5], letters[6:10], letters[11:15])

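apply(X = combn(seq_len(ncol(df)), 2),  # one column per pair of column indices
      MARGIN = 2,                       # iterate over those pairs
      FUN = function(jj) {
          # paste the two selected columns together, row by row
          apply(df[, jj], 1, paste, collapse = "")
      }
)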
#      [,1] [,2] [,3]
# [1,] "af" "ak" "fk"
# [2,] "bg" "bl" "gl"
# [3,] "ch" "cm" "hm"
# [4,] "di" "dn" "in"
# [5,] "ej" "eo" "jo"

(If what's going on in the above isn't immediately clear, you might want to have a quick look at the object returned by combn(seq_len(ncol(df)), 2). Its columns enumerate all unordered pairwise combinations of the integers between 1 and n, where n is the number of columns in your data.)
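As a small illustration of that point, the sketch below prints the index matrix for the three-column example and uses its first column to build one pasted pair. The name idx is only an illustrative choice and does not appear in the answer.

```r
df <- cbind(letters[1:5], letters[6:10], letters[11:15])

# Each column of `idx` holds one unordered pair of column indices.
idx <- combn(seq_len(ncol(df)), 2)
idx
#      [,1] [,2] [,3]
# [1,]    1    1    2
# [2,]    2    3    3

# The first pair selects columns 1 and 2 of df, pasted together row by row.
first_pair <- paste0(df[, idx[1, 1]], df[, idx[2, 1]])
first_pair
# [1] "af" "bg" "ch" "di" "ej"
```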

Josh O'Brien answered Sep 26 '22 02:09


You can use the FUN argument to combn to paste together the columns from each combination:

combn(ncol(df), 2, FUN = function(i) apply(df[, i], 1, paste0, collapse = ""))
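With hundreds of columns it can be hard to tell which result column came from which pair. The sketch below is one possible variation, not part of the answer: it relies on paste0() being vectorised over rows, so no inner apply() is needed, and it labels each result column with its source pair. The column names s1, s2, s3 and the "_" separator are assumptions made up for illustration.

```r
df <- cbind(letters[1:5], letters[6:10], letters[11:15])
colnames(df) <- c("s1", "s2", "s3")  # hypothetical site names, for illustration only

# One column of `pairs` per unordered pair of column indices.
pairs <- combn(ncol(df), 2)

# paste0() works element-wise on whole columns, so each pair needs one call.
newdf <- apply(pairs, 2, function(i) paste0(df[, i[1]], df[, i[2]]))

# Label each result column with the names of the two columns it combines.
colnames(newdf) <- apply(pairs, 2, function(i) paste(colnames(df)[i], collapse = "_"))
newdf
```

Looking up a column by name, for example newdf[, "s1_s3"], then returns the pasted values for that particular pair of sites.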
Joshua Ulrich answered Sep 26 '22 02:09


Josh and Joshua's answers are better but I thought I'd give my approach:

This requires installing qdap version 1.1.0 and uses its paste2 function:

library(qdap)

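# Build every ordered pair of column indices, sort each pair, and drop duplicates
n <- seq_len(ncol(df))
ind <- unique(t(apply(expand.grid(n, n), 1, sort)))
# Remove pairs that match a column with itself
ind <- ind[ind[, 1] != ind[, 2], ]
sapply(1:nrow(ind), function(i) paste2(df[, unlist(ind[i, ])], sep=""))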

Though to steal from their answers this would be much more readable:

ind <- t(combn(seq_len(ncol(df)), 2))
sapply(1:nrow(ind), function(i) paste2(df[, unlist(ind[i, ])], sep=""))
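If installing qdap is not an option, the same row-wise pasting can be sketched in base R with do.call(paste0, ...). This is only an illustrative stand-in for paste2, and it assumes the sites live in a real data.frame (here called sites, a made-up name) rather than the matrix that cbind() creates.

```r
# Assumption: the data are stored in a data.frame; `sites` is a hypothetical name.
sites <- data.frame(V1 = letters[1:5], V2 = letters[6:10], V3 = letters[11:15],
                    stringsAsFactors = FALSE)

# One row of `ind` per unordered pair of column indices.
ind <- t(combn(seq_len(ncol(sites)), 2))

# do.call(paste0, <list of columns>) pastes the selected columns row by row.
# unname() keeps column names from being read as paste0() arguments.
res <- sapply(seq_len(nrow(ind)), function(i) {
    do.call(paste0, unname(as.list(sites[ind[i, ]])))
})
res
```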
Tyler Rinker answered Sep 27 '22 02:09