Paste together each pair of columns in a data frame in R?

Tags:

r

I have a data frame of amino acid sites, and want to create a new data frame of each pairwise combination of these sites.

The original data will look something like this:

df<-cbind(letters[1:5], letters[6:10], letters[11:15])
df
     [,1] [,2] [,3]
[1,] "a"  "f"  "k" 
[2,] "b"  "g"  "l" 
[3,] "c"  "h"  "m" 
[4,] "d"  "i"  "n" 
[5,] "e"  "j"  "o"

And what I would like is this:

newdf<-cbind(paste(df[,1],df[,2],sep=""),paste(df[,1],df[,3],sep=""),(paste(df[,2],df[,3],sep="")))
newdf
     [,1] [,2] [,3]
[1,] "af" "ak" "fk"
[2,] "bg" "bl" "gl"
[3,] "ch" "cm" "hm"
[4,] "di" "dn" "in"
[5,] "ej" "eo" "jo"

The actual data may have hundreds of rows and/or columns, so obviously I need a less manual way of doing this. Any help is much appreciated; I am but a humble biologist and my skill set in this area is rather limited.

asked Jul 30 '12 23:07

Jill Hollenbach

3 Answers

A combination of combn() and apply() will get you all of the unordered pairwise combos:

df <- cbind(letters[1:5], letters[6:10], letters[11:15])

apply(X = combn(seq_len(ncol(df)), 2), 
      MARGIN = 2,
      FUN = function(jj) {
          apply(df[, jj], 1, paste, collapse="")
      }      
)
#      [,1] [,2] [,3]
# [1,] "af" "ak" "fk"
# [2,] "bg" "bl" "gl"
# [3,] "ch" "cm" "hm"
# [4,] "di" "dn" "in"
# [5,] "ej" "eo" "jo"

(If what's going on in the above isn't immediately clear, you might want to have a quick look at the object returned by combn(seq_len(ncol(df)), 2). Its columns enumerate all unordered pairwise combos of the integers between 1 and n, where n is the number of columns in your data frame.)
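
For the three-column example, that index matrix looks like the sketch below (idx is just an illustrative name, not something from the answer above):

```r
df <- cbind(letters[1:5], letters[6:10], letters[11:15])

# Each column of idx holds one unordered pair of column indices.
idx <- combn(seq_len(ncol(df)), 2)
idx
#      [,1] [,2] [,3]
# [1,]    1    1    2
# [2,]    2    3    3

# The number of pairs is choose(n, 2), so it grows quickly with n:
# 100 columns already give 4950 pairs.
ncol(idx) == choose(ncol(df), 2)
```

The outer apply() then walks over these columns, and each one selects the two columns of df to paste together.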

answered Sep 26 '22 02:09

Josh O'Brien

You can use the FUN argument to combn to paste together the columns from each combination:

combn(ncol(df),2,FUN=function(i) apply(df[,i],1,paste0,collapse=""))
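
If the real data is an actual data frame with named columns, a variation along these lines might be handy. This is only a sketch: the sites data frame and its column names s1, s2, s3 are made up for illustration, and the "_" separator in the new column names is an arbitrary choice.

```r
# Hypothetical example data: a data frame with named site columns.
sites <- data.frame(s1 = letters[1:5],
                    s2 = letters[6:10],
                    s3 = letters[11:15],
                    stringsAsFactors = FALSE)

# paste0() is vectorised, so each pair of columns can be pasted in one call
# rather than looping over the rows with apply().
pairs <- combn(ncol(sites), 2,
               FUN = function(i) paste0(sites[[i[1]]], sites[[i[2]]]))

# Label each result column with the names of its two source columns.
colnames(pairs) <- combn(names(sites), 2, FUN = paste, collapse = "_")

newdf <- as.data.frame(pairs, stringsAsFactors = FALSE)
# newdf$s1_s2 is c("af", "bg", "ch", "di", "ej")
```

Keeping the source column names in the result makes it easier to tell which pair of sites each new column came from once there are hundreds of them.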

answered Sep 26 '22 02:09

Joshua Ulrich

Josh and Joshua's answers are better but I thought I'd give my approach:

This requires qdap version 1.1.0, which provides the paste2 function:

library(qdap)

ind <- unique(t(apply(expand.grid(1:3, 1:3), 1, sort)))
ind <- ind[ind[, 1] != ind[, 2], ]
sapply(1:nrow(ind), function(i) paste2(df[, unlist(ind[i, ])], sep=""))

Though to steal from their answers this would be much more readable:

ind <- t(combn(seq_len(ncol(df)), 2))
sapply(1:nrow(ind), function(i) paste2(df[, unlist(ind[i, ])], sep=""))
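
If installing qdap is not an option, the same loop can be sketched in base R, with paste0() standing in for paste2(). This is a swapped-in alternative, not what the qdap version does internally, and res is just an illustrative name.

```r
df <- cbind(letters[1:5], letters[6:10], letters[11:15])

# One row of ind per unordered pair of column indices.
ind <- t(combn(seq_len(ncol(df)), 2))

# Paste the two selected columns element-wise for each pair.
# seq_len() is used instead of 1:nrow(ind) so an empty ind is handled safely.
res <- sapply(seq_len(nrow(ind)),
              function(i) paste0(df[, ind[i, 1]], df[, ind[i, 2]]))
# res[, 1] is c("af", "bg", "ch", "di", "ej")
```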

answered Sep 27 '22 02:09

Tyler Rinker
