I have a data frame of amino acid sites, and want to create a new data frame of each pairwise combination of these sites.
The original data will look something like this:
df<-cbind(letters[1:5], letters[6:10], letters[11:15])
df
[,1] [,2] [,3]
[1,] "a" "f" "k"
[2,] "b" "g" "l"
[3,] "c" "h" "m"
[4,] "d" "i" "n"
[5,] "e" "j" "o"
And what I would like is this:
newdf <- cbind(paste(df[,1], df[,2], sep=""), paste(df[,1], df[,3], sep=""), paste(df[,2], df[,3], sep=""))
newdf
[,1] [,2] [,3]
[1,] "af" "ak" "fk"
[2,] "bg" "bl" "gl"
[3,] "ch" "cm" "hm"
[4,] "di" "dn" "in"
[5,] "ej" "eo" "jo"
The actual data may have hundreds of rows and/or columns, so I need a less manual way of doing this. Any help is much appreciated; I am but a humble biologist and my skill set in this area is rather limited.
A combination of combn() and apply() will get you all of the unordered pairwise combos:
df <- cbind(letters[1:5], letters[6:10], letters[11:15])
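apply(X = combn(seq_len(ncol(df)), 2),
      MARGIN = 2,
      FUN = function(jj) {
        # drop = FALSE keeps a one-row input as a matrix so the inner apply() still works
        apply(df[, jj, drop = FALSE], 1, paste, collapse = "")
      }
)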
# [,1] [,2] [,3]
# [1,] "af" "ak" "fk"
# [2,] "bg" "bl" "gl"
# [3,] "ch" "cm" "hm"
# [4,] "di" "dn" "in"
# [5,] "ej" "eo" "jo"
(If what's going on in the above isn't immediately clear, you might want to have a quick look at the object returned by combn(seq_len(ncol(df)), 2). Its columns enumerate all unordered pairwise combinations of the integers between 1 and n, where n is the number of columns in your data frame.)
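A quick base-R check of what that index matrix looks like for three columns, and of how fast the number of pairs grows as columns are added (the 100-column case is only an illustration of scale, not part of the original data):

```r
n <- 3
# Each column of idx is one unordered pair (i, j) with i < j
idx <- combn(seq_len(n), 2)
print(idx)
#      [,1] [,2] [,3]
# [1,]    1    1    2
# [2,]    2    3    3

# The number of pairs is choose(n, 2) = n * (n - 1) / 2,
# so 100 columns already give 4950 pasted columns
ncol(combn(100, 2))
choose(100, 2)
```

Because the output has choose(n, 2) columns, a few hundred input columns can produce tens of thousands of new columns, which is worth keeping in mind for memory use.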
You can use the FUN argument to combn to paste together the columns from each combination:
combn(ncol(df),2,FUN=function(i) apply(df[,i],1,paste0,collapse=""))
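The examples above use a character matrix built with cbind(). A minimal sketch of the same combn(..., FUN = ...) idea for an actual data.frame is below; the data frame `sites` and its column names `s1`, `s2`, `s3` are made up for illustration. Since paste0() is already vectorised over rows, the row-wise apply() can be dropped, which should help with hundreds of rows. The pair labels such as "s1_s2" are an illustrative choice, not something from the original answers.

```r
# Hypothetical toy data frame standing in for the real amino acid sites
sites <- data.frame(s1 = letters[1:5], s2 = letters[6:10], s3 = letters[11:15],
                    stringsAsFactors = FALSE)

# All unordered pairs of column names, one pair per column
pairs <- combn(names(sites), 2)

# For each pair, paste the two whole columns together in one vectorised call
newdf <- combn(names(sites), 2,
               FUN = function(p) paste0(sites[[p[1]]], sites[[p[2]]]))

# Label each new column by the pair of sites it came from, e.g. "s1_s2"
colnames(newdf) <- paste(pairs[1, ], pairs[2, ], sep = "_")
newdf <- as.data.frame(newdf, stringsAsFactors = FALSE)
newdf
```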
Josh and Joshua's answers are better, but I thought I'd give my approach.
This uses the paste2 function and requires downloading qdap version 1.1.0:
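library(qdap)
n <- ncol(df)
# sort each (i, j) pair so that (2, 1) and (1, 2) become duplicates, then keep unique rows
ind <- unique(t(apply(expand.grid(seq_len(n), seq_len(n)), 1, sort)))
# drop self-pairs such as (1, 1)
ind <- ind[ind[, 1] != ind[, 2], ]
sapply(1:nrow(ind), function(i) paste2(df[, unlist(ind[i, ])], sep=""))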
Though, to steal from their answers, this would be much more readable:
ind <- t(combn(seq_len(ncol(df)), 2))
sapply(1:nrow(ind), function(i) paste2(df[, unlist(ind[i, ])], sep=""))
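If qdap isn't available, a base-R stand-in for the paste2 step might look like the sketch below. The helper name `paste_cols` is hypothetical, and it only pastes columns row by row: it does not try to reproduce paste2's NA handling or whitespace trimming.

```r
# Hypothetical base-R helper: paste the columns of a matrix together row by row
paste_cols <- function(m, sep = "") {
  cols <- lapply(seq_len(ncol(m)), function(j) m[, j])  # list of column vectors
  do.call(paste, c(cols, sep = sep))                    # paste(col1, col2, ..., sep = sep)
}

df <- cbind(letters[1:5], letters[6:10], letters[11:15])
ind <- t(combn(seq_len(ncol(df)), 2))   # one pair of column indices per row
newdf <- sapply(seq_len(nrow(ind)),
                function(i) paste_cols(df[, ind[i, ], drop = FALSE]))
newdf
```

Using seq_len(nrow(ind)) rather than 1:nrow(ind) avoids iterating over c(1, 0) if there are no pairs (e.g. a single-column input).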