How to perform basic Multiple Sequence Alignments in R?

(I've tried asking this on BioStars, but for the slight chance that someone from text mining would think there is a better solution, I am also reposting this here)

The task I'm trying to achieve is to align several sequences.

I don't have a basic pattern to match against. All I know is that the "true" pattern has length 30 and that the sequences I have had missing values introduced at random positions.

Here is an example of such sequences, where the left column shows the real location of the missing values and the right column shows the sequence we are actually able to observe.

My goal is to reconstruct the left column using only the sequences in the right column (based on the fact that many of the letters in each position are the same).

                     Real_sequence           The_sequence_we_see
1   CGCAATACTAAC-AGCTGACTTACGCACCG CGCAATACTAACAGCTGACTTACGCACCG
2   CGCAATACTAGC-AGGTGACTTCC-CT-CG   CGCAATACTAGCAGGTGACTTCCCTCG
3   CGCAATGATCAC--GGTGGCTCCCGGTGCG  CGCAATGATCACGGTGGCTCCCGGTGCG
4   CGCAATACTAACCA-CTAACT--CGCTGCG   CGCAATACTAACCACTAACTCGCTGCG
5   CGCACGGGTAAGAACGTGA-TTACGCTCAG CGCACGGGTAAGAACGTGATTACGCTCAG
6   CGCTATACTAACAA-GTG-CTTAGGC-CTG   CGCTATACTAACAAGTGCTTAGGCCTG
7   CCCA-C-CTAA-ACGGTGACTTACGCTCCG   CCCACCTAAACGGTGACTTACGCTCCG

Here is an example code to reproduce the above example:

ATCG <- c("A", "T", "C", "G")
set.seed(40)
# The "true" sequence of length 30, repeated as 200 identical rows
original.seq <- sample(ATCG, 30, replace = TRUE)
seqS <- matrix(original.seq, 200, 30, byrow = TRUE)

# Overwrite between 1 and number.of.changes random positions of x
change.letters <- function(x, number.of.changes = 15, letters.to.change.with = ATCG)
{
    number.of.changes <- sample(seq_len(number.of.changes), 1)
    new.letters <- sample(letters.to.change.with, number.of.changes, replace = TRUE)
    where.to.change.the.letters <- sample(seq_along(x), number.of.changes, replace = FALSE)
    x[where.to.change.the.letters] <- new.letters
    return(x)
}
change.letters(original.seq)
# Replace between 1 and 3 random positions with the gap symbol "-"
insert.missing.values <- function(x) change.letters(x, 3, "-")
insert.missing.values(original.seq)

seqS2 <- t(apply(seqS, 1, change.letters))
seqS3 <- t(apply(seqS2, 1, insert.missing.values))

# Collapse each row of letters into a single string
seqS4 <- apply(seqS3, 1, function(x) {paste(x, collapse = "")})
library(stringr)
# library(help=stringr)
# str_replace_all() removes every "-"; str_replace() would remove only the first
all.seqS <- str_replace_all(seqS4, "-", "")

# how do we align this?
data.frame(Real_sequence = seqS4, The_sequence_we_see = all.seqS)

I understand that if all I had was a string and a pattern I would be able to use

library(Biostrings)
pairwiseAlignment(...)

But in the case I present we are dealing with many sequences to align to one another (instead of aligning them to one pattern).
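To make the single-pattern case concrete, here is a minimal sketch that re-inserts gaps into one observed sequence relative to a known reference. It uses base R's `adist()` edit transcript as a stand-in, not `pairwiseAlignment()`. The function name `gap_against_reference` and the example strings are hypothetical, and the approach assumes the reference is known, which is exactly what is missing in the problem above.

```r
# Hypothetical helper: line up `observed` against a known `reference`
# by placing "-" wherever the reference has a letter that `observed` lacks.
gap_against_reference <- function(observed, reference) {
  # "trafos" is the edit transcript turning `observed` into `reference`:
  # M = match, S = substitution, I = insertion, D = deletion
  transcript <- attr(adist(observed, reference, counts = TRUE), "trafos")[1, 1]
  ops <- strsplit(transcript, "")[[1]]
  observed_letters <- strsplit(observed, "")[[1]]

  out <- character(length(ops))
  next_letter <- 1
  for (i in seq_along(ops)) {
    if (ops[i] == "I") {
      out[i] <- "-"                          # letter exists only in the reference
    } else {
      out[i] <- observed_letters[next_letter] # M, S or D: keep the observed letter
      next_letter <- next_letter + 1
    }
  }
  paste(out, collapse = "")
}

# Made-up example: the observed string is the reference with two letters dropped
reference <- "CGCAATACTAACCAGCTGAC"
observed  <- "CGCAATCTAACAGCTGAC"
gap_against_reference(observed, reference)
```

When the observed sequence contains letters that the reference lacks (a `D` in the transcript), the result is longer than the reference, so this sketch only suits the deletion-only setting described in the question. Where a repeated letter is dropped, the gap can land on any position of the run.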

Is there a known method for doing this in R?

asked Dec 21 '10 09:12

Tal Galili

1 Answer

You can perform multiple alignment in R with the DECIPHER package.

Following your example, it would look something like:

library(DECIPHER)
dna <- DNAStringSet(all.seqS)
aligned_DNA <- AlignSeqs(dna)

It is fast and at least as accurate as the other methods listed here (see the paper). I hope that helps!
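Once the sequences are aligned, the shared pattern can be estimated by taking the most common letter in each column. The following is a rough base R sketch, assuming the aligned sequences are available as equal-width character strings with `-` for gaps (for example via `as.character(aligned_DNA)`). The function name `consensus_from_alignment` and the toy input are hypothetical.

```r
# Hypothetical helper: majority-vote consensus over aligned strings.
consensus_from_alignment <- function(aligned) {
  # All aligned strings must have the same width
  stopifnot(length(unique(nchar(aligned))) == 1)

  # One row per sequence, one column per alignment position
  chars <- do.call(rbind, strsplit(aligned, ""))

  per_column <- apply(chars, 2, function(column) {
    counts <- table(column[column != "-"])   # ignore gaps when voting
    if (length(counts) == 0) "-" else names(counts)[which.max(counts)]
  })
  paste(per_column, collapse = "")
}

# Made-up aligned input standing in for as.character(aligned_DNA)
aligned <- c("CGCAAT-CT",
             "CGCAATAC-",
             "CGC-ATACT",
             "CGCAATACT")
consensus_from_alignment(aligned)
# → "CGCAATACT"
```

Ties are broken in favor of the alphabetically first letter, which is an arbitrary choice made for this sketch.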

answered Dec 28 '22 15:12

Erik Wright

Tags: alignment, r, text-alignment, sequence, bioinformatics
