
Errors with the align_local function in R

I am trying to compare two gene sequences:

sequence_1 <- "MPHLENVVLCRESQVSILQSLFGERHHFSFPSIFIYGHTASGKTYVTQTLLKTLELPHVFVNCVECFTLRLLLEQILNKLNHLSSSEDGCSTEITCETFNDFVRLFKQVTTAENLKDQTVYIVLDKAEYLRDMEANLLPGFLRLQELADRNVTVLFLSEIVWEKFRPNTGCFEPFVLYFPDYSIGNLQKILSHDHPPEYSADFYAAYINILLGVFYTVCRDLKELRHLAVLNFPKYCEPVVKGEASERDTRKLWRNIEPHLKKAMQTVYLREISSSQWEKLQKDDTDPGQLKGLSAHTHVELPYYSKFILIAAYLASYNPARTDKRFFLKHHGKIKKTNFLKKHEKTSNHLLGPKPFPLDRLLAILYSIVDSRVAPTANIFSQITSLVTLQLLTLVGHDDQLDGPKYKCTVSLDFIRAIARTVNFDIIKYLYDFL"

sequence_2 <- "MEEEAPRFNVLEEAFNGNGNGCANVEATQSAILKVLTRVNRFQMRVRKHIEDNYTEFLPNNTSPDIFLEESGSLNREIHDMLENLGSEGLDALDEANVKMAGNGRQLREILLGLGVSEHVLRIDELFQCVEEAKATKDYLVLLDLVGRLRAFIYGDDSVDGDAQVATPEVRRIFKALECYETIKVKYHVQAYMLQQSLQERFDRLVQLQCKSFPTSRCVTLQVSRDQTQLQDIVQALFQEPYNPARLCEFLLDNCIEPVIMRPVMADYSEEADGGTYVRLSLSYATKEPSSAHVRPNYKQVLENLRLLLHTLAGINCSVSRDQHVFGIIGDHVKDKMLKLLVDECLIPAVPESTEEYQTSTLCEDVAQLEQLLVDSFIINPEQDRALGQFVEKYETYYRNRMYRRVLETAREIIQRDLQDMVLVAPNNHSAEVANDPFLFPRCMISKSAQDFVKLMDRILRQPTDKLGDQEADPIAGVISIMLHTYINEVPKVHRKLLESIPQQAVLFHNNCMFFTHWVAQHANKGIESLAALAKTLQATGQQHFRVQVDYQSSILMGIMQEFEFESTHTLGSGPLKLVRQCLRQLELLKNVWANVLPETVYNATFCELINTFVAELIRRVFTLRDISAQMACELSDLIDVVLQRAPTLFREPNEVVQVLSWLKLQQLKAMLNASLMEITELWGDGVGPLTASYKSDEIKHLIRALFQDTDWRAKAITQIV"

using the align_local function from the textreuse package. My input is:

library(textreuse)
align_local(sequence_1, sequence_2)

and I get the error:

Error in b_out[out_i] <- b_orig[row_i - 1] : replacement has length zero
In addition: Warning message:
Multiple optimal local alignments found; selecting only one of them. 

I've tried tinkering with the alignment score and the mismatch score, but to no avail. Any advice would be appreciated.

Murph asked Nov 25 '25 09:11


1 Answer

The textreuse package is intended for natural language. Under no circumstances should you use it for aligning gene sequences. (I am the package author.) You probably want the Biostrings package from Bioconductor.
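As a rough sketch of what a local alignment with Bioconductor might look like: the example below uses `pairwiseAlignment()` with `type = "local"`. It assumes a recent Bioconductor release, where that function lives in the pwalign package (older releases exposed it directly from Biostrings, in which case the `library(pwalign)` line can be dropped). The two short peptides are placeholders, and the BLOSUM62 matrix and gap penalties are illustrative choices, not recommendations from this answer.

```r
# Assumes the Bioconductor packages are installed, e.g.:
#   BiocManager::install(c("Biostrings", "pwalign"))
library(Biostrings)
library(pwalign)  # assumption: recent Bioconductor; omit on older releases

# Placeholder peptides, not the full sequences from the question.
seq_a <- AAString("MPHLENVVLCRESQVSILQSLFGERHHF")
seq_b <- AAString("MEEEAPRFNVLEEAFNGNGNGCANVEATQ")

# Load a standard amino acid substitution matrix (illustrative choice).
data(BLOSUM62)

# Smith-Waterman style local alignment, character by character.
aln <- pairwiseAlignment(
  seq_a, seq_b,
  type = "local",
  substitutionMatrix = BLOSUM62,
  gapOpening = 10,   # illustrative penalty
  gapExtension = 4   # illustrative penalty
)

aln         # prints the aligned regions
score(aln)  # numeric alignment score
```

The full sequences from the question can be passed the same way by wrapping each string in `AAString()`.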

The problem is that the align_local() function expects there to be multiple words, as indicated by spaces or punctuation, because it aligns word by word, not character by character. The function would work if you put spaces between the bases in your gene sequence. But I'm not going to explain how to do that because, again, you should not be using a natural language package for aligning genes.
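To see why a word-based aligner has nothing to work with here, the base R sketch below counts "words" by splitting on whitespace and punctuation. The `count_words()` helper is hypothetical and is only a crude stand-in for word tokenization, not the tokenizer textreuse actually uses; the peptide is a placeholder fragment.

```r
# Hypothetical helper: split on runs of whitespace/punctuation and count
# the non-empty pieces. Illustration only, not textreuse's tokenizer.
count_words <- function(x) {
  tokens <- strsplit(x, "[[:space:][:punct:]]+")[[1]]
  sum(nzchar(tokens))
}

protein  <- "MPHLENVVLCRESQVSILQSLFGERHHF"  # placeholder fragment
sentence <- "The quick brown fox jumps over the lazy dog."

count_words(protein)   # 1: the whole sequence is a single "word"
count_words(sentence)  # 9
```

With only one token per input, a word-by-word alignment has no meaningful units to match, which is consistent with the failure in the question.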

Lincoln Mullen answered Nov 27 '25 23:11


