
Performing BLAST/Smith-Waterman searches directly from my application

I'm working on a small application and am thinking about integrating BLAST or another local alignment search into it. So far my searching has only turned up programs that need to be installed separately and called as external programs.

Is there a way to do this short of implementing it from scratch? Is there a pre-made library, perhaps?

brandstaetter asked Sep 16 '09 11:09


4 Answers

Does it have to be in C, or would C++ also be OK? If C++ is acceptable, you might want to look at the SeqAn library.

PhiS answered Nov 06 '22 22:11


This topic also touches on reproducibility of results: it is always better to use the raw BLAST binary provided by NCBI or UCSC, because it makes your results easier for other scientists to reproduce and saves you a lot of time writing tests (more time than you can imagine).
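Calling the stock binary from an application can stay quite small. The following is a minimal Python sketch, assuming the NCBI BLAST+ `blastn` executable (the successor to the legacy `blastall` program) is installed and on the PATH. The helper names `build_blastn_command`, `parse_tabular`, and `run_blastn`, the default E-value, and the file names are illustrative assumptions, not part of any library.

```python
import subprocess

# Default column names of BLAST+ tabular output (-outfmt 6).
COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def build_blastn_command(query_fasta, subject_fasta, evalue=1e-5):
    """Build the argument list for a blastn run of one FASTA file against another."""
    return [
        "blastn",
        "-query", query_fasta,
        "-subject", subject_fasta,  # compare against a FASTA file, no formatted database
        "-evalue", str(evalue),
        "-outfmt", "6",             # tab-separated, one hit per line
    ]


def parse_tabular(text):
    """Turn tabular BLAST output into a list of dicts keyed by column name."""
    hits = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        hits.append(dict(zip(COLUMNS, line.split("\t"))))
    return hits


def run_blastn(query_fasta, subject_fasta, evalue=1e-5):
    """Run blastn as an external process and return the parsed hits.

    Raises subprocess.CalledProcessError if blastn exits with an error.
    """
    result = subprocess.run(
        build_blastn_command(query_fasta, subject_fasta, evalue),
        capture_output=True, text=True, check=True,
    )
    return parse_tabular(result.stdout)
```

A call such as `run_blastn("query.fa", "subject.fa")` (hypothetical file names) would then return one dict per hit. Keeping the command line in one place also makes it easy to record the exact invocation alongside the results.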

For day-to-day work I have often used exonerate, a tool written in C that can do both global and local alignment, has a simple Unix-like interface, and doesn't require you to format your input as BLAST does.

Moreover, keep in mind that people usually define a pipeline with a combination of makefiles and scripts instead of calling everything from a single script: most programming languages are not well suited to defining pipelines, while automated build tools like Make are not useful for scripting tasks. Have a look at these examples: http://skam.sourceforge.net/skam-intro.html http://swc.scipy.org/lec/build.html

dalloliogm answered Nov 06 '22 21:11


I just stumbled across exactly what I wanted: the NCBI C++ Toolkit. Thanks for all the suggestions, though.

brandstaetter answered Nov 06 '22 23:11


The BLAST algorithm was first implemented about 20 years ago; it has since grown into a very large algorithm, and I cannot imagine it being easy to implement from scratch. You can learn about it by looking at the sources of the 'blastall' program in the NCBI toolkit. A simpler pairwise algorithm (Smith-Waterman, Needleman-Wunsch) should be easier to implement:
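As a rough illustration of how compact Smith-Waterman can be, here is a minimal Python sketch of local alignment with a linear gap penalty. The function name `smith_waterman` and the scoring values (match 2, mismatch -1, gap -1) are arbitrary assumptions chosen for the example; real applications usually use a substitution matrix and affine gap costs, and this quadratic-memory version is only practical for short sequences.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Return (score, aligned_a, aligned_b) for one best local alignment of a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] = best score of a local alignment ending at a[i-1], b[j-1]
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the matrix; scores are clamped at 0 so an alignment may start anywhere.
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap      # gap in b
            left = score[i][j - 1] + gap    # gap in a
            score[i][j] = max(0, diag, up, left)
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if score[i][j] == diag:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1

    return best, "".join(reversed(out_a)), "".join(reversed(out_b))
```

For example, `smith_waterman("GGACGTCC", "TTACGTAA")` finds the shared `ACGT` core. Needleman-Wunsch (global alignment) differs mainly in dropping the clamp at zero, initialising the first row and column with gap penalties, and tracing back from the bottom-right cell.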

Pierre answered Nov 06 '22 23:11