I have a script that performs BLAST queries (bl2seq).
The script works like this:
- Get sequence a, sequence b
- write sequence a to filea
- write sequence b to fileb
- run command 'bl2seq -i filea -j fileb -n blastn'
- get output from STDOUT, parse
- repeat 20 million times
The program bl2seq does not support piping. Is there any way to do this without writing to and reading from the hard drive?
I'm using Python, by the way.
Depending on what OS you're running on, you may be able to use something like bash's process substitution. I'm not sure how you'd set that up in Python, but you're basically using a named pipe (or named file descriptor). That won't work if bl2seq tries to seek within the files, but it should work if it just reads them sequentially.
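For what it's worth, here is a rough sketch of how the named-pipe idea might look from Python on a Unix-like system, using `os.mkfifo`. The helper names `run_with_fifos` and `_feed` are made up for illustration, and this assumes the program reads each input sequentially from start to end; I have not verified that bl2seq does.

```python
import os
import subprocess
import tempfile
import threading


def _feed(path, text):
    # Opening a FIFO for writing blocks until the child opens it for reading.
    with open(path, "w") as fifo:
        fifo.write(text)


def run_with_fifos(make_cmd, seq_a, seq_b):
    """Run a two-input command, passing both inputs through named pipes.

    make_cmd is a hypothetical callback that takes the two FIFO paths and
    returns the argument list to execute.
    """
    with tempfile.TemporaryDirectory() as tmp:
        path_a = os.path.join(tmp, "a.fifo")
        path_b = os.path.join(tmp, "b.fifo")
        os.mkfifo(path_a)
        os.mkfifo(path_b)

        # One writer thread per pipe, so it does not matter in which order
        # the child opens its inputs. Daemon threads, so a child that never
        # opens a pipe cannot keep the interpreter alive at exit.
        writers = [
            threading.Thread(target=_feed, args=(path_a, seq_a), daemon=True),
            threading.Thread(target=_feed, args=(path_b, seq_b), daemon=True),
        ]
        for writer in writers:
            writer.start()

        result = subprocess.run(
            make_cmd(path_a, path_b),
            capture_output=True,
            text=True,
            check=True,
        )
        for writer in writers:
            writer.join()
        return result.stdout
```

With the flags exactly as quoted in the question, a call might look like `run_with_fifos(lambda a, b: ["bl2seq", "-i", a, "-j", b, "-n", "blastn"], fasta_a, fasta_b)`, where `fasta_a` and `fasta_b` are the sequence strings. The FIFOs still have names in the filesystem, but the data passes through kernel buffers rather than being written to disk.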
How do you know bl2seq does not support piping? By the way, pipes are an OS feature, not a feature of the program. If your bl2seq program outputs something, whether to STDOUT or to a file, you should be able to parse the output. Check the bl2seq help for options to write output to a file as well, e.g. the -o option. Then you can parse the file.
Also, since you are using Python, an alternative is the Biopython module.