I am able to download a FASTA file manually that looks like:
>lcl|CR543861.1_gene_1...
ATGCTTTGGACA...
>lcl|CR543861.1_gene_2...
GTGCGACTAAAA...
by clicking "Send to" and selecting "Gene Features", FASTA Nucleotide is the only option (which is fine because that's all I want) on this page.
With a script like this:
#!/usr/bin/env perl
use strict;
use warnings;
use Bio::DB::EUtilities;
my $factory = Bio::DB::EUtilities->new(-eutil => 'efetch',
-db => 'nucleotide',
-id => 'CR543861',
-rettype => 'fasta');
my $file = 'CR543861.fasta';
$factory->get_Response(-file => $file);
I get a file that looks like:
>gi|49529273|emb|CR543861.1| Acinetobacter sp. ADP1 complete genome
GATATTTTATCCACA...
with the whole genomic sequence lumped together. How do I get information like in the first (manually downloaded) file?
I looked at a couple of other posts:
As well as this section from EUtilities Cookbook.
I tried fetching and saving a GenBank file (since it seems to have separate sequences for each gene in the .gb file I get), but when I go work with it using Bio::SeqIO, I will get only 1 large sequence.
With that accession number and return type, you are getting the complete genome sequence. If you want to get the individual gene sequences, specify that you want the complete genbank file, then parse out the genes. Here is an example:
#!/usr/bin/env perl
use 5.010;
use strict;
use warnings;
use Bio::SeqIO;
use Bio::DB::EUtilities;
my $factory = Bio::DB::EUtilities->new(-eutil => 'efetch',
-email => '[email protected]',
-db => 'nucleotide',
-id => 'CR543861',
-rettype => 'gb');
my $file = 'CR543861.gb';
$factory->get_Response(-file => $file);
my @gene_features = grep { $_->primary_tag eq 'gene' }
Bio::SeqIO->new(-file => $file)->next_seq->get_SeqFeatures;
for my $feat_object (@gene_features) {
for my $tag ($feat_object->get_all_tags) {
# open a filehandle here for writing each to a separate file
say ">",$feat_object->get_tag_values($tag);
say $feat_object->spliced_seq->seq;
# close it!
}
}
This will write each gene to the same file (if you redirect it, now it just writes to STDOUT) but I indicated where you could make a small change to write them to separate files. Parsing genbank can be a bit tricky at times, so it is always helpful to read the docs and in particular, the excellent Feature Annotation HOWTO.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With