Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Counting DNA Nucleotides using perl 6

Good Afternoon, i am trying to count the number of times the letters A C T G occur in DNA sequence using perl6.i have tried other ways i am just trying to get it done in another way. Here are some of the code i came up with

use v6;

my $default-input = "AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGC";

sub MAIN(Str $input = $default-input) 
{
    say "{bag($input.comb)<A C G T>}";
}



use v6;

my $default-input = "AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGC";

sub MAIN($input = $default-input) 
{
    "{<A C G T>.map({ +$input.comb(/$_/) })}".say;

Sample Dataset
AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGC

like image 721
Oluwole Avatar asked May 04 '16 19:05

Oluwole


2 Answers

multi sub MAIN ( \DNA ) {
  my Int %bag = A => 0, C => 0, G => 0, T => 0;

  # doesn't keep the whole thing in memory
  # like .comb.Bag would have
  for DNA.comb {
    %bag{$_}++
  }
  .say for %bag<A C G T> :p;
}

multi sub MAIN ( 'example' ){
  samewith "AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGC"
}

multi sub MAIN ( Bool :STDIN($)! ){
  samewith $*IN
}

multi sub MAIN ( Str :filename(:$file)! where .IO.f ){
  samewith $file.IO
}
~$ ./test.p6
Usage:
  ./test.p6 <DNA> 
  ./test.p6 example 
  ./test.p6 --STDIN 
  ./test.p6 --filename|--file=<Str>

~$ ./test.p6 example
A => 20
C => 12
G => 17
T => 21

~$ ./test.p6 --STDIN < test.in
A => 20
C => 12
G => 17
T => 21

~$ ./test.p6 --file=test.in
A => 20
C => 12
G => 17
T => 21
like image 168
Brad Gilbert Avatar answered Sep 20 '22 00:09

Brad Gilbert


Another way is to use the BioInfo modules I'm working on which have a coercion to Bag already for you :)

use v6;
use BioInfo;

my @sequences = `
>seqid
AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGC
`;

for @sequences -> $seq {
    say $seq.Bag;
}

In the above code you are importing a special bioinformatics Slang which understands that string literals between `` are FASTA literals. DNA/RNA/Amino-acids are automatically detected and you get a specific class for that. The object has its own .Bag that does what you want. Other than my own modules there is also the BioPerl6 project.

If you want to read from file then the following should work for you:

use v6;
use BioInfo::Parser::FASTA;
use BioInfo::IO::FileParser;

#Spawn an IO thread that parses the file and creates BioInfo::Seq objects on .get
my $seq_file = BioInfo::IO::FileParser.new(file => 'myseqs.fa', parser => BioInfo::Parser::FASTA);

#Print the residue counts per file
while my $seq = $seq_file.get() {
    say $seq.Bag;
}
like image 35
Matt Oates Avatar answered Sep 20 '22 00:09

Matt Oates