Generating Synthetic DNA Sequence with Substitution Rate

Given these inputs:

my $init_seq = "AAAAAAAAAA";    # length 10 bp
my $sub_rate = 0.003;
my $nof_tags = 1000;
my @dna = qw( A C G T );

I want to generate:

  1. One thousand length-10 tags

  2. Substitution rate for each position in a tag is 0.003

Yielding output like:

AAAAAAAAAA
AATAACAAAA
.....
AAGGAAAAGA # 1000th tag

Is there a compact way to do it in Perl?

I am stuck on the logic. This script is the core of what I have so far:

#!/usr/bin/perl

my $init_seq = "AAAAAAAAAA";    # length 10 bp
my $sub_rate = 0.003;
my $nof_tags = 1000;
my @dna = qw( A C G T );

    $i = 0;
    while ($i < length($init_seq)) {
        $roll = int(rand 4) + 1;       # $roll is now an integer between 1 and 4

        if ($roll == 1) {$base = 'A';}
        elsif ($roll == 2) {$base = 'T';}
        elsif ($roll == 3) {$base = 'C';}
        elsif ($roll == 4) {$base = 'G';}

        print $base;
    }
    continue {
        $i++;
    }
neversaint asked Mar 02 '09 09:03


1 Answer

As a small optimisation, replace:

    $roll = int(rand 4) + 1;       # $roll is now an integer between 1 and 4

    if ($roll == 1) {$base = 'A';}
    elsif ($roll == 2) {$base = 'T';}
    elsif ($roll == 3) {$base = 'C';}
    elsif ($roll == 4) {$base = 'G';}

with

    $base = $dna[int(rand 4)];
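
That replacement only picks a random base; it does not yet apply the substitution rate or produce 1000 tags. One possible way to put the whole task together is sketched below. The helper name mutate is hypothetical, and the sketch assumes that a substitution always swaps in a base different from the current one (if a "substitution" may also redraw the same base, drop the grep and index @dna directly).

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $init_seq = "AAAAAAAAAA";    # length 10 bp
my $sub_rate = 0.003;
my $nof_tags = 1000;
my @dna      = qw( A C G T );

# Hypothetical helper: return a copy of $seq in which each position is
# substituted independently with probability $rate.
# Assumption: a substitution always picks a base other than the current one.
sub mutate {
    my ($seq, $rate) = @_;
    my @bases = split //, $seq;
    for my $base (@bases) {          # $base aliases the array element
        next unless rand() < $rate;  # leave most positions untouched
        my @others = grep { $_ ne $base } @dna;
        $base = $others[ int rand @others ];
    }
    return join '', @bases;
}

# Each tag is derived independently from the initial sequence.
print mutate($init_seq, $sub_rate), "\n" for 1 .. $nof_tags;
```

With these inputs the expected number of substituted positions is 1000 * 10 * 0.003 = 30 across all tags, and a given tag stays unchanged with probability 0.997^10, roughly 0.97, so most output lines will be identical to $init_seq.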
Penfold answered Sep 19 '22 10:09