How to iterate over every [:2] overlapping characters in a string of DNA code?

Question

Let's say I have a string of DNA 'GAAGGAGCGGCGCCCAAGCTGAGATAGCGGCTAGAGGCGGGTAACCGGCA'

Consider the first 5 letters: GAAGG

I want to replace each overlapping bi-gram ('GA', 'AA', 'AG', 'GG') with a number that corresponds to its likelihood of occurrence, and sum those numbers. For example, with 'GA' = 1, 'AA' = 2, 'AG' = .7, 'GG' = .5, GAAGG would give sumAnswer = 1 + 2 + .7 + .5 = 4.2.

So in pseudocode, I want to:

- iterate over each overlapping bi-gram in my DNA string
- find the value corresponding to each bi-gram
- sum the values as I go

I'm not entirely sure how to iterate over each pair. I thought a for loop would work, but it doesn't account for the overlap: it prints every non-overlapping pair (GAGC = GA, GC), not every overlapping pair (GAGC = GA, AG, GC).

for i in range(0, len(dna), 2):  # a step of 2 skips the overlapping pairs
    print(dna[i:i+2])

Any tips?

lvc · Accepted Answer

Forget playing with range and index arithmetic; iterating over pairs is exactly what zip is for:

>>> dna = 'GAAGG'
>>> for bigram in zip(dna, dna[1:]):
...    print(bigram)
... 
('G', 'A')
('A', 'A')
('A', 'G')
('G', 'G')
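
If string bigrams are more convenient than tuples (for example, as dictionary keys), one option is to join each pair as it comes out of zip. The helper name `bigrams` below is hypothetical; this is a minimal sketch of the same zip idea, not part of the original answer.

```python
def bigrams(seq):
    """Return every overlapping 2-character substring of seq, in order."""
    # zip stops at the shorter argument (seq[1:]), so the last character
    # is only ever used as the second half of a pair.
    return [a + b for a, b in zip(seq, seq[1:])]


print(bigrams('GAGC'))   # ['GA', 'AG', 'GC']
print(bigrams('GAAGG'))  # ['GA', 'AA', 'AG', 'GG']
```

A string shorter than two characters simply yields an empty list, so no special casing is needed.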

If you have the corresponding likelihoods stored in a dictionary, like so:

likelihood = {
   'GA': 1, 
   'AA': 2,
   'AG': .7, 
   'GG': .5
}

then you can sum them quite easily with the unsurprisingly named sum:

>>> sum(likelihood[''.join(bigram)] for bigram in zip(dna,dna[1:]))
4.2
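
Note that the four-entry dictionary only covers the bigrams in GAAGG; on the full sequence from the question, a lookup such as likelihood['GC'] would raise a KeyError. The sketch below wraps the sum in a function and uses dict.get with a fallback. The function name `score_bigrams` and the choice of 0.0 for unknown bigrams are assumptions for illustration; a real table would presumably list all 16 bigrams.

```python
likelihood = {
    'GA': 1,
    'AA': 2,
    'AG': .7,
    'GG': .5,
}


def score_bigrams(dna, table, default=0.0):
    """Sum the table value of every overlapping bigram in dna.

    Bigrams missing from table contribute `default` (an assumption;
    raising an error instead may be preferable for incomplete tables).
    """
    return sum(table.get(a + b, default) for a, b in zip(dna, dna[1:]))


print(score_bigrams('GAAGG', likelihood))  # 4.2
print(score_bigrams('GAGC', likelihood))   # 1.7 ('GC' is missing, counts as 0.0)
```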

Tags: python, iterator, string, for-loop, n-gram

Asked by bambo222
