
Frequent words in Python

How can I write code to find the most frequent 2-mer of "GATCCAGATCCCCATAC"? I have written the code below, but it does not seem to work. Please help me correct it.

def PatternCount(Pattern, Text):
    count = 0
    for i in range(len(Text)-len(Pattern)+1):
        if Text[i:i+len(Pattern)] == Pattern:
            count = count+1
    return count

This code is supposed to print the most frequent k-mer in a string, but it doesn't give me the most frequent 2-mer of the given string.

shahzad fida asked Dec 11 '22 13:12


2 Answers

You can first define a function to get all the k-mers in your string:

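def get_all_k_mer(string, k=1):
    # Slide a window of width k over the string, one character at a time.
    # range works on Python 2 and 3 (xrange exists only on Python 2).
    length = len(string)
    return [string[i:i + k] for i in range(length - k + 1)]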

Then you can use collections.Counter to count the occurrences of each k-mer:

>>> from collections import Counter
>>> s = 'GATCCAGATCCCCATAC'
>>> Counter(get_all_k_mer(s, k=2))

Output:

Counter({'AC': 1,
         'AG': 1,
         'AT': 3,
         'CA': 2,
         'CC': 4,
         'GA': 2,
         'TA': 1,
         'TC': 2})

Another example:

>>> s = "AAAAAA"
>>> Counter(get_all_k_mer(s, k=3))

Output:

Counter({'AAA': 4})
# Indeed : AAAAAA
           ^^^     -> 1st time
            ^^^    -> 2nd time
             ^^^   -> 3rd time
              ^^^  -> 4th time
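
The Counter gives every count, but the question asks for the most frequent k-mer, so the largest count still has to be picked out. A minimal sketch, assuming Python 3 and assuming that all tied k-mers should be reported; the helper name most_frequent_k_mers is made up for illustration:

```python
from collections import Counter


def get_all_k_mer(string, k=1):
    # Same sliding-window helper as above, written with range for Python 3.
    return [string[i:i + k] for i in range(len(string) - k + 1)]


def most_frequent_k_mers(string, k):
    # Hypothetical helper: returns (k-mers, count) for every k-mer that
    # reaches the highest count, so ties are not silently dropped.
    counts = Counter(get_all_k_mer(string, k))
    if not counts:
        # k is longer than the string, so there are no k-mers at all.
        return [], 0
    best = max(counts.values())
    return sorted(kmer for kmer, n in counts.items() if n == best), best


print(most_frequent_k_mers("GATCCAGATCCCCATAC", 2))  # (['CC'], 4)
```

For k=3 on the same string, five different 3-mers are tied at two occurrences each, which is why the sketch returns a list rather than a single k-mer.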
MMF answered Dec 24 '22 06:12


In general, when I want to count things in Python, I use a Counter:

from itertools import tee
from collections import Counter

dna = "GATCCAGATCCCCATAC"
a, b = tee(iter(dna), 2)
_ = next(b)
c = Counter(''.join(l) for l in zip(a,b))
print(c.most_common(1))

This prints [('CC', 4)]: a list containing the single most common 2-mer, given as a tuple of the 2-mer and its count in the string.
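
To see why this works: tee makes two independent iterators over the string, next(b) advances the second one by a character, and zip then pairs each character with its successor. A small sketch on a shortened string (the string "GATC" is chosen only for illustration):

```python
from itertools import tee

a, b = tee(iter("GATC"), 2)
next(b)  # skip the first character so b runs one step ahead of a
pairs = [''.join(p) for p in zip(a, b)]
print(pairs)  # ['GA', 'AT', 'TC']
```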

In fact, we can generalize this to find the most common n-mer for a given n:

from itertools import tee, islice
from collections import Counter

def nmer(dna, n):
    iters = tee(iter(dna), n)
    iters = [islice(it, i, None) for i, it in enumerate(iters)]
    c = Counter(''.join(l) for l in zip(*iters))
    return c.most_common(1)
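
As a usage sketch, assuming Python 3, the result can be cross-checked against the PatternCount function from the question. Both functions are repeated here only so the snippet runs on its own. Note that most_common(1) reports a single n-mer even when several are tied for the highest count:

```python
from itertools import tee, islice
from collections import Counter


def PatternCount(Pattern, Text):
    # The asker's function: counts (overlapping) occurrences of Pattern.
    count = 0
    for i in range(len(Text) - len(Pattern) + 1):
        if Text[i:i + len(Pattern)] == Pattern:
            count += 1
    return count


def nmer(dna, n):
    # Same function as above: n staggered iterators zipped into n-mers.
    iters = tee(iter(dna), n)
    iters = [islice(it, i, None) for i, it in enumerate(iters)]
    c = Counter(''.join(l) for l in zip(*iters))
    return c.most_common(1)


dna = "GATCCAGATCCCCATAC"
kmer, count = nmer(dna, 2)[0]
print(kmer, count)  # CC 4
assert count == PatternCount(kmer, dna)

# For n=3 several 3-mers are tied at 2 occurrences; only one is reported.
print(nmer(dna, 3))

# If n is longer than the string there are no n-mers and the list is empty.
print(nmer("AC", 5))  # []
```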
Patrick Haugh answered Dec 24 '22 08:12