 

Generating all DNA kmers with Python

I'm having a little trouble with this Python code.

I'm rather new to Python and would like to generate all possible DNA kmers of length k and add them to a list, but I can't think of an elegant way to do it. Below is what I have for kmers of length 8. Any suggestions would be very helpful.

bases=['A','T','G','C']
kmer=list()

for i in bases:
    for j in bases:
        for k in bases:
            for l in bases:
                for m in bases:
                    for n in bases:
                        for o in bases:
                            for p in bases:
                                kmer.append(i+j+k+l+m+n+o+p)
GGDRGC asked Sep 19 '14 21:09





1 Answer

In [57]: import itertools

In [58]: bases=['A','T','G','C']

In [59]: k = 2

In [60]: [''.join(p) for p in itertools.product(bases, repeat=k)]
Out[60]: ['AA', 'AT', 'AG', 'AC', 'TA', 'TT', 'TG', 'TC', 'GA', 'GT', 'GG', 'GC', 'CA', 'CT', 'CG', 'CC']

In [61]: k = 3

In [62]: [''.join(p) for p in itertools.product(bases, repeat=k)]
Out[62]: ['AAA', 'AAT', 'AAG', 'AAC', 'ATA', 'ATT', 'ATG', 'ATC', 'AGA', 'AGT', 'AGG', 'AGC', 'ACA', 'ACT', 'ACG', 'ACC', 'TAA', 'TAT', 'TAG', 'TAC', 'TTA', 'TTT', 'TTG', 'TTC', 'TGA', 'TGT', 'TGG', 'TGC', 'TCA', 'TCT', 'TCG', 'TCC', 'GAA', 'GAT', 'GAG', 'GAC', 'GTA', 'GTT', 'GTG', 'GTC', 'GGA', 'GGT', 'GGG', 'GGC', 'GCA', 'GCT', 'GCG', 'GCC', 'CAA', 'CAT', 'CAG', 'CAC', 'CTA', 'CTT', 'CTG', 'CTC', 'CGA', 'CGT', 'CGG', 'CGC', 'CCA', 'CCT', 'CCG', 'CCC']
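
If you want this as a reusable piece for any k, the same itertools.product call can be wrapped in a small generator. This is only a sketch: the function name all_kmers and its alphabet parameter are made up for illustration and are not part of the session above. A generator is used because the number of kmers grows as 4**k, so you may not always want the whole list in memory at once.

```python
from itertools import product


def all_kmers(k, alphabet="ATGC"):
    """Yield every string of length k over `alphabet`, in product order.

    `all_kmers` and `alphabet` are hypothetical names for this sketch.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    # product(alphabet, repeat=k) yields tuples of k letters;
    # join each tuple into a single string.
    for letters in product(alphabet, repeat=k):
        yield "".join(letters)


# Materialize the full list for k = 8, as in the question.
kmers = list(all_kmers(8))
print(len(kmers))   # 4**8 = 65536
print(kmers[:3])    # ['AAAAAAAA', 'AAAAAAAT', 'AAAAAAAG']
```

For larger k you can iterate over all_kmers(k) directly instead of calling list() on it.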
inspectorG4dget answered Oct 27 '22 05:10
