Frequencies not adding up to one

Question

I am writing a function that is supposed to go through a .fasta file of DNA sequences and create a dictionary of nucleotide (nt) and dinucleotide (dnt) frequencies for each sequence in the file. I am then storing each dictionary in a list called "frequency". This is the piece of code that is acting strange:

for fasta in seq_file:
    freq = {}
    dna = str(fasta.seq)
    for base1 in ['A', 'T', 'G', 'C']:
        onefreq = float(dna.count(base1)) / len(dna)
        freq[base1] = onefreq
        for base2 in ['A', 'T', 'G', 'C']:
            dinucleotide = base1 + base2
            twofreq = float(dna.count(dinucleotide)) / (len(dna) - 1) 
            freq[dinucleotide] = twofreq
    frequency.append(freq)

(I am using biopython, by the way, so that I do not have to commit the entire fasta file to memory. Also this is for ssDNA, so I don't have to account for anti-sense dnt)

The frequencies that are recorded for the single nt add to 1.0, but the frequencies for the dnt do not add to 1.0. Which is od since the method of calculating the two types of frequencies are identical in my eyes.

I left the diagnostic print-statements and the "check" variables in:

for fasta in seq_file:
    freq = {}
    dna = str(fasta.seq)
    check = 0.0
    check2 = 0.0
    for base1 in ['A', 'T', 'G', 'C']:
        onefreq = float(dna.count(base1)) / len(dna)
        freq[base1] = onefreq
        check2 += onefreq
        for base2 in ['A', 'T', 'G', 'C']:
            dinucleotide = base1 + base2
            twofreq = float(dna.count(dinucleotide)) / (len(dna) - 1) 
            check += twofreq
            print(twofreq)
            freq[dinucleotide] = twofreq
    print("
")
    print(check, check2)
    print(len(dna))
    print("
")
    frequency.append(freq)

to get this output: (for only one sequence)

0.0894168466523 
0.0760259179266
0.0946004319654
0.0561555075594
0.0431965442765
0.0423326133909
0.0747300215983
0.0488120950324
0.0976241900648
0.0483801295896
0.0539956803456
0.0423326133909
0.0863930885529
0.0419006479482
0.0190064794816
0.031101511879


(0.9460043196544274, 1.0)
2316

Here we can see the frequency of each of the 16 different dnt possible, the sum of all dnt frequencies (0.946) and the sum of all nt frequencies (1.0) and the length of the sequence.

Why does the dnt frequency not add up to 1.0?

Thanks for your help. I'm very new to python, and this is my first question, so I hope that this submissions is acceptable.

Jose Ricardo Bustos M. · Accepted Answer

your problem, try with following fasta:

>test
AAAAAA

"AAAAAA".count("AA")

you get:

It should be

reason

from documentation: count return the number of (non-overlapping) occurrences of substring sub in string s[start:end]

solution using Counter and chunk function

from Bio import SeqIO

def chunks(l, n):
  for i in xrange(0, len(l)-(n-1)):
    yield l[i:i+n]

from collections import Counter

frequency = []
input_file = "test.fasta"
for fasta in SeqIO.parse(open(input_file), "fasta"):
  dna = str(fasta.seq)
  freq = Counter(dna)   #get counter of single bases
  freq.update(Counter(chunks(dna,2))) #update with counter of dinucleotide
  frequency.append(freq)

for "AAAAAA" you get:

Counter({'A': 6, 'AA': 5})

Data_addict · Answer

str.count() doesnt count overlapping motif it find.

Exemple:

If you have 'AAAA' in your sequence and you look for the dinucleotide 'AA', you expect than 'AAAA'.count('AA') return you 3, but it will return 2. So:

print float('AAAA'.count('AA')) / (len('AAAA') - 1)
0.666666

instead of 1

You can just change the line where you count the frequency by:

twofreq = len([i for i in range(len(dna)-1) if dna[i:i+2] == dinucleotide]) / float((len(dna) - 1))

Frequencies not adding up to one

Tags:

python

python-2.7

biopython

Bantha

Video Answer

2 Answers

Jose Ricardo Bustos M.

Data_addict

Recent Activity

Donate For Us

Frequencies not adding up to one

Tags:

python

python-2.7

biopython

Bantha

Video Answer

2 Answers

Jose Ricardo Bustos M.

Data_addict

Related questions

Recent Activity

Donate For Us