Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Increase the counter as a dictionary value within a loop

Tags:

python

loops

I have a list of a few hundred amino acid sequences called aa_seq, it looks like this: ['AFYIVHPMFSELINFQNEGHECQCQCG', 'KVHSLPGMSDNGSPAVLPKTEFNKYKI', 'RAQVEDLMSLSPHVENASIPKGSTPIP', 'TSTNNYPMVQEQAILSCIEQTMVADAK',...]. Each sequence is 27 letters long. I have to determine the most used amino acid for every position (1-27) and at what frequency that is.

So far I have:

   count_dict = {} 
   counter = count_dict.values()
   aa_list = ['A', 'C', 'D', 'E' ,'F' ,'G' ,'H' ,'I' ,'K' ,'L' ,    #one-letter code for amino acids
       'M' ,'N' ,'P' ,'Q' ,'R' ,'S' ,'T' ,'V' ,'W' ,'Y']
   for p in range(0,26):                       #first round:looks at the first position in each sequence
        for s in range(0,len(aa_seq)):          #goes through all sequences of the list 
             for item in aa_list:                #and checks for the occurrence of each amino acid letter (=item)
                  if item in aa_seq[s][p]:
                      count_dict[item]            #if that letter occurs at the respective position, make it a key in the dictionary
                      counter += 1                #and increase its counter (the value, as definded above) by one 
    print count_dict

It says KeyError: 'A', and it's pointing to the line count_dict[item]. So the item of the aa_list apparently can't be added as a key this way..? How do I do that? And it also gave an error "'int' object is not iterable" concerning the counter. How else can the counter be increased?

like image 546
ccaarroo Avatar asked Dec 18 '25 21:12

ccaarroo


2 Answers

You could use the Counter class

>>> from collections import Counter

>>> l = ['AFYIVHPMFSELINFQNEGHECQCQCG', 'KVHSLPGMSDNGSPAVLPKTEFNKYKI', 'RAQVEDLMSLSPHVENASIPKGSTPIP', 'TSTNNYPMVQEQAILSCIEQTMVADAK']
>>> s = [Counter([l[j][i] for j in range(len(l))]).most_common()[0] for i in range(27)]
>>> s
[('A', 1),
 ('A', 1),
 ('Y', 1),
 ('I', 1),
 ('N', 1),
 ('Y', 1),
 ('P', 2),
 ('M', 4),
 ('S', 2),
 ('Q', 1),
 ('E', 2),
 ('Q', 1),
 ('I', 1),
 ('I', 1),
 ('A', 1),
 ('Q', 1),
 ('A', 1),
 ('I', 1),
 ('I', 1),
 ('Q', 1),
 ('E', 2),
 ('C', 1),
 ('Q', 1),
 ('A', 1),
 ('Q', 1),
 ('I', 1),
 ('I', 1)]

However i might be way to inefficient if you have large data sets.

like image 140
greole Avatar answered Dec 20 '25 10:12

greole


Modified code

Here's a modified, working version of your code. It's not efficient but it should output the correct result.

A few notes :

  • You need one counter for each index. So you should initialize your dict inside the first loop.
  • range(0,26) only has 26 elements : from 0 to 25 (inclusive).
  • defaultdict helps you define 0 for each start value.
  • you need to increment the counter with count_dict[item] += 1
  • At the end of each loop, you need to find the key (amino acid) with the highest value (occurences).

from collections import defaultdict

aa_seq = ['AFYIVHPMFSELINFQNEGHECQCQCG', 'KVHSLPGMSDNGSPAVLPKTEFNKYKI',
          'RAQVEDLMSLSPHVENASIPKGSTPIP', 'TSTNNYPMVQEQAILSCIEQTMVADAK']
aa_list = ['A', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'K', 'L',  # one-letter code for amino acids
           'M', 'N', 'P', 'Q', 'R', 'S', 'T', 'V', 'W', 'Y']

for p in range(27):                  # first round:looks at the first position in each sequence
    count_dict = defaultdict(int)    # initialize counter with 0 as default value
    for s in range(0, len(aa_seq)):  # goes through all sequences of the list
        # and checks for the occurrence of each amino acid letter (=item)
        for item in aa_list:
            if item in aa_seq[s][p]:
                # if that letter occurs at the respective position, make it a
                # key in the dictionary
                count_dict[item] += 1
    print(max(count_dict.items(), key=lambda x: x[1]))

It outputs :

('R', 1)
('S', 1)
('Y', 1)
('S', 1)
('E', 1)
('P', 1)
('P', 2)
('M', 4)
...

Alternative with Counter

You don't need that many loops, you just need to iterate once over every character of every sequence.

Also, there's no need to reinvent the wheel: Counter and most_common are better alternatives than defaultdict and max.

from collections import Counter

aa_seqs = ['AFYIVHPMFSELINFQNEGHECQCQCG', 'KVHSLPGMSDNGSPAVLPKTEFNKYKI', 'RAQVEDLMSLSPHVENASIPKGSTPIP', 'TSTNNYPMVQEQAILSCIEQTMVADAK']

counters = [Counter() for i in range(27)]

for aa_seq in aa_seqs:
    for (i, aa) in enumerate(aa_seq):
        counters[i][aa] += 1

most_commons = [counter.most_common()[0] for counter in counters]
print(most_commons)

It outputs :

[('K', 1), ('A', 1), ('Y', 1), ('N', 1), ('N', 1), ('Y', 1), ('P', 2), ('M', 4), ('S', 2), ('Q', 1), ('E', 2), ('G', 1), ('H', 1), ('N', 1), ('L', 1), ('N', 1), ('N', 1), ('I', 1), ('G', 1), ('H', 1), ('E', 2), ('G', 1), ('N', 1), ('K', 1), ('Y', 1), ('K', 1), ('G', 1)]

like image 37
Eric Duminil Avatar answered Dec 20 '25 10:12

Eric Duminil



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!