Right now I am able to count the frequency of each word in a list.
>>> words = ['a', 'b', 'a', 'c', 'a', 'c']
>>> frequency = {}
>>> for w in words:
...     frequency[w] = frequency.get(w, 0) + 1
...
>>> frequency
It gives me this output:
{'a': 3, 'b': 1, 'c': 2}
But what I'd like is the frequency of each pair of adjacent items, grouped by the first item of the pair. For example, 'b' comes after 'a' 1 time and 'c' comes after 'a' 2 times:
{'a':{'b':1,'c':2},'b':{'a':1},'c':{'a':1}}
How would I go about accomplishing this?
We can use the Counter class from the collections module to count the frequency of elements in a list. Counter takes an iterable as its input argument and returns a Counter object, which stores the frequency of each element as key-value pairs.
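A minimal sketch of that approach, using the sample list from the question (the variable names `words` and `frequency` are just illustrative):

```python
from collections import Counter

words = ['a', 'b', 'a', 'c', 'a', 'c']

# Counter consumes any iterable and tallies how often each element occurs
frequency = Counter(words)

# dict() gives a plain dictionary if the Counter type is not wanted
print(dict(frequency))
```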
Iterate over the new list and use the count function (i.e. string.count(newstring[iteration])) to find the frequency of the word at each iteration.
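One way to read that suggestion is sketched below, assuming the "new list" means the unique words of the input. The names `words` and `unique_words` are illustrative, not from the original snippet.

```python
words = ['a', 'b', 'a', 'c', 'a', 'c']

# Unique words in first-seen order (dicts keep insertion order in Python 3.7+)
unique_words = list(dict.fromkeys(words))

# list.count scans the whole list once per unique word
frequency = {w: words.count(w) for w in unique_words}
print(frequency)  # {'a': 3, 'b': 1, 'c': 2}
```

Because each call to count rescans the list, this is slower than a single pass with a dictionary or Counter when there are many distinct words.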
The frequency of an element in a list is the number of times it occurs in the list. We can count the frequency of elements in a list using a Python dictionary: create a dictionary whose keys are the unique elements of the input list and whose values are their counts.
Calculate the frequency of occurrence of each combination of bases in the pairs. Here, the combinations are 'CA' and 'GT' (note that the order of the bases matters: only 'CA' and 'GT' count, not 'AC' or 'TG'). E.g. for CA pairs, float(a) = (freq of CA pairs) - ((freq of C in CGG) * (freq of A in ATT))
Get the frequency of elements using Counter from the collections module, then convert the result to a dictionary using dict and print the frequency.
If you're willing to accept a slightly different format, it's easy to get the pairwise counts using collections.Counter and zip:
>>> seq = list("abacac")
>>> from collections import Counter
>>> c = Counter(zip(seq, seq[1:]))
>>> c
Counter({('a', 'c'): 2, ('b', 'a'): 1, ('c', 'a'): 1, ('a', 'b'): 1})
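To see why this works, it may help to look at what zip produces on its own: pairing the sequence with a copy shifted by one yields every adjacent pair. A small sketch (the name `pairs` is illustrative):

```python
seq = list("abacac")

# zip stops at the shorter input, so the last element has no successor pair
pairs = list(zip(seq, seq[1:]))
print(pairs)
# [('a', 'b'), ('b', 'a'), ('a', 'c'), ('c', 'a'), ('a', 'c')]
```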
If you really want the format you gave, you have a few options, but one way would be to use itertools.groupby to collect all the pairs starting with the same element together:
>>> from itertools import groupby
>>> grouped = groupby(sorted(zip(seq, seq[1:])), lambda x: x[0])
>>> {k: dict(Counter(x[1] for x in g)) for k,g in grouped}
{'a': {'c': 2, 'b': 1}, 'c': {'a': 1}, 'b': {'a': 1}}
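Another of those options, sketched here as an alternative rather than as part of the answer above, is a defaultdict of Counter objects, which avoids sorting the pairs first. The function name `pair_frequencies` is hypothetical.

```python
from collections import Counter, defaultdict

def pair_frequencies(seq):
    """Map each item to a dict counting the items that directly follow it."""
    following = defaultdict(Counter)
    for current, nxt in zip(seq, seq[1:]):
        following[current][nxt] += 1
    # Convert to plain dicts so the result matches the requested format
    return {k: dict(v) for k, v in following.items()}

print(pair_frequencies(['a', 'b', 'a', 'c', 'a', 'c']))
# {'a': {'b': 1, 'c': 2}, 'b': {'a': 1}, 'c': {'a': 1}}
```

This makes a single pass over the sequence, and on Python 3.7+ the keys appear in first-seen order.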