I have a text file named textf.txt whose contents look something like the following:
rxgmgcwbd c qcyurr bkxgmq, lwrg grru rrwxtam rwgzwt am quyam cv avrrgdwkxgcr.iwxbdamcz xdalguj qarc ram av vcmfwgmgum. yw'g
I want to count the frequency of each letter in the text file, with the condition that a letter that does not appear in the text still gets a key:value pair with value 0. For example, if z is not in the text, the result should contain 'z': 0, and likewise for every letter from a to z. I wrote the following code:
import string
from collections import Counter
with open("textf.txt") as tf:
    letter = tf.read()
letter_count = Counter(letter.translate(str.maketrans('','',string.punctuation)))
print("Frequency count of letter:","\n",letter_count)
But the output looks something like this:
Counter({' ': 110, 'r': 12, 'c': 88, 'a': 55, 'g': 57, 'w': 76, 'm': 76, 'x': 72, 'u': 70, 'q': 41, 'y': 40, 'j': 36, 'l': 32, 'b': 18, 'd': 28, 'v': 27, 'k': 22, 't': 19, 'f': 18, 'z': 16, 'i': 7})
I am trying to make it so that the space count ' ': 110 is not shown, and so that all the letters (a-z) are present: when a letter does not appear in the text, the result should print something like 'n': 0. Any ideas or suggestions on how I could make this possible?
One way to do this is to make a normal dict from your Counter, using the lowercase letters as the keys of the new dict. We use the dict.get method to supply a default value of zero for missing letters.
import string
from collections import Counter
letter = "rxgmgcwbd c qcyurr bkxgmq, lwrg grru rrwxtam rwgzwt am quyam cv avrrgdwkxgcr.iwxbdamcz xdalguj qarc ram av vcmfwgmgum. yw'g"
letter_count = Counter(letter.translate(str.maketrans('','',string.punctuation)))
letter_count = {k: letter_count.get(k, 0) for k in string.ascii_lowercase}
print("Frequency count of letter:\n", letter_count)
Output:
Frequency count of letter:
{'a': 9, 'b': 3, 'c': 8, 'd': 4, 'e': 0, 'f': 1, 'g': 12, 'h': 0, 'i': 1, 'j': 1, 'k': 2, 'l': 2, 'm': 10, 'n': 0, 'o': 0, 'p': 0, 'q': 4, 'r': 14, 's': 0, 't': 2, 'u': 5, 'v': 4, 'w': 9, 'x': 6, 'y': 3, 'z': 2}
If you do this in Python 3.6+, you get the side benefit that the new dict keeps its keys in insertion order, which here is alphabetical. In CPython 3.6 that was an implementation detail, but since Python 3.7 insertion order is guaranteed by the language.
As user2357112 mentions in the comments, we don't need to use letter_count.get(k, 0), since a Counter automatically returns zero if we try to read the value of a non-existent key. So that dict comprehension can be changed to
letter_count = {k: letter_count[k] for k in string.ascii_lowercase}
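The same idea can be wrapped in a small helper. This is only a sketch: the function name letter_frequencies and the sample string are made up for illustration, and lowercasing the text first is an added assumption (the question's sample contains only lowercase letters). Since the comprehension only looks up the 26 letters, spaces and punctuation never reach the result, so the translate step is not strictly needed here.

```python
import string
from collections import Counter

def letter_frequencies(text):
    """Map each letter a-z to its count in text, ignoring case.

    Hypothetical helper, not part of the original answer.
    """
    # Counter returns 0 for keys it has never seen, so no .get() is needed.
    counts = Counter(text.lower())
    # Only a-z are looked up; spaces, digits and punctuation are left out.
    return {letter: counts[letter] for letter in string.ascii_lowercase}

sample = "Hello, World!"
freq = letter_frequencies(sample)
print(freq["l"], freq["o"], freq["z"])  # prints: 3 2 0
```

To use it on the file from the question, pass the result of tf.read() instead of the sample string.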
You can do this like so:
x = "rxgmgcwbd c qcyurr bkxgmq, lwrg grru rrwxtam rwgzwt am quyam cv avrrgdwkxgcr.iwxbdamcz xdalguj qarc ram av vcmfwgmgum. yw'g"
import string
freq = {i:0 for i in string.ascii_lowercase}
for i in x:
    if i in freq:
        freq[i] += 1
You can also replace the for-loop with a dictionary comprehension. It is less efficient for this task because str.count rescans the whole string once per letter (26 passes instead of one), so it is included only for reference:
freq = {i:x.count(i) for i in freq}
This will give as a result:
{'a': 9, 'c': 8, 'b': 3, 'e': 0, 'd': 4, 'g': 12, 'f': 1, 'i': 1, 'h': 0, 'k': 2, 'j': 1, 'm': 10, 'l': 2, 'o': 0, 'n': 0, 'q': 4, 'p': 0, 's': 0, 'r': 14, 'u': 5, 't': 2, 'w': 9, 'v': 4, 'y': 3, 'x': 6, 'z': 2}
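If you would rather keep a Counter (for example, to use most_common later), one possible variation is to seed it with zeros and then update it in a single pass. This is a sketch under assumptions: the short text value and the variable names are illustrative only and do not come from the question.

```python
import string
from collections import Counter

text = "abca, b!"  # illustrative input, not the question's data
letters = set(string.ascii_lowercase)

# Start every letter at 0 so absent letters still show up in the result.
freq = Counter(dict.fromkeys(string.ascii_lowercase, 0))

# update() adds to the existing counts; non-letters are filtered out first.
freq.update(ch for ch in text if ch in letters)

print(freq["a"], freq["c"], freq["z"])  # prints: 2 1 0
```

This reads the text once, like the for-loop above, while still giving a Counter with all 26 keys.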