 

How to create a text frequency counter with all letters (a-z) in Python 3

I have a text file named textf.txt that looks something like the following:

rxgmgcwbd c qcyurr bkxgmq, lwrg grru rrwxtam rwgzwt am quyam cv avrrgdwkxgcr.iwxbdamcz xdalguj qarc ram av vcmfwgmgum. yw'g

I want to count the frequency of each letter in the file, with the condition that a letter that does not appear in the text still gets a key:value pair with value 0. For example, if z is not in the text, the result should contain 'z': 0, and likewise for every letter from a to z. I wrote the following code:

import string  
from collections import Counter 
with open("textf.txt") as tf: 
    letter = tf.read()
letter_count = Counter(letter.translate(str.maketrans('','',string.punctuation)))
print("Frequency count of letter:","\n",letter_count)

But the output looks something like this:

Counter({' ': 110, 'r': 12, 'c': 88, 'a': 55, 'g': 57, 'w': 76, 'm': 76, 'x': 72, 'u': 70, 'q': 41, 'y': 40, 'j': 36, 'l': 32, 'b': 18, 'd': 28, 'v': 27, 'k': 22, 't': 19, 'f': 18, 'z': 16, 'i': 7})

I want to hide the space count (' ': 110) and include every letter a-z, so that a letter missing from the text appears in the result as, for example, 'n': 0. Any ideas or suggestions on how to do this?

asked Oct 03 '17 15:10 by adda.fuentes




2 Answers

One way to do this is to make a normal dict from your Counter, using the lowercase letters as the keys of the new dict. We use the dict.get method to supply a default value of zero for missing letters.

import string
from collections import Counter

letter = "rxgmgcwbd c qcyurr bkxgmq, lwrg grru rrwxtam rwgzwt am quyam cv avrrgdwkxgcr.iwxbdamcz xdalguj qarc ram av vcmfwgmgum. yw'g"

# Count every character left after stripping punctuation (spaces are still counted here).
letter_count = Counter(letter.translate(str.maketrans('', '', string.punctuation)))
# Keep only a-z; .get supplies 0 for letters that never appear, and the space key is dropped.
letter_count = {k: letter_count.get(k, 0) for k in string.ascii_lowercase}
print("Frequency count of letter:\n", letter_count)

output

Frequency count of letter:
 {'a': 9, 'b': 3, 'c': 8, 'd': 4, 'e': 0, 'f': 1, 'g': 12, 'h': 0, 'i': 1, 'j': 1, 'k': 2, 'l': 2, 'm': 10, 'n': 0, 'o': 0, 'p': 0, 'q': 4, 'r': 14, 's': 0, 't': 2, 'u': 5, 'v': 4, 'w': 9, 'x': 6, 'y': 3, 'z': 2}

On Python 3.7+ the new dict keeps its keys in insertion order, which is alphabetical here because the comprehension iterates over string.ascii_lowercase. (CPython 3.6 behaves the same way, but there the ordering is only an implementation detail that should not be relied upon.)
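
If the output order must not depend on dict ordering at all (for example on an older interpreter), one option is to walk the alphabet explicitly when printing. This is only a sketch; the names text, counts and ordered are made up for illustration, and the sample is a short fragment of the question's text.

```python
import string
from collections import Counter

text = "yw'g rwgzwt am"  # short fragment of the question's sample text
counts = Counter(text)

# Walk the alphabet explicitly, so the order never depends on dict ordering.
for ch in string.ascii_lowercase:
    print(ch, counts[ch])

# The same data as a list of (letter, count) pairs in alphabetical order.
ordered = [(ch, counts[ch]) for ch in string.ascii_lowercase]
```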


As user2357112 mentions in the comments, we don't need to use letter_count.get(k, 0), since a Counter automatically returns zero if we try to read the value of a non-existent key. So that dict comprehension can be changed to

letter_count = {k: letter_count[k] for k in string.ascii_lowercase}
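
A small check of that behaviour, assuming nothing beyond the standard library: indexing a Counter with a missing key gives 0 and does not add the key.

```python
from collections import Counter

counts = Counter("abca")

print(counts["a"])    # 2
print(counts["z"])    # 0, no KeyError is raised
print("z" in counts)  # False: the lookup did not insert a key
```
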
answered Sep 21 '22 05:09 by PM 2Ring


You can do this like so:

x = "rxgmgcwbd c qcyurr bkxgmq, lwrg grru rrwxtam rwgzwt am quyam cv avrrgdwkxgcr.iwxbdamcz xdalguj qarc ram av vcmfwgmgum. yw'g"

import string

freq = {i:0 for i in string.ascii_lowercase}
for i in x:
    if i in freq:
        freq[i] += 1

You can also replace the for-loop with a dict comprehension. It is less efficient here, because x.count scans the whole string once per letter (26 passes instead of one), so it is included only for reference:

freq = {i:x.count(i) for i in freq}

This gives the following result (keys shown in the order Python 3.7+ prints them):

{'a': 9, 'b': 3, 'c': 8, 'd': 4, 'e': 0, 'f': 1, 'g': 12, 'h': 0, 'i': 1, 'j': 1, 'k': 2, 'l': 2, 'm': 10, 'n': 0, 'o': 0, 'p': 0, 'q': 4, 'r': 14, 's': 0, 't': 2, 'u': 5, 'v': 4, 'w': 9, 'x': 6, 'y': 3, 'z': 2}
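
Since the question reads its text from a file, the same idea can be wrapped in a small helper. This is a hedged sketch, not part of either answer: the function names letter_frequencies and letter_frequencies_from_file are hypothetical, and lowercasing the text first is an assumption (the question's sample is already all lowercase).

```python
import string
from collections import Counter


def letter_frequencies(text):
    """Return a dict mapping every letter a-z to its count in text."""
    # Assumption: upper- and lowercase letters should be counted together.
    counts = Counter(ch for ch in text.lower() if ch in string.ascii_lowercase)
    # Counter returns 0 for letters that never appear.
    return {ch: counts[ch] for ch in string.ascii_lowercase}


def letter_frequencies_from_file(path):
    """Read a text file (e.g. the question's textf.txt) and count its letters."""
    with open(path) as tf:
        return letter_frequencies(tf.read())


sample = "yw'g RWGZWT am"
freq = letter_frequencies(sample)
print(freq["w"], freq["e"], " " in freq)  # 3 0 False
```

Filtering inside the generator means spaces, punctuation, digits and newlines are never counted, so no separate translate step is needed.
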
answered Sep 19 '22 05:09 by coder