
Letter frequency in Python

Tags:

python

I need to write a Python program that prints the frequency of each letter in a text file and compares those frequencies with the letter frequencies of another file.

So far I am able to print the number of times a letter occurs, but the percentage frequency I get is wrong. I think it is because I need my program to count only the number of letters in the file by removing all the spaces and other characters.

def addLetter (x):
    result = ord(x) - ord('a')
    return result


#start of the main program
#prompt user for a file

while True:
    speech = raw_input("Enter file name:")

    wholeFile = open(speech, 'r+').read()
    lowlet = wholeFile.lower()
    letters= list(lowlet)
    alpha = list('abcdefghijklmnopqrstuvwxyz')
    n = len(letters)
    f = float(n)
    occurrences = {}
    d = {}


    #number of letters
    for x in alpha:
        occurrences[x] = letters.count(x)
        d[x] =(occurrences[x])/f
    for x in occurrences:
        print x, occurrences[x], d[x]

This is the output:

Enter file name:dems.txt
a 993 0.0687863674148
c 350 0.0242449431976
b 174 0.0120532003325
e 1406 0.0973954003879
d 430 0.0297866444999
g 219 0.015170407315
f 212 0.0146855084511
i 754 0.0522305347742
h 594 0.0411471321696
k 81 0.00561097256858
j 12 0.000831255195345
m 273 0.0189110556941
l 442 0.0306178996952
o 885 0.0613050706567
n 810 0.0561097256858
q 9 0.000623441396509
p 215 0.0148933222499
s 672 0.0465502909393
r 637 0.0441257966196
u 305 0.021127736215
t 1175 0.0813937378775
w 334 0.0231366029371
v 104 0.00720421169299
y 212 0.0146855084511
x 13 0.000900526461624
z 6 0.000415627597672
Enter file name:

The program does print in columns, but I'm not really sure how to display that here.

The frequency for "a" should be 0.0878.

ArtisanSamosa asked Mar 17 '26 18:03


1 Answer

You could use the translator recipe to drop all characters not in alpha. Since doing so makes letters contain nothing but characters from alpha, n is now the correct denominator.
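
To illustrate why the denominator matters, here is a minimal sketch on a made-up sample string (the string and the variable names are assumptions for illustration, not taken from the question's file). It uses a plain generator expression rather than the translator recipe, and float() so that it behaves the same under Python 2 and Python 3:

```python
# Hypothetical sample text, used only for illustration.
sample = "Hello, World! 123"

alpha = 'abcdefghijklmnopqrstuvwxyz'
lowered = sample.lower()

# Keep only the characters that appear in alpha.
letters = ''.join(ch for ch in lowered if ch in alpha)

total_chars = len(lowered)    # includes spaces, digits, and punctuation
total_letters = len(letters)  # letters only

l_count = letters.count('l')

# Dividing by every character understates the letter frequency;
# dividing by the number of letters gives the intended value.
freq_all_chars = l_count / float(total_chars)
freq_letters_only = l_count / float(total_letters)
```

The same effect explains the question's output: the counts are right, but each one is divided by the length of the whole file instead of by the number of letters.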

You could then use a collections.defaultdict(int) to count the occurrences of the letters:

import collections
import string

def translator(frm='', to='', delete='', keep=None):
    # Python Cookbook Recipe 1.9
    # Chris Perkins, Raymond Hettinger
    if len(to) == 1: to = to * len(frm)
    trans = string.maketrans(frm, to)
    if keep is not None:
        allchars = string.maketrans('', '')
        # delete is expanded to delete everything except
        # what is mentioned in set(keep)-set(delete)
        delete = allchars.translate(allchars, keep.translate(allchars, delete))
    def translate(s):
        return s.translate(trans, delete)
    return translate

alpha = 'abcdefghijklmnopqrstuvwxyz'
keep_alpha=translator(keep=alpha)

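while True:
    speech = raw_input("Enter file name:")
    # Read-only mode is enough; the with block closes the file.
    with open(speech, 'r') as f:
        wholeFile = f.read()
    lowlet = wholeFile.lower()
    # Keep only the characters a-z so that n counts letters alone.
    letters = keep_alpha(lowlet)
    n = len(letters)
    occurrences = collections.defaultdict(int)
    for x in letters:
        occurrences[x] += 1
    # Print each letter, its count, and its relative frequency.
    for x in occurrences:
        print x, occurrences[x], occurrences[x]/float(n)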
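
The code above is Python 2; string.maketrans and the two-argument str.translate are not available in Python 3. As a rough sketch of the same idea under Python 3, collections.Counter can do the counting. The function names letter_frequencies and frequency_differences are hypothetical, and comparing two texts by a plain per-letter difference is an assumption, since the question does not say how the frequencies should be compared:

```python
import collections
import string

ALPHA = string.ascii_lowercase


def letter_frequencies(text):
    """Return {letter: relative frequency} over the a-z letters in text."""
    # Drop everything that is not a lowercase ASCII letter.
    letters = [ch for ch in text.lower() if ch in ALPHA]
    counts = collections.Counter(letters)
    n = len(letters)
    if n == 0:
        # No letters at all: avoid dividing by zero.
        return {ch: 0.0 for ch in ALPHA}
    # Counter returns 0 for letters that never occurred.
    return {ch: counts[ch] / n for ch in ALPHA}


def frequency_differences(text_a, text_b):
    """Return {letter: frequency in text_a minus frequency in text_b}."""
    freq_a = letter_frequencies(text_a)
    freq_b = letter_frequencies(text_b)
    return {ch: freq_a[ch] - freq_b[ch] for ch in ALPHA}


# Hypothetical in-memory samples; for real files, pass the result of
# open(name).read() for each file instead.
sample_freqs = letter_frequencies("Banana!")
sample_diffs = frequency_differences("aab", "abb")
```

Because both functions take strings rather than file names, they can be tried on short samples before being pointed at the two files.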
unutbu answered Mar 20 '26 07:03


