I need to write a Python program that prints the frequency of each letter in a text file and compares it with the letter frequencies of another file.
So far I can print the number of times each letter occurs, but the percentage frequency I get is wrong. I think this is because the program needs to count only the letters in the file, ignoring spaces and other characters.
def addLetter(x):
    result = ord(x) - ord('a')
    return result

# start of the main program
# prompt user for a file
while True:
    speech = raw_input("Enter file name:")
    wholeFile = open(speech, 'r+').read()
    lowlet = wholeFile.lower()
    letters = list(lowlet)
    alpha = list('abcdefghijklmnopqrstuvwxyz')
    n = len(letters)
    f = float(n)
    occurrences = {}
    d = {}
    # number of letters
    for x in alpha:
        occurrences[x] = letters.count(x)
        d[x] = (occurrences[x]) / f
    for x in occurrences:
        print x, occurrences[x], d[x]
This is the output:
Enter file name:dems.txt
a 993 0.0687863674148
c 350 0.0242449431976
b 174 0.0120532003325
e 1406 0.0973954003879
d 430 0.0297866444999
g 219 0.015170407315
f 212 0.0146855084511
i 754 0.0522305347742
h 594 0.0411471321696
k 81 0.00561097256858
j 12 0.000831255195345
m 273 0.0189110556941
l 442 0.0306178996952
o 885 0.0613050706567
n 810 0.0561097256858
q 9 0.000623441396509
p 215 0.0148933222499
s 672 0.0465502909393
r 637 0.0441257966196
u 305 0.021127736215
t 1175 0.0813937378775
w 334 0.0231366029371
v 104 0.00720421169299
y 212 0.0146855084511
x 13 0.000900526461624
z 6 0.000415627597672
Enter file name:
The program does print in columns, but I'm not really sure how to display that here.
The frequency for "a" should be 0.0878.
You could use the translator recipe (Python Cookbook, Recipe 1.9) to drop every character that is not in alpha.
Since doing so makes letters contain nothing but characters from alpha, n is now the correct denominator.
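To illustrate why filtering first fixes the denominator, here is a minimal sketch. It is written for Python 3 rather than the Python 2 used elsewhere in this thread, and it uses a plain generator expression instead of the translator recipe. The helper name letters_only and the sample string are made up for the example.

```python
import string

def letters_only(text):
    """Return the lowercase ASCII letters in text, dropping everything else."""
    allowed = set(string.ascii_lowercase)
    return "".join(ch for ch in text.lower() if ch in allowed)

sample = "Hello, World! 123"      # hypothetical input, not from the question
letters = letters_only(sample)
n = len(letters)                  # counts letters only, so it is the right denominator

print(letters, n)
print(letters.count("l") / n)     # share of "l" among the letters
```

With the punctuation, spaces, and digits gone, len(letters) is the number of letters, so dividing a letter's count by it gives that letter's share of all letters.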
You could then use a collections.defaultdict(int) to count the occurrences of the letters:
import collections
import string

# Python 2 only: relies on string.maketrans and two-argument str.translate
def translator(frm='', to='', delete='', keep=None):
    # Python Cookbook Recipe 1.9
    # Chris Perkins, Raymond Hettinger
    if len(to) == 1:
        to = to * len(frm)
    trans = string.maketrans(frm, to)
    if keep is not None:
        allchars = string.maketrans('', '')
        # delete is expanded to delete everything except
        # what is mentioned in set(keep)-set(delete)
        delete = allchars.translate(allchars, keep.translate(allchars, delete))
    def translate(s):
        return s.translate(trans, delete)
    return translate

alpha = 'abcdefghijklmnopqrstuvwxyz'
keep_alpha = translator(keep=alpha)

while True:
    speech = raw_input("Enter file name:")
    with open(speech, 'r') as fh:
        wholeFile = fh.read()
    lowlet = wholeFile.lower()
    # keep only the characters in alpha, so n counts letters only
    letters = keep_alpha(lowlet)
    n = len(letters)
    occurrences = collections.defaultdict(int)
    for x in letters:
        occurrences[x] += 1
    for x in occurrences:
        print x, occurrences[x], occurrences[x] / float(n)
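The code above is Python 2. For Python 3, where raw_input, the print statement, and string.maketrans are not available, a rough equivalent might look like the sketch below. It swaps defaultdict for collections.Counter and also covers the second half of the question, comparing two texts. The function names letter_frequencies and compare_frequencies are made up for this example, and treating "compare" as a per-letter difference of frequencies is an assumption about what the asker wants.

```python
import collections
import string

def letter_frequencies(text):
    """Map each letter a-z to its share of all letters in text."""
    counts = collections.Counter(
        ch for ch in text.lower() if ch in string.ascii_lowercase
    )
    total = sum(counts.values())
    if total == 0:
        # No letters at all: avoid dividing by zero.
        return {letter: 0.0 for letter in string.ascii_lowercase}
    return {letter: counts[letter] / total for letter in string.ascii_lowercase}

def compare_frequencies(text_a, text_b):
    """Per-letter difference: frequency in text_a minus frequency in text_b."""
    freq_a = letter_frequencies(text_a)
    freq_b = letter_frequencies(text_b)
    return {
        letter: freq_a[letter] - freq_b[letter]
        for letter in string.ascii_lowercase
    }

# Small demonstration with inline strings instead of files.
diff = compare_frequencies("banana", "bandana")
for letter in "abdn":
    print(letter, round(diff[letter], 4))
```

To use it on files, read each file's contents into a string first (for example with open(name) as fh: text = fh.read()) and pass the two strings to compare_frequencies.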