def main():
    try:
        file=input('Enter the name of the file you wish to open: ')
        thefile=open(file,'r')
        line=thefile.readline()
        line=line.replace('.','')
        line=line.replace(',','')
        thefilelist=line.split()
        thefilelistset=set(thefilelist)
        d={}
        for item in thefilelist:
            thefile.seek(0)
            wordcount=line.count(' '+item+' ')
            d[item]=wordcount
        for i in d.items():
            print(i)
        thefile.close()
    except IOError:
        print('IOError: Sorry but i had an issue opening the file that you specified to READ from please try again but keep in mind to check your spelling of the file you want to open')
main()
Basically, I am trying to read a file, count the number of times each word appears in it, and then print each word with its count next to it.
It all works, except that it does not count the first word in the file.
The practice file that I am testing this code on contains this text:
This file is for testing. It is going to test how many times the words in here appear.
This is the output I get:
('for', 1)
('going', 1)
('the', 1)
('testing', 1)
('is', 2)
('file', 1)
('test', 1)
('It', 1)
('This', 0)
('appear', 1)
('to', 1)
('times', 1)
('here', 1)
('how', 1)
('in', 1)
('words', 1)
('many', 1)
Notice that it says 'This' appears 0 times, even though it does in fact appear in the file.
Any ideas?
My guess would be this line:
wordcount=line.count(' '+item+' ')
You are looking for "space" + YourWord + "space", and the first word in the line is not preceded by a space, so it is never matched.
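To make the point concrete, here is a small sketch using the sample text from the question. It compares the space-padded substring search with counting whole tokens in the split list. The variable names (`cleaned`, `padded_miss`, `token_hit`) are made up for illustration, and counting tokens is just one possible fix, not necessarily the one the asker should settle on.

```python
# Sample text taken from the question.
line = "This file is for testing. It is going to test how many times the words in here appear."

# Same clean-up as in the question: drop dots and commas.
cleaned = line.replace('.', '').replace(',', '')
words = cleaned.split()

# Searching for ' This ' fails because nothing precedes the first word.
padded_miss = cleaned.count(' This ')

# Counting whole tokens in the split list does not depend on surrounding spaces.
token_hit = words.count('This')

print(padded_miss, token_hit)
```

Counting tokens also avoids matching a word inside a longer one, since 'is' is only compared against complete entries of the list.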
I would suggest making more use of Python's utilities. A big flaw is that you only read one line from the file.
You then create a set of unique words and count the words individually, which is highly inefficient: the line is traversed many times, once to create the set and then again for each word you count.
Python has a built-in "high performance counter" (https://docs.python.org/2/library/collections.html#collections.Counter) which is specifically meant for use cases like this.
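As a minimal illustration of `collections.Counter` on its own, the sketch below counts the words of the sample sentence from the question in a single pass. The variable names are made up for the example, and the text is hard-coded rather than read from a file.

```python
from collections import Counter

# Sample text taken from the question.
text = "This file is for testing. It is going to test how many times the words in here appear."

# Drop dots and commas, then split on whitespace.
words = text.replace('.', '').replace(',', '').split()

# Counter tallies every word in one pass over the list.
counts = Counter(words)

for word, n in counts.items():
    print(word, n)
```

A `Counter` returns 0 for a word it has never seen, so no membership check is needed before a lookup.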
The following few lines replace your program. They use "re.split()" to split each line at non-word characters (https://docs.python.org/2/library/re.html#regular-expression-syntax).
The idea is to run this split() function on each line of the file and update the word counter with the results. re.sub() is also used to remove the dots and commas in one go before the line is handed to the split function.
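import re, collections
from functools import reduce  # reduce is not a builtin in Python 3

with open(input('Enter the name of the file you wish to open: '), 'r') as file:
    # Remove dots and commas from each line, split it on non-word characters,
    # drop the empty strings re.split() can produce, and update the Counter.
    for d in reduce(lambda acc, line: acc.update(filter(None, re.split(r"\W+", line))) or acc,
                    map(lambda line: re.sub(r"[.,]", "", line), file),
                    collections.Counter()).items():
        print(d)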
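The same idea can be written as a plain loop, which some readers may find easier to follow than the reduce/lambda form. This is only a sketch: `count_words` is a hypothetical helper name, and it takes any iterable of lines so it can be tried without a file.

```python
import re
from collections import Counter

def count_words(lines):
    """Count words across an iterable of lines (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        # Remove dots and commas in one go.
        line = re.sub(r"[.,]", "", line)
        # Split on runs of non-word characters and skip empty strings.
        words = [w for w in re.split(r"\W+", line) if w]
        counts.update(words)
    return counts

# The sample text from the question, spread over two lines.
sample = ["This file is for testing. It is going to test\n",
          "how many times the words in here appear.\n"]
result = count_words(sample)

for pair in result.items():
    print(pair)
```

Since an open file object yields its lines when iterated, it could be passed to the helper directly, for example `with open(name) as f: count_words(f)`, which reads every line instead of only the first.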