
When counting the occurrence of a string in a file, my code does not count the very first word

Code

def main():
    try:
        file=input('Enter the name of the file you wish to open: ')
        thefile=open(file,'r')
        line=thefile.readline()
        line=line.replace('.','')
        line=line.replace(',','')
        thefilelist=line.split()
        thefilelistset=set(thefilelist)
        d={}
        for item in thefilelist:
            thefile.seek(0)
            wordcount=line.count(' '+item+' ')
            d[item]=wordcount
        for i in d.items():
            print(i)
        thefile.close()
    except IOError:
        print('IOError: Sorry but i had an issue opening the file that you specified to READ from please try again but keep in mind to check your spelling of the file you want to open')
main()

Problem

I am trying to read a file, count how many times each word appears in it, and then print each word with its count next to it.

It all works except that it will not count the first word in the file.

File I am using

The practice file I am testing this code on contains this text:

This file is for testing. It is going to test how many times the words in here appear.

Output

('for', 1)
('going', 1)
('the', 1)
('testing', 1)
('is', 2)
('file', 1)
('test', 1)
('It', 1)
('This', 0)
('appear', 1)
('to', 1)
('times', 1)
('here', 1)
('how', 1)
('in', 1)
('words', 1)
('many', 1)

Note

Notice that the output says 'This' appears 0 times, even though it does in fact appear in the file.

Any ideas?

asked Dec 04 '22 02:12 by iWantToLearnThis


2 Answers

My guess would be this line:

wordcount=line.count(' '+item+' ')

You are searching for "space" + YourWord + "space", and the first word of the line is not preceded by a space, so it is never matched.
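A small sketch of why the match fails, using the sample sentence from the question with the punctuation already removed. The helper name count_word is hypothetical; it compares whole words after splitting instead of searching for a space-padded substring.

```python
# Sample text from the question, after the '.' and ',' replacements.
line = "This file is for testing It is going to test how many times the words in here appear"

# The original search requires a space on both sides of the word,
# so a word at the very start of the line cannot match.
padded_search = line.count(' ' + 'This' + ' ')

def count_word(text, word):
    """Count whole-word occurrences by comparing against the split words."""
    return text.split().count(word)

print(padded_search)              # 0
print(count_word(line, 'This'))   # 1
print(count_word(line, 'is'))     # 2
```

Counting against the split list also means the last word on the line does not depend on being followed by a space.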

answered Apr 07 '23 20:04 by Guilherme


I would suggest more use of Python utilities. A big flaw is that you only read one line from the file.

You also build a set of unique words and then count each word individually, which is inefficient: the line is traversed once to build the set and then once more by count() for every word in the list.
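To illustrate these two points, here is a minimal sketch that reads every line and counts each word in a single pass with a plain dictionary. The function name count_words is hypothetical, and io.StringIO is only a stand-in for the opened file.

```python
import io

def count_words(lines):
    """Count words across all lines in one pass over the text."""
    counts = {}
    for line in lines:  # iterate over every line, not just the first
        cleaned = line.replace('.', '').replace(',', '')
        for word in cleaned.split():
            counts[word] = counts.get(word, 0) + 1
    return counts

# io.StringIO stands in for a real file object here.
sample = io.StringIO("This file is for testing. It is going to test\n"
                     "how many times the words in here appear.\n")
counts = count_words(sample)
print(counts['This'], counts['is'], counts['appear'])   # 1 2 1
```

With a real file, the same function could be called as count_words(thefile), since a file object yields its lines when iterated.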

Python has a built-in "high performance counter" (https://docs.python.org/2/library/collections.html#collections.Counter) which is specifically meant for use cases like this.

The following few lines replace your program; it also uses "re.split()" to split each line by word boundaries (https://docs.python.org/2/library/re.html#regular-expression-syntax).

The idea is to execute this split() function on each of the lines of the file and update the wordcounter with the results from this split. Also re.sub() is used to replace the dots and commas in one go before handing the line to the split function.

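import re, collections
from functools import reduce  # reduce is not a builtin in Python 3

with open(input('Enter the name of the file you wish to open: '), 'r') as file:
    # Strip dots and commas from each line, split it on runs of non-word
    # characters, and fold the non-empty words into a single Counter.
    for d in reduce(lambda acc, line: acc.update(w for w in re.split(r"\W+", line) if w) or acc,
                    map(lambda line: re.sub(r"[.,]", "", line), file),
                    collections.Counter()).items():
        print(d)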
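For readers who find the reduce() call dense, the same approach could be written as an explicit loop. This is a sketch for Python 3 rather than the answer's own code: the function name word_counts is hypothetical, io.StringIO stands in for the opened file, and re.findall(r"\w+", ...) is swapped in for the sub-then-split step.

```python
import collections
import io
import re

def word_counts(lines):
    """Return a Counter of the words found in an iterable of lines."""
    counter = collections.Counter()
    for line in lines:
        # \w+ matches runs of letters, digits and underscores, so
        # punctuation and whitespace are skipped without a separate re.sub().
        counter.update(re.findall(r"\w+", line))
    return counter

# io.StringIO stands in for the opened file.
text = io.StringIO("This file is for testing. It is going to test how many "
                   "times the words in here appear.\n")
counts = word_counts(text)
for word, n in counts.most_common(3):
    print(word, n)
```

With a real file this would be called inside a with open(...) block, passing the file object as lines. Counter.most_common() returns the words sorted by count, which is often what is wanted when reporting frequencies.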
answered Apr 07 '23 21:04 by haavee