
Trying to count words in a file using Python

Tags: python, file

I am attempting to count the number of 'difficult words' in a file, which requires me to count the number of letters in each word. For now, I am only trying to get single words, one at a time, from a file. I've written the following:

file = open('infile.txt', 'r+')
fileinput = file.read()

for line in fileinput:
    for word in line.split():
        print(word)

Output:

t
h
e

o
r
i
g
i
n

.
.
.

It seems to be printing one character at a time instead of one word at a time. I'd really like to know more about what is actually happening here. Any suggestions?

AustinC asked Jan 08 '23 06:01



2 Answers

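read() returns the whole file as a single string, and a for loop over a string yields one character at a time, which is why each "line" in your loop is really a single character. Use splitlines() to iterate over lines instead: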

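fopen = open('infile.txt', 'r')  # read-only mode is enough here
fileinput = fopen.read()         # the whole file as one string

for line in fileinput.splitlines():  # one line at a time
    for word in line.split():        # split each line on whitespace
        print(word)

fopen.close()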
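
To see the difference without touching a file, here is a minimal sketch that uses an in-memory string in place of what read() returns. The sample text is an assumption, chosen only to resemble the output in the question:

```python
# Hypothetical stand-in for the result of file.read(): one single string.
text = "the origin\nof species\n"

# Iterating over a string yields one character at a time.
chars = [c for c in text]

# splitlines() breaks the string into lines; split() then breaks
# each line into whitespace-separated words.
words = [word for line in text.splitlines() for word in line.split()]

print(chars[:3])
print(words)
```

The first print shows individual characters, the second shows whole words.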

Without splitlines():

You can also iterate over the file object directly, which yields one line at a time. Opening the file with a with statement closes it automatically when the block ends:

with open('infile.txt', 'r') as fopen:  # read-only mode is enough here
    for line in fopen:  # a file object yields one line per iteration
        for word in line.split():
            print(word)
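
As a rough check that the with statement really does close the file, the sketch below writes a small throwaway file first so that it is self-contained. The temporary file and its contents are assumptions made for illustration, not part of the original question:

```python
import os
import tempfile

# Create a small throwaway file so the example does not depend on infile.txt.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write("the origin\nof species\n")
    path = tmp.name

words = []
with open(path, 'r') as fopen:
    for line in fopen:              # one line per iteration
        words.extend(line.split())  # collect the words on that line

# Once the with block has ended, the file has been closed for us.
print(fopen.closed)
print(words)

os.remove(path)  # clean up the throwaway file
```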
Andrés Pérez-Albela H. answered Jan 09 '23 19:01



A file object supports the iteration protocol: looping over it yields one line at a time, which for bigger files is much better than reading the whole content into memory in one go:

with open('infile.txt', 'r') as f:  # read-only mode is enough here
    for line in f:
        for word in line.split():
            print(word)

Assuming you are going to define a filter function, you could do something along these lines:

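def is_difficult(word):
    # Example rule: a word is "difficult" if it has more than 5 characters
    return len(word) > 5

with open('infile.txt', 'r') as f:
    # Generator expression: words are produced lazily, line by line
    words = (w for line in f for w in line.split() if is_difficult(w))
    for w in words:
        print(w)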

which, with an input file of

ciao come va
oggi meglio di domani
ieri peggio di oggi

produces

meglio
domani
peggio
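
Since the original goal was to count the difficult words rather than print them, the same idea could be turned into a count. This is only a sketch: count_difficult is a hypothetical helper name, the threshold of 5 is the illustrative rule used in this answer, and io.StringIO stands in for an open file:

```python
import io

def is_difficult(word):
    # Illustrative rule only: more than five characters.
    return len(word) > 5

def count_difficult(lines):
    # `lines` can be any iterable of strings, including an open file object.
    return sum(1 for line in lines for w in line.split() if is_difficult(w))

# io.StringIO behaves like an open text file for iteration purposes.
sample = io.StringIO("ciao come va\noggi meglio di domani\nieri peggio di oggi\n")
print(count_difficult(sample))
```

With a real file you would pass the file object instead, for example count_difficult(f) inside a with open(...) as f: block. Note that split() leaves punctuation attached to words, so "domani." would be measured as seven characters; stripping punctuation is left out of this sketch.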
Pynchia answered Jan 09 '23 20:01
