I am attempting to write code that acts as an index of sorts: it sifts through a text file and returns the occurrences of given strings and the lines they appear on. I'm getting closer, but I've run into an issue with my iteration and I can't figure out what to do.
def index(fileName, wordList):
    infile = open(fileName,'r')
    i = 0
    lineNumber = 0
    while True:
        for line in infile:
            lineNumber += 1
            if wordList[i] in line.split():
                print(wordList[i], lineNumber)
        i += 1
        lineNumber = 0
fileName = 'index.txt'
wordList = eval(input("Enter a list of words to search for: \n"))
index(fileName,wordList)
I filled my .txt file with generic terms so it looks like this:
bird
bird
dog
cat
bird
When I feed a list of strings such as:
['bird','cat']
I get the following output:
Enter a list of words to search for:
['bird','cat']
bird 1
bird 2
bird 5
So it gives me the term and line numbers for the first string in the list, but it doesn't continue on to the next string. Any advice? It would also be appreciated if I could condense the output so that all the line numbers for a word appear in a single print.
Once the file has been read, the current file position is at the end of the file. Reading a file at that position yields an empty string, so the for loop for the second word gets no lines at all.
You need to rewind the file position using file.seek
to re-read the file.
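To see the problem directly, here is a small sketch showing that a second loop over the same file object gets nothing until the position is rewound with seek(0). The temporary file is only an assumption for the demo; it holds the sample data from the question.

```python
import os
import tempfile

# Write the sample data from the question to a temporary file (demo only).
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('bird\nbird\ndog\ncat\nbird\n')
    path = tmp.name

with open(path) as f:
    first_pass = [line.strip() for line in f]   # reads every line
    second_pass = [line.strip() for line in f]  # position is at EOF: nothing left
    at_end = f.read()                           # reading at EOF returns ''
    f.seek(0)                                   # rewind to the start of the file
    third_pass = [line.strip() for line in f]   # reads every line again

os.remove(path)

print(first_pass)    # ['bird', 'bird', 'dog', 'cat', 'bird']
print(second_pass)   # []
print(repr(at_end))  # ''
print(third_pass)    # ['bird', 'bird', 'dog', 'cat', 'bird']
```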
But instead of rewinding, I would rather do the following (using a set
and the in
operator), which reads the file only once:
def index(filename, words):
    with open(filename) as f:
        for line_number, line in enumerate(f, 1):
            word = line.strip()
            if word in words:
                print(word, line_number)

fileName = 'index.txt'
wordList = ['bird', 'cat'] # input().split()
words = set(wordList)
index(fileName, words)
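The same single pass can also cover the second part of the question (one print per word). This is a sketch that collects the line numbers for each word in a dictionary and prints them together; the name `index_lines`, the temporary demo file, and the "word n n n" output format are my own choices. Unlike the code above, it uses `line.split()` so a line may hold several words, and records each matching word at most once per line.

```python
import os
import tempfile
from collections import defaultdict


def index_lines(filename, words):
    """Return {word: [line numbers]} for every word in `words` found in `filename`."""
    words = set(words)
    hits = defaultdict(list)
    with open(filename) as f:
        # One pass over the file; enumerate(f, 1) numbers lines from 1.
        for line_number, line in enumerate(f, 1):
            # intersection() yields each matching word once per line.
            for word in words.intersection(line.split()):
                hits[word].append(line_number)
    return dict(hits)


# Demo with the sample data from the question, written to a temporary file.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('bird\nbird\ndog\ncat\nbird\n')
    path = tmp.name

result = index_lines(path, ['bird', 'cat'])
os.remove(path)

for word in ['bird', 'cat']:
    # One print per word, listing all of its line numbers.
    print(word, *result.get(word, []))
# bird 1 2 5
# cat 4
```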
eval
executes arbitrary expressions. Instead of using eval
, how about using input().split()
?

Because any attempt to read the file after reaching its end yields an empty string, your program never finds the second word. One way around this is to use file.readlines
and store the lines in a list:
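with open('test.txt') as f:
    wordInput = [input(), input()]  # capture the input (exactly two words)
    lines = f.readlines()  # store every line in a list so it can be scanned repeatedly
    for word in wordInput:
        counter = 0
        for line in lines:
            counter += 1
            # line.split() matches whole words; `word in line` would also match substrings
            if word in line.split():
                print(word, counter)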
However, this is a bit inefficient for large files, since it loads the whole file into memory. As an alternative, you can loop through the lines and then call file.seek(0)
when you're done. That moves the file position back to the beginning of the file, so you can loop over it again. It works like this:
>>> with open('test.txt') as f:
...     for line in f:
...         print(line, end='')
...     f.seek(0)
...     for line in f:
...         print(line, end='')
...
bird
bird
dog
cat
bird
0  # f.seek(0) returns the new file position, which the REPL echoes
bird
bird
dog
cat
bird
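Applied to the code in the question, rewinding means calling seek(0) before scanning the file for each word. This is a sketch rather than the asker's exact code: the `while True` loop (which never terminates once the words run out) is replaced by a `for` loop over the words, `enumerate` replaces the manual line counter, and the function returns the matches as well as printing them. The temporary file is only for the demo.

```python
import os
import tempfile


def index(fileName, wordList):
    matches = []
    with open(fileName) as infile:
        for word in wordList:
            infile.seek(0)  # rewind so every word scans the whole file
            for lineNumber, line in enumerate(infile, 1):
                if word in line.split():
                    print(word, lineNumber)
                    matches.append((word, lineNumber))
    return matches


# Demo with the sample data from the question.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('bird\nbird\ndog\ncat\nbird\n')
    fileName = tmp.name

found = index(fileName, ['bird', 'cat'])  # prints bird 1, bird 2, bird 5, cat 4
os.remove(fileName)
```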
Also, as @falsetru mentioned in his answer, avoid using eval(input())
since it evaluates any expression you type in, which can lead to unexpected (and unsafe) behaviour. Enter the values separated by some delimiter
instead, and then do wordList = input().split(something)
, where something is that delimiter.
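A small sketch of that suggestion. `parse_words` and `parse_list_literal` are hypothetical helpers that take the typed text as a string, so they can be tried without `input()`. If you would rather keep typing a list literal like ['bird','cat'], the standard library's `ast.literal_eval` is a safer swap for `eval`, because it only accepts Python literals, not arbitrary expressions.

```python
import ast


def parse_words(text, sep=None):
    """Split typed text into words; sep=None splits on any whitespace."""
    # Strip spaces around each piece and drop empty pieces.
    return [w.strip() for w in text.split(sep) if w.strip()]


def parse_list_literal(text):
    """Parse text like "['bird', 'cat']" without running arbitrary code."""
    value = ast.literal_eval(text)  # raises ValueError/SyntaxError for non-literals
    if not isinstance(value, list) or not all(isinstance(w, str) for w in value):
        raise ValueError('expected a list of strings')
    return value


print(parse_words('bird cat'))               # ['bird', 'cat']
print(parse_words('bird, cat', sep=','))     # ['bird', 'cat']
print(parse_list_literal("['bird','cat']"))  # ['bird', 'cat']
```

In the question's script this would be used as, for example, `wordList = parse_words(input("Enter words separated by spaces: \n"))`.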
Hope this helps!