
Iteration over lines in a text file, returning line numbers and occurrences?

Tags: python, string

I am trying to write code that acts as a simple index: it scans a text file and reports each occurrence of the given strings along with the line it appears on. I'm getting closer, but I've run into an issue with my iteration and can't figure out how to fix it.

def index(fileName, wordList):

    infile = open(fileName,'r')

    i = 0
    lineNumber = 0
    while True:
        for line in infile:
            lineNumber += 1
            if wordList[i] in line.split():
                print(wordList[i], lineNumber)
        i += 1
        lineNumber = 0

fileName = 'index.txt'
wordList = eval(input("Enter a list of words to search for: \n"))

index(fileName,wordList)

I filled my .txt file with generic terms so it looks like this:

bird 
bird 
dog 
cat 
bird

When I feed a list of strings such as:

['bird','cat']

I get the following output:

Enter a list of words to search for: 
['bird','cat']
bird 1
bird 2
bird 5

So it gives me the term and line number for the first string in the list, but it doesn't continue on to the next string. Any advice? If I could also condense the output so that the line numbers for each word appear in a single print, that would be appreciated.

user2909869 asked Jan 11 '14 08:01


2 Answers

Once a file has been read, its current position has changed. When the position reaches the end of the file, further reads yield an empty string, so the inner for loop finds no lines for the second word.

You need to rewind the file position with file.seek(0) to re-read the file.

Instead of rewinding, though, I would do the following (using a set and the in operator):

def index(filename, words):
    with open(filename) as f:
        for line_number, line in enumerate(f, 1):
            word = line.strip()
            if word in words:
                print(word, line_number)

fileName = 'index.txt'
wordList = ['bird', 'cat'] # input().split()
words = set(wordList)
index(fileName, words)
  • eval executes arbitrary expressions. Instead of eval, consider using input().split().
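
The asker also wanted the line numbers for each word gathered into a single print. A minimal sketch of that, building on the single-pass idea above. The names index_lines and print_index are made up for illustration, and the use of line.split() (so that a line may hold several words, as in the question's code) is an assumption rather than part of this answer:

```python
from collections import defaultdict


def index_lines(filename, words):
    """Map each searched word to the list of line numbers it appears on."""
    wanted = set(words)
    found = defaultdict(list)
    with open(filename) as f:
        for line_number, line in enumerate(f, 1):
            # split() allows several whitespace-separated words per line;
            # the set intersection keeps only the words being searched for.
            for word in wanted.intersection(line.split()):
                found[word].append(line_number)
    return dict(found)


def print_index(filename, words):
    """Print one line per searched word: the word, then its line numbers."""
    found = index_lines(filename, words)
    for word in words:
        print(word, *found.get(word, []))
```

With the sample file from the question, print_index('index.txt', ['bird', 'cat']) should print "bird 1 2 5" followed by "cat 4". The file is still read only once, however many words are searched for.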
falsetru answered Oct 08 '22 23:10


Once you reach the end of the file, any further attempt to read it yields an empty string, so your second pass over the file finds no lines. One way around this is to use file.readlines and store the lines in a list:

with open('test.txt') as f:
    wordInput = [input(), input()] #capture the input
    lines = f.readlines()
    for word in wordInput:
        counter = 0
        for line in lines:
            counter += 1
            if word in line:
                print(word, counter)

However, this is inefficient for large files, since it loads the whole file into memory. As an alternative, you can loop through the lines and then call file.seek(0) when you're done. That moves the file position back to the beginning, so you can loop over the file again. It works like this:

>>> with open('test.txt') as f:
        for line in f:
            print(line)
        f.seek(0)
        for line in f:
            print(line)


bird 

bird 

dog 

cat 

bird
0 #returns the current seek position
bird 

bird 

dog 

cat 

bird
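
Applied to the function from the question, the same rewind could look like the sketch below. The name index_with_seek and the returned list of (word, line number) pairs are assumptions added for illustration; the matching rule (word in line.split()) is taken from the question's code:

```python
def index_with_seek(filename, words):
    """Scan the file once per word, rewinding before each pass."""
    results = []
    with open(filename) as f:
        for word in words:
            f.seek(0)  # rewind so this pass starts at line 1 again
            for line_number, line in enumerate(f, 1):
                if word in line.split():
                    results.append((word, line_number))
                    print(word, line_number)
    return results
```

This keeps memory use low, at the cost of reading the file once for every word in the list.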

Also, as @falsetru mentioned in his answer, avoid eval(input()), since it evaluates any expression typed in, which can lead to unexpected and unsafe behavior. Instead, have the user enter values separated by some delimiter, and then do wordList = input().split(delimiter).
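
A small sketch of both ways to read the word list without eval. The helper names parse_words and parse_word_list are hypothetical. The second one uses ast.literal_eval from the standard library, which neither answer mentions; it is offered here only as one possible way to keep accepting input typed as ['bird','cat']:

```python
import ast


def parse_words(raw, sep=','):
    """Split delimiter-separated input such as 'bird, cat' into words."""
    return [w.strip() for w in raw.split(sep) if w.strip()]


def parse_word_list(raw):
    """Parse a Python list literal such as "['bird','cat']" without running code."""
    value = ast.literal_eval(raw)  # accepts literals only, not arbitrary expressions
    if not isinstance(value, list) or not all(isinstance(w, str) for w in value):
        raise ValueError("expected a list of strings")
    return value
```

Either helper could then be used as wordList = parse_words(input("Enter words separated by commas: \n")).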

Hope this helps!

aIKid answered Oct 08 '22 22:10