
How to cleanly loop over two files in parallel in Python

Tags:

python

I frequently write code like:

lines = open('wordprob.txt','r').readlines()
words = open('StdWord.txt','r').readlines()
i = 0
for line in lines:
    v = [eval(s) for s in line.split()]
    if v[0] > v[1]:
        print words[i].strip(),
    i += 1

Is it possible to avoid using the variable i and make the program shorter?

Thanks.

Yin Zhu asked Dec 02 '09 03:12



2 Answers

You can use enumerate(), which pairs each line with its index so you don't need to maintain i by hand:

http://docs.python.org/tutorial/datastructures.html#looping-techniques

lines = open('wordprob.txt','r').readlines()
words = open('StdWord.txt','r').readlines()
for i,line in enumerate(lines):
    v = [eval(s) for s in line.split()]
    if v[0] > v[1]:
        print words[i].strip()
YOU answered Oct 15 '22 15:10



It looks like you don't care what the value of i is; you are just using it to pair up the lines and the words. Therefore, I recommend reading one line and one word at a time, in step, so that they always match.

Also, when you use .readlines() you read the entire input into memory at once. For large inputs, this wastes memory and can be slow. For this simple code, one line at a time is all you need. The file object returned by open() is an iterator that yields one line at a time.
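
A minimal Python 3 sketch of that lazy iteration. Here io.StringIO stands in for a file returned by open(), and the sample data and the count_lines helper are made up for illustration:

```python
import io

# io.StringIO stands in for a real file object; both yield one line per iteration.
fake_file = io.StringIO("0.9 0.1\n0.2 0.8\n0.7 0.3\n")

def count_lines(handle):
    """Count lines by iterating the handle, so only one line is held at a time."""
    count = 0
    for _line in handle:
        count += 1
    return count

line_count = count_lines(fake_file)
print(line_count)
```

Nothing here calls .readlines(), so no list of all lines is ever built.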

If you can, you should avoid eval(). In a simple exercise where you know what the input data will be, it is fairly safe, but eval() executes whatever code the text contains, so with data from outside sources it could allow your computer to be attacked. See this page for more info. My example code assumes that you are using eval() to turn text into a float value. float() also works on an integer string: float('3') returns 3.0.
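
To illustrate the difference, here is a small Python 3 sketch. The parse_number helper and the hostile-looking input string are hypothetical, chosen only to show that float() refuses text that eval() would have executed:

```python
def parse_number(text):
    """Convert a numeric string to float; raises ValueError for anything else."""
    return float(text)

print(parse_number("3"))     # integer-looking text is accepted
print(parse_number("0.25"))

# float() never executes its argument, so code-like input is simply rejected.
try:
    parse_number("__import__('os').getcwd()")
    rejected = False
except ValueError:
    rejected = True
print(rejected)
```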

Also, it appears that each input line should contain exactly two values. If a line ever has extra values, your code will not detect it. We can change the code to explicitly unpack two values from the split line; then if a line has more (or fewer) than two values, Python will raise a ValueError. Plus, the code will be slightly nicer to read.
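
A quick Python 3 sketch of how the unpacking catches malformed lines. The parse_pair helper and the sample lines are assumptions for illustration, not part of the original program:

```python
def parse_pair(line):
    """Split a line into exactly two floats; any other count raises ValueError."""
    a, b = [float(s) for s in line.split()]
    return a, b

print(parse_pair("0.9 0.1"))

# Lines with too many or too few values fail at the unpacking step.
errors = []
for bad_line in ("0.9 0.1 0.5", "0.9"):
    try:
        parse_pair(bad_line)
    except ValueError as err:
        errors.append(bad_line)
        print("rejected %r: %s" % (bad_line, err))
```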

So here is my suggested rewrite of this example:

lines = open('wordprob.txt','rt')
words = open('StdWord.txt','rt')

for line in lines:
    word = words.next().strip()  # in Python 3: word = next(words).strip()
    a, b = [float(s) for s in line.split()]
    if a > b:
        print word,  # in Python 3: print(word + ' ', end='')

EDIT: And here is the same solution, but using izip().

import itertools
lines = open('wordprob.txt','rt')
words = open('StdWord.txt','rt')

# in Python 3, just use zip() instead of izip()
for line, word in itertools.izip(lines, words):
    word = word.strip()
    a, b = [float(s) for s in line.split()]
    if a > b:
        print word,  # in Python 3: print(word + ' ', end='')

In Python 3, the built-in zip() returns an iterator, so you can use it directly without importing itertools.
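
One thing worth knowing about zip() is that it stops as soon as the shorter input runs out. A minimal Python 3 sketch, with io.StringIO standing in for the two files and made-up data in which the word list is deliberately one line short:

```python
import io

lines = io.StringIO("0.9 0.1\n0.2 0.8\n0.7 0.3\n")
words = io.StringIO("apple\nbanana\n")  # deliberately one line short

pairs = zip(lines, words)  # an iterator; no lines have been read yet
result = [(line.strip(), word.strip()) for line, word in pairs]
print(result)
```

The third line of numbers is silently dropped. If a length mismatch should count as an error, itertools.zip_longest() is one way to detect it.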

EDIT: It is best practice to use a with statement to make sure the files are properly closed, no matter what. Since Python 2.7 and 3.1, a single with statement can open several files at once, and I'll do that in my solution. Also, we can unpack a generator expression just as easily as a list, so I've changed the line that sets a, b to use a generator expression; that avoids building a temporary list, although for two values the speed difference is negligible. And we don't need to strip word unless we are going to print it. Putting the changes together:

from itertools import izip

with open('wordprob.txt','rt') as lines, open('StdWord.txt','rt') as words:
    # in Python 3, just use zip() instead of izip()
    for line, word in izip(lines, words):
        a, b = (float(s) for s in line.split())
        if a > b:
            print word.strip(),  # in Python 3: print(word.strip() + ' ', end='')
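
For reference, a Python 3 sketch of the same approach wrapped in a function. The function name, the temporary files, and the sample data are all assumptions made so the example can run on its own:

```python
import os
import tempfile

def words_where_first_is_larger(prob_path, word_path):
    """Return the words whose matching line in prob_path has first value > second."""
    selected = []
    with open(prob_path, "rt") as lines, open(word_path, "rt") as words:
        for line, word in zip(lines, words):
            a, b = (float(s) for s in line.split())
            if a > b:
                selected.append(word.strip())
    return selected

# Demo with throwaway files and made-up data.
with tempfile.TemporaryDirectory() as tmp:
    prob_path = os.path.join(tmp, "wordprob.txt")
    word_path = os.path.join(tmp, "StdWord.txt")
    with open(prob_path, "w") as f:
        f.write("0.9 0.1\n0.2 0.8\n0.7 0.3\n")
    with open(word_path, "w") as f:
        f.write("apple\nbanana\ncherry\n")
    selected = words_where_first_is_larger(prob_path, word_path)

print(" ".join(selected))
```

Collecting the words in a list and joining them once avoids the trailing space that printing each word with end='' would leave.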
steveha answered Oct 15 '22 17:10
