I have 2 files:
hyp.txt
It is a guide to action which ensures that the military always obeys the commands of the party
he read the book because he was interested in world history
ref.txt
It is a guide to action that ensures that the military will forever heed Party commands
he was interested in world history because he read the book
And I have a function that does some calculations to compare the lines of the text, e.g. line 1 of hyp.txt with line 1 of ref.txt.
def scorer(list_of_tokenized_hyp, list_of_tokenized_ref):
    """
    :type list_of_tokenized_hyp: iter(iter(str))
    :type list_of_tokenized_ref: iter(iter(str))
    """
    for hypline, refline in zip(list_of_tokenized_hyp, list_of_tokenized_ref):
        # do something with the iter(str)
    return score
This function cannot be changed. I can, however, manipulate what I feed to it. Currently I'm feeding the files into the function like this:
with open('hyp.txt', 'r') as hypfin, open('ref.txt', 'r') as reffin:
    hyp = [line.split() for line in hypfin]
    ref = [line.split() for line in reffin]
    scorer(hyp, ref)
But by doing so I have loaded the whole of both files, and all the split strings, into memory before feeding them into scorer().
Knowing that scorer() processes the files line by line, is there a way to avoid materializing the split strings before feeding them into the function, without changing scorer()?
Is there a way to feed in some sort of generator instead?
I've tried this:
with open('hyp.txt', 'r') as hypfin, open('ref.txt', 'r') as reffin:
    hyp = (h.split() for h in hypfin)
    ref = (r.split() for r in reffin)
    scorer(hyp, ref)
but I'm not sure whether the h.split() has been materialized. If it has been materialized, why? If not, why?
If I could change the scorer() function, then I could easily add these two lines after the for:
def scorer(list_of_tokenized_hyp, list_of_tokenized_ref):
    for hypline, refline in zip(list_of_tokenized_hyp, list_of_tokenized_ref):
        hypline = hypline.split()
        refline = refline.split()
        # do something with the iter(str)
    return score
But this is not possible in my case, since I can't change that function.
Yes, your example defines two generators:
with open('hyp.txt', 'r') as hypfin, open('ref.txt', 'r') as reffin:
    hyp = (h.split() for h in hypfin)
    ref = (r.split() for r in reffin)
    scorer(hyp, ref)
The split, and the corresponding read of the next line, happens once per iteration of the for loop inside scorer(). Note that scorer() has to be called inside the with block, while both files are still open.
Your generator expressions combined with Python 3's zip() (replace it with itertools.izip() in Python 2) behave as you require, i.e. they do not read the entire file to create all the split lists in one go.
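As a rough end-to-end sketch of the same pattern, the snippet below feeds two generator expressions to a scorer-shaped function. Everything in it is an assumption made for illustration: toy_scorer is a hypothetical stand-in for the real scorer() (it just counts position-wise token matches), and io.StringIO objects stand in for the two open files so the example runs without anything on disk.

```python
import io

def toy_scorer(list_of_tokenized_hyp, list_of_tokenized_ref):
    """Hypothetical stand-in for scorer(): counts position-wise token matches."""
    score = 0
    for hypline, refline in zip(list_of_tokenized_hyp, list_of_tokenized_ref):
        # hypline and refline arrive already split, one pair per iteration.
        score += sum(h == r for h, r in zip(hypline, refline))
    return score

# In-memory stand-ins for the open file objects (assumption for this sketch).
hypfin = io.StringIO("he read the book\nit is a guide\n")
reffin = io.StringIO("he was interested\nit is a guide\n")

# Generator expressions: nothing is read or split at this point.
hyp = (h.split() for h in hypfin)
ref = (r.split() for r in reffin)

# Lines are read and split only as the loop inside toy_scorer() asks for them.
score = toy_scorer(hyp, ref)
print(score)
```

Because the function only ever iterates over its arguments, it cannot tell a generator from a list of lists, which is why no change to the function itself is needed.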
You can get some idea of what is going on by substituting a logging version of str.split():
def my_split(s):
    print('my_split(): {!r}'.format(s))
    return s.split()
>>> hypfin = open('hyp.txt', 'r')
>>> reffin = open('ref.txt', 'r')
>>> hyp = (my_split(h) for h in hypfin) # N.B. my_split() not called here
>>> hyp
<generator object <genexpr> at 0x7fa89ad16b40>
>>> ref = (my_split(r) for r in reffin) # N.B. my_split() not called here
>>> ref
<generator object <genexpr> at 0x7fa89ad16bd0>
>>> z = zip(hyp, ref) # N.B. my_split() not called here
>>> z
<zip object at 0x7fa89ad15cc8>
>>> hypline, refline = next(z)
my_split(): 'It is a guide to action which ensures that the military always obeys the commands of the party\n'
my_split(): 'It is a guide to action that ensures that the military will forever heed Party commands\n'
>>> hypline, refline = next(z)
my_split(): 'he read the book because he was interested in world history\n'
my_split(): 'he was interested in world history because he read the book\n'
>>> hypline, refline = next(z)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
From the output of my_split() you can see that hyp and ref are indeed generators that do not consume input until required. z is a zip object that also does not consume any input until accessed. The for loop is simulated with next() to demonstrate that only one line of input from each file is consumed at each iteration.
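If the generator expression needs to be reused, one possible variation is to wrap it in a generator function that opens the file itself. This is only a sketch under assumptions: tokenized_lines is a hypothetical helper name, not something from the question, and the temporary file exists purely so the snippet runs on its own.

```python
import os
import tempfile

def tokenized_lines(path):
    """Hypothetical helper: lazily yield each line of the file as a list of tokens."""
    with open(path, 'r') as fin:
        for line in fin:
            # Only the current line is read and split at each step.
            yield line.split()

# Write a small temporary file so the sketch is self-contained.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('he read the book\nhe was interested in world history\n')
    path = tmp.name

lines = tokenized_lines(path)   # the file is not opened yet
first = next(lines)             # opens the file, reads and splits one line
rest = list(lines)              # consumes the remainder and closes the file
os.remove(path)
```

With a helper like this, the call would look like scorer(tokenized_lines('hyp.txt'), tokenized_lines('ref.txt')), and each file stays open only while scorer() is iterating over it.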