I only have two sentences for which I want to produce variations and compute the Levenshtein distance, but when I try to build this list with itertools, even my machine with 64 GB of RAM gets overloaded.
Is there a way to limit this, even if I have to cap it at a certain number of combinations?
Here is my code so far:
from __future__ import print_function
import itertools
import sys
in_file = sys.argv[1]
X = []
with open(in_file) as f:
    lis = list(f)
X.append([' '.join(x) for x in itertools.product(*map(set, zip(*map(str.split, lis))))])
for x in X:
    print(x)
The problem is not with itertools: itertools works lazily, returning an iterator that produces one combination at a time. The problem is that you first put all these elements in a list. As a result, all the combinations have to exist in memory at the same time. This requires far more memory than processing them iteratively, since in the latter case the memory of a previous combination can be reused.
If you thus want to print all combinations, without storing them, you can use:
with open(in_file) as f:
    lis = list(f)
    for x in itertools.product(*map(set, zip(*map(str.split, lis)))):
        print(' '.join(x))
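Before deciding whether to stream or store the results, it can help to know how many combinations there are. The following is a minimal sketch, assuming Python 3.8+ (for `math.prod`) and two made-up in-memory sentences in place of the input file; the helper names `word_choices` and `count_combinations` are hypothetical, not part of itertools. The count is simply the product of the number of distinct words at each position, so it can be computed without generating a single combination.

```python
from itertools import product
from math import prod  # available from Python 3.8

# Hypothetical stand-in for the two lines read from the input file.
lines = [
    "the quick brown fox jumps",
    "a quick red fox leaps",
]

def word_choices(lines):
    """Return, for each word position, the set of distinct words seen there."""
    return [set(column) for column in zip(*(line.split() for line in lines))]

def count_combinations(lines):
    """Number of tuples product() would yield, computed without iterating."""
    return prod(len(choices) for choices in word_choices(lines))

total = count_combinations(lines)
print(total)

# Sanity check on this tiny input: iterate lazily and count, storing nothing.
lazy_count = sum(1 for _ in product(*word_choices(lines)))
print(lazy_count == total)
```

With two sentences, each position offers at most two choices, so a pair of n-word sentences that differ at every position gives 2^n combinations. That is why the list grows so quickly for longer sentences.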
In case you want to store them, you can limit the number by using itertools.islice:
from itertools import islice, product
X = []
with open(in_file) as f:
    lis = list(f)
X += [' '.join(x) for x in islice(product(*map(set, zip(*map(str.split, lis)))), 1000000)]
Here we limit the number of stored products to 1,000,000.
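The same idea can be wrapped in a generator so that the caller chooses between streaming and storing. This is only a sketch under stated assumptions: the function name `variations` and the sample sentences are made up for illustration, and it uses `sorted(set(...))` instead of a bare `set` purely so that the output order is reproducible.

```python
from itertools import islice, product

def variations(lines, limit=None):
    """Lazily yield sentence variations, at most `limit` of them if given."""
    columns = zip(*(line.split() for line in lines))
    # Assumption: sorted() replaces the bare set to get a deterministic order.
    choices = [sorted(set(column)) for column in columns]
    # islice(iterable, None) imposes no cap, so limit=None yields everything.
    for words in islice(product(*choices), limit):
        yield " ".join(words)

# Hypothetical stand-in for the two lines read from the input file.
lines = ["the quick brown fox", "a quick red fox"]

# Store only a bounded number of variations...
first_three = list(variations(lines, limit=3))
print(first_three)

# ...or stream all of them, one at a time, without building a list.
for sentence in variations(lines):
    print(sentence)
```

Because `variations` is a generator, only one sentence exists in memory at a time unless the caller explicitly collects them, for example with `list(...)`.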