I have a generator that creates the Cartesian product of lists. The real application uses more complex objects, but they can be represented by strings:
import itertools
s1 = ['a', 'b']
s2 = ['c', 'd', 'e', 'f']
s3 = ['c', 'd', 'e', 'f']
s4 = ['g']
p = itertools.product(*[s1,s2,s3,s4])
names = [''.join(s) for s in p]
In this example, the result is 32 combinations of characters:
names
['accg', 'acdg', 'aceg', 'acfg', 'adcg', 'addg', 'adeg', 'adfg', 'aecg',
'aedg', 'aeeg', 'aefg', 'afcg', 'afdg', 'afeg', 'affg', 'bccg', 'bcdg',
'bceg', 'bcfg', 'bdcg', 'bddg', 'bdeg', 'bdfg', 'becg', 'bedg', 'beeg',
'befg', 'bfcg', 'bfdg', 'bfeg', 'bffg']
Now, let's say I have some constraints such that certain character combinations are illegal. For example, let's say that only strings that match the regex '[ab].c' are allowed. ('a' or 'b', followed by any character, followed by 'c')
After applying these constraints, we are left with a reduced set of just 8 strings:
import re
r = re.compile('[ab].c')
list(filter(r.match, names))
['accg', 'adcg', 'aecg', 'afcg', 'bccg', 'bdcg', 'becg', 'bfcg']
In the real application the chains are longer; there could be thousands of combinations, and applying the hundreds of constraints is fairly computationally intensive, so I'm concerned about scalability.
Right now I'm going through every single combination and checking its validity. Does an algorithm/data structure exist that could speed up this process?
EDIT: Maybe this will clarify a little: In the real application I am assembling random 2D pictures of buildings from simple basic blocks (like pillars, roof segments, windows, etc.). The constraints limit what kind of blocks (and their rotations) can be grouped together, so the resulting random image looks realistic, and not a random jumble.
A given constraint can contain many combinations of patterns. But of all those combinations, many are not valid because a different constraint would prohibit some portion of it. So in my example, one constraint would contain the full Cartesian product of characters above. And a second constraint is the '[ab].c'; this second constraint reduces the number of valid combinations of the first constraint that I need to consider.
Because these constraints are difficult to create, I'm looking to visualize what all the combinations of blocks in each constraint look like, but only the valid combinations that pass all constraints. Hence my question. Thanks!
Try just providing the iterator that generates the names directly to filter, like so:
import itertools
import re
s1 = ['a', 'b']
s2 = ['c', 'd', 'e', 'f']
s3 = ['c', 'd', 'e', 'f']
s4 = ['g']
p = itertools.product(*[s1,s2,s3,s4])
r = re.compile('[ab].c')
l = filter(r.search, (''.join(s) for s in p))
print(list(l))
That way, it shouldn't assemble the full set of combinations in memory; it will only keep the ones that fit the criteria. There is probably another way as well.
EDIT:
One of the primary differences from the original, is that instead of:
[''.join(s) for s in p]
Which is a list comprehension, we use:
(''.join(s) for s in p)
Which is a generator.
The difference is that a list comprehension builds the entire list up front, while passing a generator expression lets filter pull values as needed. The mechanism at work is lazy evaluation, which really just boils down to only evaluating expressions as their values become necessary.
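To see the laziness in action, here is a small sketch using the same data: calling next() on the filtered generator produces the first valid name without ever materializing the other 31 products.

```python
import itertools
import re

s1 = ['a', 'b']
s2 = ['c', 'd', 'e', 'f']
s3 = ['c', 'd', 'e', 'f']
s4 = ['g']

p = itertools.product(s1, s2, s3, s4)
r = re.compile('[ab].c')

# filter() pulls items from the generator expression one at a time;
# no intermediate list of all 32 names is ever built in memory.
lazy = filter(r.search, (''.join(s) for s in p))
print(next(lazy))  # 'accg' — only as many products as needed were generated
```

Iterating the rest of `lazy` (e.g. with list()) yields the remaining 7 valid names in the same order as the eager version.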
By switching from a list to a generator, Rob's answer saves space but not time (at least, not asymptotically). You've asked a very broad question about how to enumerate all solutions to what is essentially a constraint satisfaction problem. The big wins are going to come from enforcing local consistency, but it's difficult to give you advice without specific knowledge of your constraints.
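To make the "local consistency" idea concrete, here is a minimal prefix-pruning sketch (the helper name `prune_product` and the `(length, predicate)` constraint encoding are my own invention, not from the question). The product is built one pool at a time, and any constraint that only depends on a prefix is checked as soon as that prefix is complete, so entire subtrees of doomed completions are skipped instead of being generated and filtered afterwards.

```python
import re

def prune_product(pools, constraints):
    # constraints: list of (length, predicate) pairs. Once a partial
    # string reaches `length` characters, `predicate` must accept it,
    # or every completion of that prefix is pruned without expansion.
    def extend(prefix, remaining):
        if not remaining:
            yield prefix
            return
        for item in remaining[0]:
            candidate = prefix + item
            if all(pred(candidate)
                   for length, pred in constraints
                   if len(candidate) == length):
                yield from extend(candidate, remaining[1:])
    yield from extend('', pools)

pools = [['a', 'b'], ['c', 'd', 'e', 'f'], ['c', 'd', 'e', 'f'], ['g']]
# '[ab].c' only constrains the first three characters, so it can be
# checked once three characters are fixed; the 24 failing prefixes
# are never extended with the fourth pool at all.
constraints = [(3, re.compile('[ab].c').match)]
valid = list(prune_product(pools, constraints))
print(valid)
```

With longer chains and many constraints, the savings compound: each early rejection prunes the full product of all remaining pools.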