I have some very inefficient code I'd like to make more general/efficient. I am trying to create strings from a set of lists.
Here is what I currently have:
#contains categories
numind = [('Length',), ('Fungus',)]
#contains values that pertain to the categories
records = [('Length', 'Long'), ('Length', 'Med'), ('Fungus', 'Yes'), ('Fungus', 'No')]
#contains every combination of values between the 2 categories.
#for example, (Long, Yes) = Length=Long & Fungus = Yes.
combinations = [('Long', 'Yes'), ('Long', 'No'), ('Med', 'Yes'), ('Med', 'No')]
Now I want to create strings that have every combination in my combination list. This is the inefficient part. I'd like it so that I don't have to hardwire the length of the "numind" list. Any ideas?
values = combinations
valuestring = []
if len(numind) == 0:
    pass
elif len(numind) == 1:
    for a in xrange(len(values)):
        valuestring.append(numind[0][0]+values[a][0])
elif len(numind) == 2:
    for a in xrange(len(values)):
        valuestring.append(numind[0][0]+values[a][0]+'_'+numind[1][0]+values[a][1])
#and so forth until numind is 10+
Output:
['LengthLong_FungusYes', 'LengthLong_FungusNo', 'LengthMed_FungusYes', 'LengthMed_FungusNo']
I'd use itertools.product with collections.OrderedDict (the latter isn't strictly necessary, but means that you get the order right without having to think about it):
>>> from collections import OrderedDict
>>> from itertools import product
>>>
>>> d = OrderedDict()
>>> for k, v in records:
...     d.setdefault(k, []).append(v)
...
>>> d
OrderedDict([('Length', ['Long', 'Med']), ('Fungus', ['Yes', 'No'])])
>>> ['_'.join(k + v for k, v in zip(d, combo)) for combo in product(*d.values())]
['LengthLong_FungusYes', 'LengthLong_FungusNo', 'LengthMed_FungusYes', 'LengthMed_FungusNo']
itertools.product naturally produces the "every combination" part (which is really called the Cartesian product, not a combination):
>>> product(["Long", "Med"], ["Yes", "No"])
<itertools.product object at 0x96b0dec>
>>> list(product(["Long", "Med"], ["Yes", "No"]))
[('Long', 'Yes'), ('Long', 'No'), ('Med', 'Yes'), ('Med', 'No')]
The advantage here is that it doesn't matter how many categories there are or how many values there are associated with any category: as long as they're specified in records, it should work.
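For reuse, the same idea can be wrapped in a function. This is only a sketch: the names build_labels, labels_from_combinations and the sep parameter are made up for illustration and are not part of the original code. The second helper assumes you already have numind and combinations as in the question and just want to pair them up without hardwiring the number of categories.

```python
from collections import OrderedDict
from itertools import product


def build_labels(records, sep="_"):
    """Return one label per combination of category values in `records`.

    `records` is a sequence of (category, value) pairs. Categories keep
    the order in which they first appear.
    """
    grouped = OrderedDict()
    for category, value in records:
        grouped.setdefault(category, []).append(value)
    if not grouped:
        # product() with no arguments yields a single empty tuple, which
        # would give [''] here; return [] to mirror the original code.
        return []
    return [
        sep.join(category + value for category, value in zip(grouped, combo))
        for combo in product(*grouped.values())
    ]


def labels_from_combinations(numind, combinations, sep="_"):
    """Hypothetical helper: label precomputed combinations.

    Assumes `numind` holds 1-tuples of category names, in the same order
    as the values inside each tuple of `combinations`.
    """
    names = [row[0] for row in numind]
    return [
        sep.join(name + value for name, value in zip(names, combo))
        for combo in combinations
    ]


numind = [('Length',), ('Fungus',)]
records = [('Length', 'Long'), ('Length', 'Med'), ('Fungus', 'Yes'), ('Fungus', 'No')]
combinations = [('Long', 'Yes'), ('Long', 'No'), ('Med', 'Yes'), ('Med', 'No')]

print(build_labels(records))
print(labels_from_combinations(numind, combinations))
```

Both calls print the same list as the output shown in the question. Adding a third category to records needs no code change, since zip and product simply handle one more sequence.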