I am trying to write a Python function that performs a function similar to itertools.permutation
.
import itertools
for s in itertools.permutations("TCGA****")
print s
The ideal output from such a function would be
('*','*','*','*','T', 'C','G','A')
('*','*','*','T','*', 'C','G','A')
('*','*','*','T','C', '*','G','A')
('*','*','*','T','C', 'G','*','A')
('*','*','*','T','C', 'G','A','*')
('*','*','T','C','G', 'A','*','*')
('*','*','T','C','G', '*','*','A')
('*','*','T','C','*', '*','G','A')
...
('T', 'C','G','A','*','*','*','*')
The only difference between itertools.permutation
and this function is that the order is maintained i.e. 'T' always precedes 'C' which precedes 'G' which precedes 'A'.
The following is an example that violates this rule
('*','*','T','*','G','C','A','*','*')
The order of 'C' and 'G' has changed.
How can I produce the permutations for which the order 'TCGA'
is maintained among the asterisks?
One idea would be to produce all the possible indices for your '*'
values with itertools.combinations
on your list index range, and then construct each possible permutation from those indices, filling with your 'TCGA'
values accordingly for the indices not found in each combination.
Since you are assured to use all of TCGA
in each iteration, itertools.cycle
is one way to continually get the appropriate value for the next position. Here perms
is implemented as a generator to allow for lazy evaluation.
from itertools import combinations, cycle
char_cyc = cycle('TCGA')
combos = combinations(range(8), 4)
perms = (['*' if i in combo else next(char_cyc) for i in range(8)]
for combo in combos)
print(list(perms))
Outputs:
[['*', '*', '*', '*', 'T', 'C', 'G', 'A'], ['*', '*', '*', 'T', '*', 'C', 'G', 'A'], ['*', '*', '*', 'T', 'C', '*', 'G', 'A'], ['*', '*', '*', 'T', 'C', 'G', '*', 'A'], ['*', '*', '*', 'T', 'C', 'G', 'A', '*'], ['*', '*', 'T', '*', '*', 'C', 'G', 'A'], ['*', '*', 'T', '*', 'C', '*', 'G', 'A'], ['*', '*', 'T', '*', 'C', 'G', '*', 'A'], ['*', '*', 'T', '*', 'C', 'G', 'A', '*'], ['*', '*', 'T', 'C', '*', '*', 'G', 'A'], ['*', '*', 'T', 'C', '*', 'G', '*', 'A'], ['*', '*', 'T', 'C', '*', 'G', 'A', '*'], ['*', '*', 'T', 'C', 'G', '*', '*', 'A'], ['*', '*', 'T', 'C', 'G', '*', 'A', '*'], ['*', '*', 'T', 'C', 'G', 'A', '*', '*'], ['*', 'T', '*', '*', '*', 'C', 'G', 'A'], ['*', 'T', '*', '*', 'C', '*', 'G', 'A'], ['*', 'T', '*', '*', 'C', 'G', '*', 'A'], ['*', 'T', '*', '*', 'C', 'G', 'A', '*'], ['*', 'T', '*', 'C', '*', '*', 'G', 'A'], ['*', 'T', '*', 'C', '*', 'G', '*', 'A'], ['*', 'T', '*', 'C', '*', 'G', 'A', '*'], ['*', 'T', '*', 'C', 'G', '*', '*', 'A'], ['*', 'T', '*', 'C', 'G', '*', 'A', '*'], ['*', 'T', '*', 'C', 'G', 'A', '*', '*'], ['*', 'T', 'C', '*', '*', '*', 'G', 'A'], ['*', 'T', 'C', '*', '*', 'G', '*', 'A'], ['*', 'T', 'C', '*', '*', 'G', 'A', '*'], ['*', 'T', 'C', '*', 'G', '*', '*', 'A'], ['*', 'T', 'C', '*', 'G', '*', 'A', '*'], ['*', 'T', 'C', '*', 'G', 'A', '*', '*'], ['*', 'T', 'C', 'G', '*', '*', '*', 'A'], ['*', 'T', 'C', 'G', '*', '*', 'A', '*'], ['*', 'T', 'C', 'G', '*', 'A', '*', '*'], ['*', 'T', 'C', 'G', 'A', '*', '*', '*'], ['T', '*', '*', '*', '*', 'C', 'G', 'A'], ['T', '*', '*', '*', 'C', '*', 'G', 'A'], ['T', '*', '*', '*', 'C', 'G', '*', 'A'], ['T', '*', '*', '*', 'C', 'G', 'A', '*'], ['T', '*', '*', 'C', '*', '*', 'G', 'A'], ['T', '*', '*', 'C', '*', 'G', '*', 'A'], ['T', '*', '*', 'C', '*', 'G', 'A', '*'], ['T', '*', '*', 'C', 'G', '*', '*', 'A'], ['T', '*', '*', 'C', 'G', '*', 'A', '*'], ['T', '*', '*', 'C', 'G', 'A', '*', '*'], ['T', '*', 'C', '*', '*', '*', 'G', 'A'], ['T', '*', 'C', '*', '*', 'G', '*', 'A'], ['T', '*', 'C', '*', '*', 'G', 'A', '*'], ['T', '*', 'C', '*', 'G', '*', '*', 'A'], ['T', '*', 'C', '*', 'G', '*', 'A', '*'], ['T', '*', 'C', '*', 'G', 'A', '*', '*'], ['T', '*', 'C', 'G', '*', '*', '*', 'A'], ['T', '*', 'C', 'G', '*', '*', 'A', '*'], ['T', '*', 'C', 'G', '*', 'A', '*', '*'], ['T', '*', 'C', 'G', 'A', '*', '*', '*'], ['T', 'C', '*', '*', '*', '*', 'G', 'A'], ['T', 'C', '*', '*', '*', 'G', '*', 'A'], ['T', 'C', '*', '*', '*', 'G', 'A', '*'], ['T', 'C', '*', '*', 'G', '*', '*', 'A'], ['T', 'C', '*', '*', 'G', '*', 'A', '*'], ['T', 'C', '*', '*', 'G', 'A', '*', '*'], ['T', 'C', '*', 'G', '*', '*', '*', 'A'], ['T', 'C', '*', 'G', '*', '*', 'A', '*'], ['T', 'C', '*', 'G', '*', 'A', '*', '*'], ['T', 'C', '*', 'G', 'A', '*', '*', '*'], ['T', 'C', 'G', '*', '*', '*', '*', 'A'], ['T', 'C', 'G', '*', '*', '*', 'A', '*'], ['T', 'C', 'G', '*', '*', 'A', '*', '*'], ['T', 'C', 'G', '*', 'A', '*', '*', '*'], ['T', 'C', 'G', 'A', '*', '*', '*', '*']]
A good indication that is output is correct is the fact that the length of perms
is 70, which is equal to 8C4 (or "8 choose 4"), which is effectively what your problem concerns.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With