Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Permutations with Order

I am trying to write a Python function that performs a function similar to itertools.permutation.

import itertools
for s in itertools.permutations("TCGA****")
    print s

The ideal output from such a function would be

('*','*','*','*','T', 'C','G','A')
('*','*','*','T','*', 'C','G','A')
('*','*','*','T','C', '*','G','A')
('*','*','*','T','C', 'G','*','A')
('*','*','*','T','C', 'G','A','*')
('*','*','T','C','G', 'A','*','*')
('*','*','T','C','G', '*','*','A')
('*','*','T','C','*', '*','G','A')
...
('T', 'C','G','A','*','*','*','*')

The only difference between itertools.permutation and this function is that the order is maintained i.e. 'T' always precedes 'C' which precedes 'G' which precedes 'A'.

The following is an example that violates this rule

('*','*','T','*','G','C','A','*','*')

The order of 'C' and 'G' has changed.

How can I produce the permutations for which the order 'TCGA' is maintained among the asterisks?

like image 798
Rajan Avatar asked Feb 06 '23 16:02

Rajan


1 Answers

One idea would be to produce all the possible indices for your '*' values with itertools.combinations on your list index range, and then construct each possible permutation from those indices, filling with your 'TCGA' values accordingly for the indices not found in each combination.

Since you are assured to use all of TCGA in each iteration, itertools.cycle is one way to continually get the appropriate value for the next position. Here perms is implemented as a generator to allow for lazy evaluation.

from itertools import combinations, cycle

char_cyc = cycle('TCGA')
combos = combinations(range(8), 4)

perms = (['*' if i in combo else next(char_cyc) for i in range(8)]
         for combo in combos)

print(list(perms))

Outputs:

[['*', '*', '*', '*', 'T', 'C', 'G', 'A'], ['*', '*', '*', 'T', '*', 'C', 'G', 'A'], ['*', '*', '*', 'T', 'C', '*', 'G', 'A'], ['*', '*', '*', 'T', 'C', 'G', '*', 'A'], ['*', '*', '*', 'T', 'C', 'G', 'A', '*'], ['*', '*', 'T', '*', '*', 'C', 'G', 'A'], ['*', '*', 'T', '*', 'C', '*', 'G', 'A'], ['*', '*', 'T', '*', 'C', 'G', '*', 'A'], ['*', '*', 'T', '*', 'C', 'G', 'A', '*'], ['*', '*', 'T', 'C', '*', '*', 'G', 'A'], ['*', '*', 'T', 'C', '*', 'G', '*', 'A'], ['*', '*', 'T', 'C', '*', 'G', 'A', '*'], ['*', '*', 'T', 'C', 'G', '*', '*', 'A'], ['*', '*', 'T', 'C', 'G', '*', 'A', '*'], ['*', '*', 'T', 'C', 'G', 'A', '*', '*'], ['*', 'T', '*', '*', '*', 'C', 'G', 'A'], ['*', 'T', '*', '*', 'C', '*', 'G', 'A'], ['*', 'T', '*', '*', 'C', 'G', '*', 'A'], ['*', 'T', '*', '*', 'C', 'G', 'A', '*'], ['*', 'T', '*', 'C', '*', '*', 'G', 'A'], ['*', 'T', '*', 'C', '*', 'G', '*', 'A'], ['*', 'T', '*', 'C', '*', 'G', 'A', '*'], ['*', 'T', '*', 'C', 'G', '*', '*', 'A'], ['*', 'T', '*', 'C', 'G', '*', 'A', '*'], ['*', 'T', '*', 'C', 'G', 'A', '*', '*'], ['*', 'T', 'C', '*', '*', '*', 'G', 'A'], ['*', 'T', 'C', '*', '*', 'G', '*', 'A'], ['*', 'T', 'C', '*', '*', 'G', 'A', '*'], ['*', 'T', 'C', '*', 'G', '*', '*', 'A'], ['*', 'T', 'C', '*', 'G', '*', 'A', '*'], ['*', 'T', 'C', '*', 'G', 'A', '*', '*'], ['*', 'T', 'C', 'G', '*', '*', '*', 'A'], ['*', 'T', 'C', 'G', '*', '*', 'A', '*'], ['*', 'T', 'C', 'G', '*', 'A', '*', '*'], ['*', 'T', 'C', 'G', 'A', '*', '*', '*'], ['T', '*', '*', '*', '*', 'C', 'G', 'A'], ['T', '*', '*', '*', 'C', '*', 'G', 'A'], ['T', '*', '*', '*', 'C', 'G', '*', 'A'], ['T', '*', '*', '*', 'C', 'G', 'A', '*'], ['T', '*', '*', 'C', '*', '*', 'G', 'A'], ['T', '*', '*', 'C', '*', 'G', '*', 'A'], ['T', '*', '*', 'C', '*', 'G', 'A', '*'], ['T', '*', '*', 'C', 'G', '*', '*', 'A'], ['T', '*', '*', 'C', 'G', '*', 'A', '*'], ['T', '*', '*', 'C', 'G', 'A', '*', '*'], ['T', '*', 'C', '*', '*', '*', 'G', 'A'], ['T', '*', 'C', '*', '*', 'G', '*', 'A'], ['T', '*', 'C', '*', '*', 'G', 'A', '*'], ['T', '*', 'C', '*', 'G', '*', '*', 'A'], ['T', '*', 'C', '*', 'G', '*', 'A', '*'], ['T', '*', 'C', '*', 'G', 'A', '*', '*'], ['T', '*', 'C', 'G', '*', '*', '*', 'A'], ['T', '*', 'C', 'G', '*', '*', 'A', '*'], ['T', '*', 'C', 'G', '*', 'A', '*', '*'], ['T', '*', 'C', 'G', 'A', '*', '*', '*'], ['T', 'C', '*', '*', '*', '*', 'G', 'A'], ['T', 'C', '*', '*', '*', 'G', '*', 'A'], ['T', 'C', '*', '*', '*', 'G', 'A', '*'], ['T', 'C', '*', '*', 'G', '*', '*', 'A'], ['T', 'C', '*', '*', 'G', '*', 'A', '*'], ['T', 'C', '*', '*', 'G', 'A', '*', '*'], ['T', 'C', '*', 'G', '*', '*', '*', 'A'], ['T', 'C', '*', 'G', '*', '*', 'A', '*'], ['T', 'C', '*', 'G', '*', 'A', '*', '*'], ['T', 'C', '*', 'G', 'A', '*', '*', '*'], ['T', 'C', 'G', '*', '*', '*', '*', 'A'], ['T', 'C', 'G', '*', '*', '*', 'A', '*'], ['T', 'C', 'G', '*', '*', 'A', '*', '*'], ['T', 'C', 'G', '*', 'A', '*', '*', '*'], ['T', 'C', 'G', 'A', '*', '*', '*', '*']]

A good indication that is output is correct is the fact that the length of perms is 70, which is equal to 8C4 (or "8 choose 4"), which is effectively what your problem concerns.

like image 98
miradulo Avatar answered Feb 11 '23 17:02

miradulo