I have been trying to pull off a median string search for a sequence in an ACGT genome. The problem I have is going from AAAAAAAA to AAAAAAAC and so forth until I have tried every possible combination.
I've essentially been brute-forcing it by creating two lists, one containing A, C, G, T and the other the 8-character sequence, and after each search iterating and swapping characters. The problem is that I don't test all combinations, because when two positions iterate at the same time it skips a letter.
Is there any way to go AAAAAAAA - AAAAAAAC - AAAAAAAG - AAAAAAAT - AAAAAACA and so forth easily?
Using itertools
itertools.product("ACGT", repeat=8)
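To spell that out a bit: itertools.product yields tuples of single characters, so each one needs to be joined back into a string. The sketch below wraps that in a small generator; the name kmers and its parameters are just illustrative, not anything from the question. The order it produces is the odometer order asked for, with the rightmost position changing fastest.

```python
from itertools import islice, product


def kmers(alphabet="ACGT", length=8):
    """Yield every string of `length` characters over `alphabet`.

    product() varies the rightmost position fastest, so the strings
    come out as AAAAAAAA, AAAAAAAC, AAAAAAAG, AAAAAAAT, AAAAAACA, ...
    """
    for letters in product(alphabet, repeat=length):
        # Each item is a tuple such as ('A', 'A', ..., 'C'); join it into a str.
        yield "".join(letters)


# Peek at the first few candidates without building the whole list.
first_five = list(islice(kmers(), 5))
print(first_five)
# ['AAAAAAAA', 'AAAAAAAC', 'AAAAAAAG', 'AAAAAAAT', 'AAAAAACA']

# There are 4**8 candidates in total.
total = sum(1 for _ in kmers())
print(total)  # 65536
```

Because this is a generator, you can loop over kmers() directly and run your search on each candidate without holding all 65536 strings in memory at once.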
As suggested above, use itertools:
itertools.product("ACGT", repeat=8)  # will work in your case
Using the regex inverter from the pyparsing wiki Examples page, invert this regex: [ACGT]{8}. You can also try the online inverter at the UtilityMill, but that server will time out when generating 8-character strings; I have successfully gotten up to 6 characters within the allowed time.