I have a requirement: using the alphabet 'ACGT', I need to create a string of around 20,000 characters that contains 100+ occurrences of the pattern "CCGT". Most of the time the generated string contains only around 20-30 instances.
int N = 20000;
std::string alphabet("ACGT");
std::string str;
str.reserve(N);
for (int index = 0; index < N; index++)
{
    str += alphabet[rand() % (alphabet.length())];
}
How do I tweak the code so that the pattern would appear more often?
Edit: Is there a way of changing the alphabet, i.e. treating 'A', 'C', 'G', 'T' and 'CCGT' as the characters of the alphabet?
Thank you.
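Generate an array of ints containing 100 0s and 4,900 each of 1s, 2s, 3s and 4s [000000....111111....2222 etc.], making almost 20,000 entries (19,700 to be exact).
Then shuffle it randomly (std::random_shuffle, or std::shuffle in C++11 and later, since std::random_shuffle was removed in C++17).
Then write a string where each 0 translates to 'CCGT', each 1 translates to 'A', each 2 .... etc. The 100 patterns (400 characters) plus 19,600 single letters give exactly 20,000 characters.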
I think that gives you what you want, and by tweaking the original array of ints you could change the number of 'A' characters in the output too.
Edit: If that isn't random enough, put 100 0s at the start, fill the rest with random values from 1 to 4, and then shuffle as before.
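The shuffle idea above could be sketched in C++ roughly like this. It is only a minimal sketch under a few assumptions: it follows the variant with 100 pattern tokens plus random single-letter tokens, it uses std::shuffle with std::mt19937 in place of std::random_shuffle, and the names generate_with_tokens and count_occurrences are made up for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <random>
#include <stdexcept>
#include <string>
#include <vector>

// Count occurrences of pattern in text (hypothetical helper for checking the result).
std::size_t count_occurrences(const std::string& text, const std::string& pattern)
{
    std::size_t count = 0;
    for (std::size_t pos = text.find(pattern); pos != std::string::npos;
         pos = text.find(pattern, pos + 1))
    {
        ++count;
    }
    return count;
}

// Build a string of exactly total_length characters that contains at least
// pattern_count copies of "CCGT". Hypothetical helper, not from the original post.
std::string generate_with_tokens(std::size_t total_length, std::size_t pattern_count,
                                 std::mt19937& rng)
{
    const std::string pattern = "CCGT";
    const std::string letters = "ACGT";

    // Assumption: the requested patterns must fit into the requested length.
    if (pattern_count * pattern.size() > total_length)
        throw std::invalid_argument("too many patterns for the requested length");

    // Token 0 stands for the whole pattern; tokens 1..4 stand for single letters.
    std::vector<int> tokens(pattern_count, 0);
    const std::size_t single_count = total_length - pattern_count * pattern.size();
    std::uniform_int_distribution<int> pick(1, 4);
    for (std::size_t i = 0; i < single_count; ++i)
        tokens.push_back(pick(rng));

    // Scatter the pattern tokens among the single-letter tokens.
    std::shuffle(tokens.begin(), tokens.end(), rng);

    // Expand each token into its text.
    std::string result;
    result.reserve(total_length);
    for (int token : tokens)
    {
        if (token == 0)
            result += pattern;
        else
            result += letters[token - 1];
    }
    return result;
}
```

Calling generate_with_tokens(20000, 100, rng) with a seeded std::mt19937 should give a 20,000-character string with at least 100 occurrences of "CCGT". Any extra occurrences come from the random single letters happening to line up.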
The only solution I can think of that would meet the "100+" criteria is:
create 20000 character string
number of instances (call it n) = 100 + some random value
for (i = 0 ; i < n ; ++i)
{
    pick random start position
    write CCGT
}
Of course, you'd need to ensure the overwritten characters weren't part of a "CCGT" already.
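The pseudocode above might look something like this in C++. Treat it as a sketch under stated assumptions: the name generate_with_overwrites is made up, the caller picks the number of instances (for example 100 plus some random extra), and a std::vector<bool> tracks which positions already belong to a written "CCGT" so that a later write never damages an earlier one. It also assumes the number of patterns is small compared to the string length, otherwise the retry loop could take a very long time.

```cpp
#include <cstddef>
#include <random>
#include <stdexcept>
#include <string>
#include <vector>

// Fill a string with random letters, then overwrite pattern_count random,
// non-overlapping positions with "CCGT". Hypothetical helper for illustration.
std::string generate_with_overwrites(std::size_t total_length, std::size_t pattern_count,
                                     std::mt19937& rng)
{
    const std::string pattern = "CCGT";
    const std::string letters = "ACGT";

    // Assumption: the patterns must fit, and should be sparse so retries stay cheap.
    if (total_length < pattern.size() || pattern_count * pattern.size() > total_length)
        throw std::invalid_argument("too many patterns for the requested length");

    // Step 1: random background string.
    std::uniform_int_distribution<std::size_t> pick_letter(0, letters.size() - 1);
    std::string result(total_length, 'A');
    for (char& c : result)
        c = letters[pick_letter(rng)];

    // Step 2: write the pattern at random start positions, skipping any position
    // that would overlap a pattern written earlier.
    std::vector<bool> planted(total_length, false);
    std::uniform_int_distribution<std::size_t> pick_start(0, total_length - pattern.size());
    std::size_t written = 0;
    while (written < pattern_count)
    {
        const std::size_t start = pick_start(rng);

        bool overlaps = false;
        for (std::size_t k = 0; k < pattern.size(); ++k)
            overlaps = overlaps || planted[start + k];
        if (overlaps)
            continue; // try another start position

        for (std::size_t k = 0; k < pattern.size(); ++k)
        {
            result[start + k] = pattern[k];
            planted[start + k] = true;
        }
        ++written;
    }
    return result;
}
```

An overwrite can still destroy a "CCGT" that occurred naturally in the random background, but the written copies themselves stay intact, so the result should contain at least pattern_count occurrences.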