
Generate (not so) random string with particular string occurrences

Tags: c++, string, random

I have the alphabet 'ACGT' and I need to create a string of around 20,000 characters that contains 100 or more occurrences of the pattern "CCGT". With the code below, the generated string usually contains only around 20-30 occurrences.

    int N = 20000;
    std::string alphabet("ACGT");
    std::string str;
    str.reserve(N);
    for (int index = 0; index < N; index++)
    {
        str += alphabet[rand() % (alphabet.length())];
    }

How do I tweak the code so that the pattern would appear more often?

Edit: Is there a way to change the alphabet so that 'A', 'C', 'G', 'T' and 'CCGT' are all treated as its characters?

Thank you.

Madz Avatar asked Dec 02 '25 06:12



2 Answers

Generate an array of ints containing 100 0s and 4,900 each of 1s, 2s, 3s and 4s [000000....111111....2222 etc.], making almost 20,000 entries (19,700, which expand to exactly 20,000 characters).

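Then shuffle it randomly (std::random_shuffle, or std::shuffle in newer code, since std::random_shuffle was deprecated in C++14 and removed in C++17).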

Then write a string where each 0 translates to 'CCGT', each 1 translates to 'A', and so on for 'C', 'G' and 'T'.

I think that gives you what you want, and by tweaking the original array of ints you could change the number of 'A' characters in the output too.

Edit: If that isn't random enough, put the 100 0s at the start, fill the rest with random values from 1 to 4, and then shuffle.
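The token-shuffle idea above could be sketched roughly as follows. This is only an illustration under some assumptions: the function names `make_sequence` and `count_occurrences` are made up for this example, the single letters are drawn at random (the variant from the edit) rather than in fixed counts, and `std::shuffle` with `std::mt19937` stands in for `std::random_shuffle`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <string>
#include <vector>

// Hypothetical helper: builds a string of exactly total_len characters that
// contains at least pattern_count copies of "CCGT".
std::string make_sequence(std::size_t total_len, std::size_t pattern_count,
                          std::mt19937& rng)
{
    const std::string pattern = "CCGT";
    const std::string alphabet = "ACGT";

    // Assumption: the requested patterns must fit into the requested length.
    assert(pattern_count * pattern.size() <= total_len);

    // Token 0 stands for the whole pattern; tokens 1..4 stand for A, C, G, T.
    std::vector<int> tokens(pattern_count, 0);
    const std::size_t singles = total_len - pattern_count * pattern.size();
    std::uniform_int_distribution<int> pick(1, 4);
    for (std::size_t i = 0; i < singles; ++i)
        tokens.push_back(pick(rng));

    // Scatter the pattern tokens among the single-letter tokens.
    std::shuffle(tokens.begin(), tokens.end(), rng);

    // Expand every token into its text.
    std::string out;
    out.reserve(total_len);
    for (int t : tokens) {
        if (t == 0)
            out += pattern;
        else
            out += alphabet[t - 1];
    }
    return out;
}

// Hypothetical helper: counts occurrences of pat in s.
std::size_t count_occurrences(const std::string& s, const std::string& pat)
{
    std::size_t n = 0;
    for (std::size_t pos = s.find(pat); pos != std::string::npos;
         pos = s.find(pat, pos + 1))
        ++n;
    return n;
}
```

Calling `make_sequence(20000, 100, rng)` with an `std::mt19937 rng` seeded from `std::random_device` should give a 20,000-character string. Each pattern token is written out contiguously, so the 100 planted copies are always present, and further occurrences may appear by chance among the single letters.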

Andy Newman Avatar answered Dec 03 '25 21:12



The only solution I can think of that would meet the "100+" criterion is:

    create 20000 character string
    number of instances (call it n) = 100 + some random value
    for (i = 0 ; i < n ; ++i)
    {
       pick random start position
       write CCGT
    }

Of course, you'd need to ensure the overwritten characters weren't part of a "CCGT" already.
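One possible way to turn the pseudocode above into C++ is sketched below. The names `plant_pattern` and `count_pattern` are invented for this example, and the overlap check is an assumption about how to handle the caveat: the sketch tracks which positions already belong to a planted "CCGT" and retries whenever a new start position would touch one of them.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <string>
#include <vector>

// Hypothetical helper: fills a string with random letters, then overwrites
// `copies` non-overlapping windows with "CCGT".
std::string plant_pattern(std::size_t total_len, std::size_t copies,
                          std::mt19937& rng)
{
    const std::string pattern = "CCGT";
    const std::string alphabet = "ACGT";

    // Assumptions: the string can hold the pattern, and the copies are sparse
    // enough that a free window always exists (each copy blocks at most 7
    // start positions), so the retry loop below terminates.
    assert(total_len >= pattern.size());
    assert(copies * 7 <= total_len);

    // Step 1: create the random background string.
    std::uniform_int_distribution<std::size_t> letter(0, alphabet.size() - 1);
    std::string out(total_len, 'A');
    for (char& c : out)
        c = alphabet[letter(rng)];

    // Step 2: plant the pattern at random positions, never on top of an
    // earlier planted copy.
    std::vector<bool> planted(total_len, false);
    std::uniform_int_distribution<std::size_t> start(0, total_len - pattern.size());
    std::size_t done = 0;
    while (done < copies) {
        const std::size_t pos = start(rng);
        bool clash = false;
        for (std::size_t k = 0; k < pattern.size(); ++k)
            clash = clash || planted[pos + k];
        if (clash)
            continue; // would damage an earlier copy, so pick another position
        out.replace(pos, pattern.size(), pattern);
        for (std::size_t k = 0; k < pattern.size(); ++k)
            planted[pos + k] = true;
        ++done;
    }
    return out;
}

// Hypothetical helper: counts occurrences of pat in s.
std::size_t count_pattern(const std::string& s, const std::string& pat)
{
    std::size_t n = 0;
    for (std::size_t pos = s.find(pat); pos != std::string::npos;
         pos = s.find(pat, pos + 1))
        ++n;
    return n;
}
```

With `copies` set to 100 plus a random extra, as in the pseudocode, the planted copies all survive. Overwriting can still destroy an occurrence that arose by chance in the background, which does not matter for the "100+" requirement because the planted copies alone satisfy it.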

Skizz Avatar answered Dec 03 '25 21:12



