
Algorithmic / probability exercise

I'm trying to solve an exercise from the Rosalind project, but apparently I keep making a mistake. The full text is available here, but my shorter, abstract interpretation and attempt are as follows. Please help me find what I am doing wrong:

We have 3 groups of items: AA, Aa, aa. We start with 1 in Aa and do k iterations of generating new items. In every iteration every item in group:

  • Aa could produce: AA (25%), Aa (50%), aa (25%)
  • AA could produce: AA (50%), Aa (50%)
  • aa could produce: aa (50%), Aa (50%)

As the result of an iteration we count the expected number of items in each group, assuming we generate 2 new items from each item in the previous iteration. So we end up with:

  • 0th iter: AA: 0, Aa: 1, aa: 0
  • 1st iter: AA: .5, Aa: 1, aa: .5
  • 2nd iter: AA: 1, Aa: 2, aa: 1
  • etc. - proportions stay at 1:2:1 between groups

The sum of expected values / population on each iteration is 2^iteration and the probability of an item being in group Aa is always 50%.
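The expected counts listed above can be reproduced with a minimal sketch like the following. It assumes exactly the offspring percentages stated in the question and 2 new items per item; the names `OFFSPRING` and `expected_counts` are hypothetical and used only for illustration.

```python
from fractions import Fraction

# Offspring probabilities per parent group, copied from the list in the question.
OFFSPRING = {
    "AA": {"AA": Fraction(1, 2), "Aa": Fraction(1, 2)},
    "Aa": {"AA": Fraction(1, 4), "Aa": Fraction(1, 2), "aa": Fraction(1, 4)},
    "aa": {"Aa": Fraction(1, 2), "aa": Fraction(1, 2)},
}


def expected_counts(k, children_per_item=2):
    """Expected number of items per group after k iterations, starting from one Aa."""
    counts = {"AA": Fraction(0), "Aa": Fraction(1), "aa": Fraction(0)}
    for _ in range(k):
        nxt = {group: Fraction(0) for group in counts}
        for parent, n in counts.items():
            for child, p in OFFSPRING[parent].items():
                # Each parent contributes children_per_item * p expected children.
                nxt[child] += n * children_per_item * p
        counts = nxt
    return counts


for k in range(4):
    counts = expected_counts(k)
    print(k, {group: str(value) for group, value in counts.items()}, "total:", sum(counts.values()))
```

Using `Fraction` keeps the arithmetic exact, so the 1:2:1 proportions and the 2^iteration total can be compared without floating-point noise.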

So far I hope I'm right, but what we're actually after is: what are the chances of having at least N items that are in group Aa both times if we repeat the experiment twice? (This should be equivalent to asking: what are the chances of having at least N items in group AaBb if we extend the list of groups to AABB, AABb, ... from the original question?)

So the probability of an item being in Aa is 50%, and the population size is the sum of expected values for the iteration (2^iteration). Throwing that at scipy with the test data (k=2, N=1), we get this for at least one item in group Aa:

In [75]: b = scipy.stats.binom(4, .5)
In [76]: sum(b.pmf(x) for x in range(1, 4+1))
Out[76]: 0.93750000000000022
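As a cross-check of the number above, the same tail sum can be sketched without scipy, using only the standard library. The helper name `at_least` is hypothetical; the parameters (4 items, probability 0.5) are the ones used in the session above.

```python
from math import comb


def at_least(n_min, n, p):
    """P(X >= n_min) for X ~ Binomial(n, p), summed term by term."""
    return sum(comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n_min, n + 1))


tail = at_least(1, 4, 0.5)
# "At least one" is the complement of "none at all".
complement = 1 - (1 - 0.5) ** 4

print(tail, complement)
```

Both expressions describe the same event, so they should agree up to floating-point rounding.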

and this for at least one item if we have two sets of groups, so AaBb:

In [77]: sum(b.pmf(x) for x in range(1, 4+1))**2
Out[77]: 0.87890625000000044

Which is completely different from the answer in the original question: 0.684

Where did I make a mistake? (if possible please only point out the mistake, rather than give a solution, so that there are no spoilers left for people trying to solve it on their own)

viraptor asked Nov 13 '22 11:11



1 Answer

I first followed your example and thought it seemed to make sense, but after a while I found where the problem was.

Here is a pointer to your mistake:

You have calculated the probability of getting at least one Aa-- and at least one --Bb in the second generation. But that is not enough to tell whether there is at least one AaBb in the second generation: the Aa-- and the --Bb have to coincide in the same individual.

Consider, for example, the following second generation: aaBb, AABb, Aabb, AaBB. Every individual has either Aa-- or --Bb, but there is no AaBb in the generation.
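The counterexample can be checked mechanically with a small sketch. It assumes each genotype is written as a four-character string whose first two characters are the A locus and last two the B locus; the helper names are hypothetical.

```python
# The example second generation from the text above.
generation = ["aaBb", "AABb", "Aabb", "AaBB"]


def has_Aa(genotype):
    """True if the first locus is heterozygous (Aa--)."""
    return genotype[:2] == "Aa"


def has_Bb(genotype):
    """True if the second locus is heterozygous (--Bb)."""
    return genotype[2:] == "Bb"


any_Aa = any(has_Aa(g) for g in generation)
any_Bb = any(has_Bb(g) for g in generation)
# Both conditions must hold for the SAME individual to count as AaBb.
any_AaBb = any(has_Aa(g) and has_Bb(g) for g in generation)

print(any_Aa, any_Bb, any_AaBb)
```

The first two flags are satisfied by different individuals, which is why checking the two loci separately overcounts.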

user1884905 answered Nov 15 '22 08:11
