Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Algorithm for computing the plausibility of a function / Monte Carlo Method

I am writing a program that attempts to duplicate the algorithm discussed at the beginning of this article,

http://www-stat.stanford.edu/~cgates/PERSI/papers/MCMCRev.pdf

F is a function from char to char. Assume that Pl(f) is a 'plausibility' measure of that function. The algorithm is:

Starting with a preliminary guess at the function, say f, and a then new function f* --

  • Compute Pl(f).
  • Change to f* by making a random transposition of the values f assigns to two symbols.
  • Compute Pl(f*); if this is larger than Pl(f), accept f*.
  • If not, flip a Pl(f)/Pl(f*) coin; if it comes up heads, accept f*.
  • If the coin toss comes up tails, stay at f.

I am implementing this using the following code. I'm using c# but tried to make it somewhat more simplified for everyone. If there is a better forum for this please let me know.

 var current_f = Initial();    // current accepted function f
 var current_Pl_f = InitialPl();  // current plausibility of accepted function f

 for (int i = 0; i < 10000; i++)
        {
            var candidate_f = Transpose(current_f); // create a candidate function

            var candidate_Pl_f = ComputePl(candidate_f);  // compute its plausibility

            if (candidate_Pl_f > current_Pl_f)            // candidate Pl has improved
            {
                current_f = candidate_f;            // accept the candidate
                current_Pl_f = candidate_Pl_f; 
            }
            else                                    // otherwise flip a coin
            {
                int flip = Flip(); 

                if (flip == 1)                      // heads
                {
                    current_f = candidate_f;        // accept it anyway
                    current_Pl_f = candidate_Pl_f; 
                }
                else if (flip == 0)                 // tails
                {
                    // what to do here ?
                }
            }
        }

My question is basically whether this looks like the optimal approach to implementing that algorithm. IT seems like I may be getting stuck in some local maxima / local minima despite implementing this method.

EDIT - Here is the essentially whats behind Transpose() method. I use a dictionary / hash table of type << char, char >> that the candidate function(s) use to look any given char -> char transformation. So the transpose method simply swaps two values in the dictionary that dictates the behavior of the function.

    private Dictionary<char, char> Transpose(Dictionary<char, char> map, params int[] indices)
    {
        foreach (var index in indices)
        {
            char target_val = map.ElementAt(index).Value;   // get the value at the index

            char target_key = map.ElementAt(index).Key;     // get the key at the index

            int _rand = _random.Next(map.Count);   // get a random key (char) to swap with

            char rand_key = map.ElementAt(_rand).Key;

            char source_val = map[rand_key]; // the value that currently is used by the source of the swap

            map[target_key] = source_val; // make the swap

            map[rand_key] = target_val;
        }

        return map; 
    }

Keep in mind that a candidate function that uses the underlying dictionary is basically just:

public char GetChar(char in, Dictionary<char, char> theMap)
{
     return theMap[char]; 
}

And this is the function that computes Pl(f):

    public decimal ComputePl(Func<char, char> candidate, string encrypted, decimal[][] _matrix)
    {
        decimal product = default(decimal);

        for (int i = 0; i < encrypted.Length; i++)
        {
            int j = i + 1;

            if (j >= encrypted.Length)
            {
                break;
            }

            char a = candidate(encrypted[i]);
            char b = candidate(encrypted[j]);

            int _a = GetIndex(_alphabet, a); // _alphabet is just a string/char[] of all avl chars 
            int _b = GetIndex(_alphabet, b);

            decimal _freq = _matrix[_a][_b]; 

            if (product == default(decimal))
            {
                product = _freq;
            }
            else
            {
                product = product * _freq;
            }
        }

        return product;
    }
like image 816
Sean Thoman Avatar asked Sep 14 '11 21:09

Sean Thoman


Video Answer


3 Answers

Tentatively, codereview.stackexchange.com may be a "better forum for this".
Never the less, I'll take a quick stab at it:

  • At a glance, the snippet shown is a correct implementation of the algorithm.
  • Whether the algorithm will "get stuck" in local minima is a issue pertaining to the algorithm not to the implementation. (see discussion below)
  • Your quest for an "optimal approach" seems to be directed at tweaks in the algorithm (deviation from the original algorithm) rather than at tweaks in the implementation (to make it faster and/or to eradicate some possible flaws). For considerations regarding the algorithm, see discussion below; for discussion regarding the implementation consider the following:
    • ensure the Flip() method is fair
    • ensure the ComputePl() is correct: some errors often arise because of issues with the arithmetic precision in the value function.
    • ensure fairness (equiprobability) with the Transpose() method.
    • Performance improvements would likely come from optimizations in the ComputePl() method (not shown) rather than in the main loop.

Discussion regarding the algorithm per-se and its applicability to different problems.
In a nutshell the algorithm is a guided stochastic search, whereby the [huge] solution space is sampled with two random devices: the Transpose() method which modifies (very slightly each time) the current candidate function and the Flip() method which decides whether a [locally] suboptimal solution should survive. The search is guided by an objective function, ComputePl() which is itself based on a matrix of first order transition in some reference corpus.

In this context, local minima "tar pits" can be avoided by augmenting the probability of selecting "suboptimal" functions: rather than a fair 50-50 Flip(), maybe try with a probablity of retaining "suboptimal" solutions of 66% or even 75%. This approach typically extend the number of generations necessary to converge toward the optimal solution, but, as said may avoid getting stuck in local minima.

Another way of ensuring the applicability of the algorithm is to ensure a better evaluation of the plausibility of given functions. Likely explanations for the relative success and generality of the algorithm are

  • The distribution of first order transitions in the English language is not very uniform (and hence models rather well the plausibility of a given function, by rewarding it for its matches and by punishing it for its mismatches). This kind of multi-dimensional statistic, is more resilient to departure from the reference corpus than say the "order 0" distribution of characters in a a given language/corpus.
  • The distribution of first order transitions while being language-specific, are generally similar across different languages and/or in the context of jargon vocabularies [based on said languages]. Things do break down in the case of short-hand jargons whereby letters such as vowals are typically omitted.

Therefore to improve the applicability of the algorithm to a given problem, ensure that the distribution matrix used matches as closely as possible the language and domain of the underlying text.

like image 87
mjv Avatar answered Oct 01 '22 14:10

mjv


From the description in the article, your implementation seems correct (the part you mark as "what to do here" should indeed be nothing).

If you are having problems with local maxima (something the article claims the coin toss should avoid), make sure your implemenations of Initial, Transpose, ComputePl and Flip are correct.

You can also try making the coin tossed biased (increasing the probability of Flip() == 1 will make this closer to a random walk and less susceptible to getting stuck).

Here is a slightly tighter version of your code:

var current_f = Initial();    // current accepted function f
var current_Pl_f = ComputePl(current_f);  // current plausibility of accepted function f

for (int i = 0; i < 10000; i++)
{
    var candidate_f = Transpose(current_f); // create a candidate function
    var candidate_Pl_f = ComputePl(candidate_f);  // compute its plausibility

    if (candidate_Pl_f > current_Pl_f  ||  Flip() == 1)
    {
        // either candidate Pl has improved,
        // or it hasn't and we flipped the coin in candidate's favor
        //  - accept the candidate
        current_f = candidate_f;            
        current_Pl_f = candidate_Pl_f; 
    }
}
like image 20
Stjepan Rajko Avatar answered Oct 01 '22 14:10

Stjepan Rajko


This algorithm seems to be related to http://en.wikipedia.org/wiki/Simulated_annealing. If that is the case, the behaviour might be helped by changing the probability with which you accept poorer alternatives to the current solution, especially if you reduce this probability over time.

Alternatively, you could try simple hill-climbing from multiple random starts - never accept poorer alternatives, which means that you will get stuck in local maxima more often, but repeatedly run the algorithm from different starts.

When you test this out, you usually know the right answer for your test problem. It is a good idea to compare the plausibility value at the right answer with those your algorithm is coming up with, just in case the weakness is in the plausibility formula, in which case your algorithm will come up with wrong answers which appear more plausible than the correct one.

like image 36
mcdowella Avatar answered Oct 01 '22 16:10

mcdowella