I am writing a program that attempts to duplicate the algorithm discussed at the beginning of this article,
http://www-stat.stanford.edu/~cgates/PERSI/papers/MCMCRev.pdf
F is a function from char to char. Assume that Pl(f) is a 'plausibility' measure of that function. The algorithm is:
Starting with a preliminary guess at the function, say f, and a then new function f* --
I am implementing this using the following code. I'm using c# but tried to make it somewhat more simplified for everyone. If there is a better forum for this please let me know.
var current_f = Initial(); // current accepted function f
var current_Pl_f = InitialPl(); // current plausibility of accepted function f
for (int i = 0; i < 10000; i++)
{
var candidate_f = Transpose(current_f); // create a candidate function
var candidate_Pl_f = ComputePl(candidate_f); // compute its plausibility
if (candidate_Pl_f > current_Pl_f) // candidate Pl has improved
{
current_f = candidate_f; // accept the candidate
current_Pl_f = candidate_Pl_f;
}
else // otherwise flip a coin
{
int flip = Flip();
if (flip == 1) // heads
{
current_f = candidate_f; // accept it anyway
current_Pl_f = candidate_Pl_f;
}
else if (flip == 0) // tails
{
// what to do here ?
}
}
}
My question is basically whether this looks like the optimal approach to implementing that algorithm. IT seems like I may be getting stuck in some local maxima / local minima despite implementing this method.
EDIT - Here is the essentially whats behind Transpose() method. I use a dictionary / hash table of type << char, char >> that the candidate function(s) use to look any given char -> char transformation. So the transpose method simply swaps two values in the dictionary that dictates the behavior of the function.
private Dictionary<char, char> Transpose(Dictionary<char, char> map, params int[] indices)
{
foreach (var index in indices)
{
char target_val = map.ElementAt(index).Value; // get the value at the index
char target_key = map.ElementAt(index).Key; // get the key at the index
int _rand = _random.Next(map.Count); // get a random key (char) to swap with
char rand_key = map.ElementAt(_rand).Key;
char source_val = map[rand_key]; // the value that currently is used by the source of the swap
map[target_key] = source_val; // make the swap
map[rand_key] = target_val;
}
return map;
}
Keep in mind that a candidate function that uses the underlying dictionary is basically just:
public char GetChar(char in, Dictionary<char, char> theMap)
{
return theMap[char];
}
And this is the function that computes Pl(f):
public decimal ComputePl(Func<char, char> candidate, string encrypted, decimal[][] _matrix)
{
decimal product = default(decimal);
for (int i = 0; i < encrypted.Length; i++)
{
int j = i + 1;
if (j >= encrypted.Length)
{
break;
}
char a = candidate(encrypted[i]);
char b = candidate(encrypted[j]);
int _a = GetIndex(_alphabet, a); // _alphabet is just a string/char[] of all avl chars
int _b = GetIndex(_alphabet, b);
decimal _freq = _matrix[_a][_b];
if (product == default(decimal))
{
product = _freq;
}
else
{
product = product * _freq;
}
}
return product;
}
Tentatively, codereview.stackexchange.com may be a "better forum for this".
Never the less, I'll take a quick stab at it:
Discussion regarding the algorithm per-se and its applicability to different problems.
In a nutshell the algorithm is a guided stochastic search, whereby the [huge] solution space is sampled with two random devices: the Transpose() method which modifies (very slightly each time) the current candidate function and the Flip() method which decides whether a [locally] suboptimal solution should survive. The search is guided by an objective function, ComputePl() which is itself based on a matrix of first order transition in some reference corpus.
In this context, local minima "tar pits" can be avoided by augmenting the probability of selecting "suboptimal" functions: rather than a fair 50-50 Flip(), maybe try with a probablity of retaining "suboptimal" solutions of 66% or even 75%. This approach typically extend the number of generations necessary to converge toward the optimal solution, but, as said may avoid getting stuck in local minima.
Another way of ensuring the applicability of the algorithm is to ensure a better evaluation of the plausibility of given functions. Likely explanations for the relative success and generality of the algorithm are
Therefore to improve the applicability of the algorithm to a given problem, ensure that the distribution matrix used matches as closely as possible the language and domain of the underlying text.
From the description in the article, your implementation seems correct (the part you mark as "what to do here" should indeed be nothing).
If you are having problems with local maxima (something the article claims the coin toss should avoid), make sure your implemenations of Initial, Transpose, ComputePl and Flip are correct.
You can also try making the coin tossed biased (increasing the probability of Flip() == 1 will make this closer to a random walk and less susceptible to getting stuck).
Here is a slightly tighter version of your code:
var current_f = Initial(); // current accepted function f
var current_Pl_f = ComputePl(current_f); // current plausibility of accepted function f
for (int i = 0; i < 10000; i++)
{
var candidate_f = Transpose(current_f); // create a candidate function
var candidate_Pl_f = ComputePl(candidate_f); // compute its plausibility
if (candidate_Pl_f > current_Pl_f || Flip() == 1)
{
// either candidate Pl has improved,
// or it hasn't and we flipped the coin in candidate's favor
// - accept the candidate
current_f = candidate_f;
current_Pl_f = candidate_Pl_f;
}
}
This algorithm seems to be related to http://en.wikipedia.org/wiki/Simulated_annealing. If that is the case, the behaviour might be helped by changing the probability with which you accept poorer alternatives to the current solution, especially if you reduce this probability over time.
Alternatively, you could try simple hill-climbing from multiple random starts - never accept poorer alternatives, which means that you will get stuck in local maxima more often, but repeatedly run the algorithm from different starts.
When you test this out, you usually know the right answer for your test problem. It is a good idea to compare the plausibility value at the right answer with those your algorithm is coming up with, just in case the weakness is in the plausibility formula, in which case your algorithm will come up with wrong answers which appear more plausible than the correct one.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With