Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Optimal Algorithm for Winning Hangman

In the game Hangman, is it the case that a greedy letter-frequency algorithm is equivalent to a best-chance-of-winning algorithm?

Is there ever a case where it's worth sacrificing preservation of your remaining lives, for the sake of a better chance of guessing the correct answer?

Further clarification of the problem:

  • The selected word to be guessed has been taken from a known dictionary.
  • You are given N lives, and thus have to maximise the probability of guessing all the letters in the word without making N mistakes (i.e. you may have an unlimited number of correct guesses).
  • Each word in the dictionary has equal probability, for the sake of this exercise (i.e. the word is chosen randomly)
    • a harder exercise would be to come up with a strategy against a malicious, omniscient word-chooser (I am not asking that here)

Motivation: This question is inspired by the interesting discussion at http://www.datagenetics.com/blog/april12012/index.html

They suggest an algorithm for solving the word game "Hangman" optimally.

Their strategy can be summarised thus (edited for clarification):

  • We can assume the word is drawn from a particular dictionary
  • We know the number of letters in the word
  • Eliminate all words in the dictionary that do not have the correct number of letters.
  • Choose the not-yet-guessed letter which occurs in the largest number of words in the remaining subset of the dictionary.
  • If this letter occurs, we know its location.
  • If this letter does not occur, we know it does not occur in the word.
  • Eliminate all words in the dictionary subset that do not fit exactly this correct pattern, and repeat.
  • If there are 2 (or more) letters appearing equally often, the algorithm may perform a deeper analysis of the positions to determine which one is preferred (if that is reasonable?)

At each stage, we are guessing the letter (not previously guessed) which occurs in the largest number of remaining possible words.

There is some motivation to like this algorithm - we are always minimally likely to lose a life.

But, it strikes me that this isn't necessarily the best solution: if we're trying to guess the word (within a certain number of lives), it's not necessarily always the case that the most frequently occurring letter is the most useful letter to distinguish between the remaining available words.

I'm not sure, though, as it does seem opportune to avoid losing a life wherever possible. Will it ever be the case that optimal strategy permits us to sacrifice a life for a better opportunity to win?

Question: is it the case that this greedy algorithm is equivalent to a best-chance-of-winning algorithm, or not? And can you prove it?

An example dictionary+game would be ideal to show a disproof.

like image 367
Ronald Avatar asked Mar 30 '12 12:03

Ronald


People also ask

What is hangman algorithm?

We select the most frequent letter from the selected subset of words and guess about the given letter. For every guesses the guessed letter will be marked as guessed and they won't be guessed again in the future. This way you have much better chances of survival than in the greedy algorithm described in your question.

What is the hardest word to solve in hangman?

The hardest word to guess in hangman, according to science, is: Jazz. Composed of 75 percent uncommon letters (J and Z) and allowing only three chances at picking correctly, jazz is the perfect storm of Hangman trickery. It's kind of fitting, really. They say jazz is all about the notes they don't play.

What are the most common letters in hangman?

The sequence above represents the usage order of letters in the English language, with the letter 'E' being the most common letter, followed by the letter 'T', all the way down to the letter 'Z', the least commonly used. So, the first letter we should guess when trying to solve a hangman is the letter 'E', right?


1 Answers

Assume the following dictionary: ABC ABD AEF EGH. (I'll capitalize unguessed letters.)
Assume you have only 1 life (makes the proof so much easier...).

The algorithm proposed above:

Starting with A, you lose (1/4) or are left with aBC aBD aEF (3/4).
Now guess B and lose (1/3) or are left with abC abD (2/3).
Now guess C or D and you lose (1/2) or win (1/2).
Probability to win: 3/4 * 2/3 * 1/2 = 1/4.

Now try something else:

Starting with E, you lose (1/2) or are left with AeF eGH (1/2).
Now you know what to guess and win.
Probability to win: 1/2.

Clearly the proposed algorithm is not optimal.

like image 123
Reinstate Monica Avatar answered Oct 01 '22 18:10

Reinstate Monica