I named it as "AI way" because I'm thinking make Application to play the hangman game without human being interactive.
The scenario is like this:
I tried the strategy mentioned at wiki page but it does not work well. Basically successful rate is about 30%.
Any suggestions / comments regarding strategy as well as which field I should dig in order to find a fair good strategy?
Thanks a lot.
-Simon
PS: A JavaScript implementation which looks fairly well. (https://github.com/freizl/play-hangman-game)
Updated Idea
For posterity:
Take a look at this wiki page. It includes a table of frequencies of the first letters of words which may help you tune your algorithm.
You could also take into account the fact that if you find a vowel or two in a word the likelihood of finding other vowels will decrease significantly and you should then try more common consonants instead. The example from the wiki page you listed start with E then T and then tries three vowels in a row: A, O and I. The first two letters are missed but once the third letter is found, twice then the process should switch to common consonants and skip trying for more vowels since there will likely be fewer.
Any useful strategies will certainly employ frequency distribution charts on letters and possibly words e.g. some words are very common while others are rarely used so performing a letter frequency distribution on a set of more common words might help... guessing that some words may appear more frequently than other but that depends on your word selection algorithm which might not take into account "common" usage.
You could also build specialized letter frequency tables and possibly even on-the-fly. For example, given the wikipedia h a ngm a n example: You find the letter A twice in a word in two locations 2nd and 6th. You know that the word has seven letters and with a fairly simple reg ex you could isolate the words from a dictionary that match this pattern:
_ a _ _ _ a _
Then perform a letter frequency on that set of words that matches this pattern and use that set for your next guess. Rinse and repeat. I think doing some of those things I mentioned but especially the last will really increase your odds of success.
The strategies in the linked page seem to be "order guesses by letter frequency" and "guess the vowels, then order guesses by letter frequency"
A couple of observations about hangman:
1) Since guessing a letter that isn't in the word hurts us, we should guess letters by word frequency (percentage of words that contain letter X), not letter frequency (number of times that X appears in all words). This should maximise our chances of guessing a bad letter.
2) Once we've guessed some letters correctly, we know more about the word we're trying to guess.
Here are two strategies that should beat the letter frequency strategy. I'm going to assume we have a dictionary of words that might come up.
1) We know the length of the target word, n
. Remove all words in the dictionary that aren't of length n
2) Calculate the word frequency of all letters in the dictionary
3) Guess the most frequent letter that we haven't already guessed.
4) If we guessed correctly, remove all words from the dictionary that don't match the revealed letters.
5) If we guessed incorrectly, remove all words that contain the incorrectly guessed letter
6) Go to step 2
For maximum effect, instead of calculating word frequencies of all letters in step 2, calculate the word frequencies of all letters in positions that are still blank in the target word.
1) From the dictionary, build up a table of n-grams
for some value of n (say 2). If you haven't come across n-grams before, they are groups of consecutive letters inside the word. For example, if the word is "word"
, the 2-grams are {^w,wo,or,rd,d$}
, where ^
and $
mark the start and the end of the word. Count the word frequency of these 2-grams.
2) Start by guessing single letters by word frequency as above
3) Once we've had some hits, we can use the table of word frequency of n-grams to determine either letters to eliminate from our guesses, or letters that we're likely to be able to guess. There are a lot of ways you could achieve this:
For example, you could use 2-grams to determine that the blank in w_rd
is probably not z
. Or, you could determine that the character at the end of the word ___e_
might (say) be d
or s
.
Alternatively you could use the n-grams to generate the list of possible characters (though this might be expensive for long words). Remember that you can always cross off all n-grams that contain letters you've guessed that aren't in the target word.
Remember that at each step you're trying not to make a wrong guess, since that keeps us alive. If the n-grams tell you that one position is likely to only be (say) a,b or c, and your word frequency table tells you that a appears in 30% of words, but b and c only appear in 10%, then guess a
.
For maximum benefit, you could combine the two strategies.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With