Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Another Permutation Word Conundrum... With Linq?

I have seen many examples of getting all permutations of a given set of letters. Recursion seems to work well to get all possible combinations of a set of letters (though it doesn't seem to take into account if 2 of the letters are the same).

What I would like to figure out is, could you use linq (or not) to get all possible combinations of letters down to 3 letter combinations.

For example, given the letters: P I G G Y I want an array of all possible cominations of these letters so I can check against a word list (scrabble?) and eventually get a list of all possible words that you can make using those letters (from 3 letters up to the total, in this case 5 letters).

like image 810
CraigF Avatar asked Dec 22 '22 20:12

CraigF


1 Answers

I would suggest that rather than generating all possible permutations (of each desired length), take slightly different approach that will reduce the overall amount of work that you have to do.

First, find some word lists (you say that you are going to check against a word list).

Here is a good source of word lists:

http://www.poslarchive.com/math/scrabble/lists/index.html

Next, for each word list (e.g. for 3 letter words, 4 letter words, etc), build a dictionary whose key is the letters of the word in alphabetical order, and whose value is the word. For example, given the following word list:

ACT
CAT
ART
RAT
BAT
TAB

Your dictionary would look something like this (conceptually) (You might want to make a dictionary of List):

ABT - BAT, TAB
ACT - ACT, CAT
ART - ART, RAT, TAR

You could probably put all words of all lengths in the same dictionary, it is really up to you.

Next, to find candidate words for a given set of N letters, generate all possible combinations of length K for the lengths that you are interested in. For scrabble, that would all combinations (order is not important, so CAT == ACT, so all permutations is not required) of 2 (7 choose 2), 3 (7 choose 3), 4 (7 choose 4), 5 (7 choose 5), 6 (7 choose 6), 7 letters (7 choose 7). This can be improved by first ordering the N letters alphabetically and then finding the combinations of length K.

For each combination of length K, check the dictionary to see if there are any words with this key. If so, they are candidates to be played.

So, for CAKE, order the letters:

ACEK

Get the 2, 3, and 4 letter combinations:

AC
AE
AK
CE
CK
EK
ACE
CEK
ACEK

Now, use these keys into the dictionary. You will find ACE and CAKE are candidates.

This approach allows you to be much more efficient than generating all permutations and then checking each to see if it is a word. Using the combination approach, you do not have to do separate lookups for groups of letters of the same length with the same letters.

For example, given:

TEA

There are 6 permutations (of length 3), but only 1 combination (of length 3). So, only one lookup is required, using the key AET.

Sorry for not putting in any code, but with these ideas, it should be relatively straightforward to achieve what you want.

I wrote a program that does a lot of this back when I was first learning C# and .NET. I will try to post some snippets (improved based on what I have learned since then).

This string extension will return a new string that represents the input string's characters reassembled in alphabetical order:

public static string ToWordKey(this string s)
{
  return new string(s.ToCharArray().OrderBy(x => x).ToArray());
}

Based on this answer by @Adam Hughes, here is an extension method that will return all combinations (n choose k, not all permutations) for all lengths (1 to string.Length) of the input string:

public static IEnumerable<string> Combinations(this String characters)
{
  //Return all combinations of 1, 2, 3, etc length
  for (int i = 1; i <= characters.Length; i++)
  {
    foreach (string s in CombinationsImpl(characters, i))
    {
      yield return s;
    }
  }
}

//Return all combinations (n choose k, not permutations) for a given length
private static IEnumerable<string> CombinationsImpl(String characters, int length)
{
  for (int i = 0; i < characters.Length; i++)
  {
    if (length == 1)
    {
      yield return characters.Substring(i,1);
    }
    else
    {
      foreach (string next in CombinationsImpl(characters.Substring(i + 1, characters.Length - (i + 1)), length - 1))
        yield return characters[i] + next;
    }
  }
}

Using the "InAlphabeticOrder" method, you can build a list of your input words (scrabble dictionary), indexed by their "key" (similar to dictionary, but many words could have the same key).

public class WordEntry
{
  public string Key { set; get; }
  public string Word { set; get; }

  public WordEntry(string w)
  {
    Word = w;
    Key = Word.ToWordKey();
  }
}

var wordList = ReadWordsFromFileIntoWordEntryClasses();

Given a list of WordEntry, you can query the list using linq to find all words that can be made from a given set of letters:

string lettersKey = letters.ToWordKey();

var words = from we in wordList where we.Key.Equals(lettersKey) select we.Word;

You could find all words that could be made from any combination (of any length) of a given set of letters like this:

string lettersKey = letters.ToWordKey();

var words = from we in wordList
            from key in lettersKey.Combinations()
            where we.Key.Equals(key)
            select we.Word;

[EDIT]

Here is some more sample code:

Given a list of 2, 3, and 4 letter words from here: http://www.poslarchive.com/math/scrabble/lists/common-234.html

Here is some code that will read those words (I cut and pasted them into a txt file) and construct a list of WordEntry objects:

private IEnumerable<WordEntry> GetWords()
{
  using (FileStream fs = new FileStream(@".\Words234.txt", FileMode.Open))
  using (StreamReader sr = new StreamReader(fs))
  {
    var words = sr.ReadToEnd().Split(new char[] { ' ', '\n' }, StringSplitOptions.RemoveEmptyEntries);
    var wordLookup = from w in words select new WordEntry(w, w.ToWordKey());
    return wordLookup;
  }
}

I have renamed the InAlphateticalOrder extension method to ToWordKey.

Nothing fancy here, just read the file, split it into words, and create a new WordEntry for each word. Possibly could be more efficient here by reading one line at a time. The list will also get pretty long when you consider 5, 6, and 7 letter words. That might be an issue and it might not. For a toy or a game, it is probably no big deal. If you wanted to get fancy, you might consider building a small database with the words and keys.

Given a set of letters, find all possible words of the same length as the key:

  string key = "cat".ToWordKey();
  var candidates = from we in wordEntries 
                   where we.Key.Equals(key,StringComparison.OrdinalIgnoreCase) 
                   select we.Word;

Given a set of letters, find all possible words from length 2 to length(letters)

string letters = "seat";

IEnumerable<string> allWords = Enumerable.Empty<string>();

//Get each combination so that the combination is in alphabetical order
foreach (string s in letters.ToWordKey().Combinations())
{
  //For this combination, find all entries with the same key
  var words = from we in wordEntries 
              where we.Key.Equals(s.ToWordKey(),StringComparison.OrdinalIgnoreCase) 
              select we.Word;
  allWords = allWords.Concat(words.ToList());
}

This code could probably be better, but it gets the job done. One thing that it does not do is handle duplicate letters. If you have "egg", the two letter combinations will be "eg", "eg", and "gg". That can be fixed easily enough by adding a call to Distinct to the foreach loop:

//Get each combination so that the combination is in alphabetical order
//Don't be fooled by words with duplicate letters...
foreach (string s in letters.ToWordKey().Combinations().Distinct())
{
  //For this combination, find all entries with the same key
  var words = from we in wordEntries 
              where we.Key.Equals(s.ToWordKey(),StringComparison.OrdinalIgnoreCase)
              select we.Word;
  //I forced the evaluation here because without ToList I was only capturing the LAST 
  //(longest) combinations of letters.
  allWords = allWords.Concat(words.ToList());
}

Is that the most efficient way to do it? Maybe, maybe not. Somebody has to do the work, why not LINQ?

I think that with this approach you probably don't need a Dictionary of Lists (Dictionary<string,List<string>>).

With this code and with a suitable set of words, you should be able to take any combination of letters and find all words that can be made from them. You can control the words by finding all words of a particular length, or all words of any length.

This should get you on your way.

[More Clarification]

In terms of your original question, you take as input "piggy" and you want to find all possible words that can be made from these letters. Using the Combinations extension method on "piggy", you will come up with a list like this:

p
i
g
g
y
pi
pg
pg
py
ig
ig
iy
gg
gy
gy
pig
pig
piy

etc. Note that there are repetitions. That is ok, the last bit of sample code that I posted showed how to find all unique Combinations by applying the Distinct operator.

So, we can get a list of all combinations of letters from a given set of letters. My algorithm depends on the list of WordEntry objects being searchable based on the Key property. The Key property is simply the letters of the word rearranged into alphabetical order. So, if you read a word file containing words like this:

ACT
CAT
DOG
GOD
FAST
PIGGY

The list of WordEntry objects will look like this:

Word  Key

ACT   ACT
CAT   ACT
DOG   DGO
GOD   DGO
FAST  AFST
PIGGY GGIPY

So, it's easy enough to build up the list of words and keys that we want to test against (or dictionary of valid scrabble words).

For example, (assume the few words above form your entire dictionary), if you had the letters 'o' 'g' 'd' on your scrabble tray, you could form the words DOG and GOD, because both have the key DGO.

Given a set of letters, if we want to find all possible words that can be made from those letters, we must be able to generate all possible combinations of letters. We can test each of these against the "dictionary" (quotes because it is not REALLY a Dictionary in the .NET sense, it is a list (or sequence) of WordEntry objects). To make sure that the keys (from the sequence of letters that we have "drawn" in scrabble) is compatible with the Key field in the WordEntry object, we must first order the letters.

Say we have PIGGY on our scrabble tray. To use the algorithm that I suggested, we want to get all possible "Key" values from PIGGY. In our list of WordEntry objects, we created the Key field by ordering the Word's letters in alphabetic order. We must do the same with the letters on our tray.

So, PIGGY becomes GGIPY. (That is what ToWordKey does). Now, given the letters from our tray in alphabetical order, we can use Combinations to generate all possible combinations (NOT permumations). Each combination we can look up in our list, based on Key. If a combination from GGIPY matches a Key value, then the corresponding Word (of the WordEntry class) can be constructed from our letters.

A better example than PIGGY

SEAT

First use ToWordKey:

AETS

Now, make all Combinations of all lengths:

A
E
T
S
AE
AT
AS
ET
ES
TS
AET
ATS
ETS
AETS

When we look in our list of WordEntry objects (made from reading in the list of 2, 3, 4 letter words), we will probably find that the following combinations are found:

AT
AS
AET
ATS
ETS
AETS

These Key values correspond to the following words:

Key   Word
AT    AT
AS    AS
AET   ATE
AET   EAT
AET   TEA
AST   SAT
EST   SET
AEST  EATS
AEST  SEAT
AEST  TEAS

The final code example above will take the letters ('s' 'e' 'a' 't'), convert to Key format (ToWordKey) generate the combinations (Combinations), keep only the unique possible key values (Distict - not an issue here since no repeated letters), and then query the list of all WordEntry objects for those WordEntry objects whose Key is the same as one of the combinations.

Essentially, what we have done is constructed a table with columns Word and Key (based on reading the file of words and computing the Key for each) and then we do a query joining Key with a sequence of Key values that we generated (from the letters on our tray).

Try using my code in steps.

First, use the Combinations extension method:

var combinations = "piggy".Combinations();

Print the result (p i g g y ... pi pg pg ... pig pig piy ... pigg pigy iggy ... etc)

Next, get all combinations after applying the ToWordKey extension method:

//
// "piggy".ToWordKey() yields "iggpy"
//
var combinations = "piggy".ToWordKey().Combinations();

Print the result (i g g p y ig ig ip iy igg igp igy ... etc)

Eliminate duplicates with the Distinct() method:

var combinations = "piggy".ToWordKey().Combinations().Distinct();

Print the result (i g p y ig ip iy igg igp igy ... etc)

Use other sets of letters like "ate" and "seat".

Notice that you get significantly fewer candidates than if you use a permutation algorithm.

Now, imagine that the combinations that we just made are the key values that we will use to look in our list of WordEntry objects, comparing each combination to the Key of a WordEntry.

Use the GetWords function above and the link to the 2, 3, 4 letter words to build the list of WordEntry objects. Better yet, make a very stripped down word list with only a few words and print it out (or look at it in the debugger). See what it looks like. Look at each Word and each Key. Now, imagine if you wanted to find ALL words that you could make with "AET". It is easier to imagine using all letters, so start there. There are 6 permutations, but only 1 combination! That's right, you only have to make one search of the word list to find all 3 letter words that can be made with those letters! If you had 4 letters there would be 24 permutations, but again, only 1 combination.

That is the essence of the algorithm. The ToWordKey() function is essentially a hash function. All strings with the same number of letters and the same set of letters will hash to the same value. So, build a list of Words and their hashes (Key - ToWordKey) and then, given a set of letters to use to make words, hash the letters (ToWordKey) and find all entries in the list with the same hash value. To extend to finding all words of any length (given a set of letters), you just have to hash the input (send the whole string through ToWordKey), then find all Combinations of ANY length. Since the combinations are being generated from the hashed set of letters AND since the Combinations extension method maintains the original ordering of the letters in each combination, then each combination retains the property of having been hashed! That's pretty cool!

Hope this helps.

like image 56
13 revs Avatar answered Dec 28 '22 08:12

13 revs