Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to find a word from arrays of characters?

What is the best way to solve this:

I have a group of arrays with 3-4 characters inside each like so:

{p,     {a,    {t,    {m,
 q,      b,     u,     n,
 r,      c      v      o
 s      }      }      }
}

I also have an array of dictionary words.

What is the best/fastest way to find if the array of characters can combine to form one of the dictionary words? For example, the above arrays could make the words:

"pat","rat","at","to","bum"(lol)
but not "nub" or "mat"

Should i loop through the dictionary to see if words can be made or get all the combinations from the letters then compare those to the dictionary

like image 791
GeeGoldz Avatar asked May 16 '11 20:05

GeeGoldz


3 Answers

I had some Scrabble code laying around, so I was able to throw this together. The dictionary I used is sowpods (267751 words). The code below reads the dictionary as a text file with one uppercase word on each line.

The code is C#:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;
using System.Diagnostics;

namespace SO_6022848
{
  public struct Letter
  {
    public const string Chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    public static implicit operator Letter(char c)
    {
      return new Letter() { Index = Chars.IndexOf(c) };
    }
    public int Index;
    public char ToChar()
    {
      return Chars[Index];
    }
    public override string ToString()
    {
      return Chars[Index].ToString();
    }
  }

  public class Trie
  {
    public class Node
    {
      public string Word;
      public bool IsTerminal { get { return Word != null; } }
      public Dictionary<Letter, Node> Edges = new Dictionary<Letter, Node>();
    }

    public Node Root = new Node();

    public Trie(string[] words)
    {
      for (int w = 0; w < words.Length; w++)
      {
        var word = words[w];
        var node = Root;
        for (int len = 1; len <= word.Length; len++)
        {
          var letter = word[len - 1];
          Node next;
          if (!node.Edges.TryGetValue(letter, out next))
          {
            next = new Node();
            if (len == word.Length)
            {
              next.Word = word;
            }
            node.Edges.Add(letter, next);
          }
          node = next;
        }
      }
    }

  }

  class Program
  {
    static void GenWords(Trie.Node n, HashSet<Letter>[] sets, int currentArrayIndex, List<string> wordsFound)
    {
      if (currentArrayIndex < sets.Length)
      {
        foreach (var edge in n.Edges)
        {
          if (sets[currentArrayIndex].Contains(edge.Key))
          {
            if (edge.Value.IsTerminal)
            {
              wordsFound.Add(edge.Value.Word);
            }
            GenWords(edge.Value, sets, currentArrayIndex + 1, wordsFound);
          }
        }
      }
    }

    static void Main(string[] args)
    {
      const int minArraySize = 3;
      const int maxArraySize = 4;
      const int setCount = 10;
      const bool generateRandomInput = true;

      var trie = new Trie(File.ReadAllLines("sowpods.txt"));
      var watch = new Stopwatch();
      var trials = 10000;
      var wordCountSum = 0;
      var rand = new Random(37);

      for (int t = 0; t < trials; t++)
      {
        HashSet<Letter>[] sets;
        if (generateRandomInput)
        {
          sets = new HashSet<Letter>[setCount];
          for (int i = 0; i < setCount; i++)
          {
            sets[i] = new HashSet<Letter>();
            var size = minArraySize + rand.Next(maxArraySize - minArraySize + 1);
            while (sets[i].Count < size)
            {
              sets[i].Add(Letter.Chars[rand.Next(Letter.Chars.Length)]);
            }
          }
        }
        else
        {
          sets = new HashSet<Letter>[] { 
          new HashSet<Letter>(new Letter[] { 'P', 'Q', 'R', 'S' }), 
          new HashSet<Letter>(new Letter[] { 'A', 'B', 'C' }), 
          new HashSet<Letter>(new Letter[] { 'T', 'U', 'V' }), 
          new HashSet<Letter>(new Letter[] { 'M', 'N', 'O' }) };
        }

        watch.Start();
        var wordsFound = new List<string>();
        for (int i = 0; i < sets.Length - 1; i++)
        {
          GenWords(trie.Root, sets, i, wordsFound);
        }
        watch.Stop();
        wordCountSum += wordsFound.Count;
        if (!generateRandomInput && t == 0)
        {
          foreach (var word in wordsFound)
          {
            Console.WriteLine(word);
          }
        }
      }
      Console.WriteLine("Elapsed per trial = {0}", new TimeSpan(watch.Elapsed.Ticks / trials));
      Console.WriteLine("Average word count per trial = {0:0.0}", (float)wordCountSum / trials);
    }

  }

}

Here is the output when using your test data:

PA
PAT
PAV
QAT
RAT
RATO
RAUN
SAT
SAU
SAV
SCUM
AT
AVO
BUM
BUN
CUM
TO
UM
UN
Elapsed per trial = 00:00:00.0000725
Average word count per trial = 19.0

And the output when using random data (does not print each word):

Elapsed per trial = 00:00:00.0002910
Average word count per trial = 62.2

EDIT: I made it much faster with two changes: Storing the word at each terminal node of the trie, so that it doesn't have to be rebuilt. And storing the input letters as an array of hash sets instead of an array of arrays, so that the Contains() call is fast.

like image 152
Fantius Avatar answered Sep 21 '22 20:09

Fantius


There are probably many way of solving this.

What you are interested in is the number of each character you have available to form a word, and how many of each character is required for each dictionary word. The trick is how to efficiently look up this information in the dictionary.

Perhaps you can use a prefix tree (a trie), some kind of smart hash table, or similar.

Anyway, you will probably have to try out all your possibilities and check them against the dictionary. I.e., if you have three arrays of three values each, there will be 3^3+3^2+3^1=39 combinations to check out. If this process is too slow, then perhaps you could stick a Bloom filter in front of the dictionary, to quickly check if a word is definitely not in the dictionary.

EDIT: Anyway, isn't this essentially the same as Scrabble? Perhaps try Googling for "scrabble algorithm" will give you some good clues.

like image 26
csl Avatar answered Sep 19 '22 20:09

csl


The reformulated question can be answered just by generating and testing. Since you have 4 letters and 10 arrays, you've only got about 1 million possible combinations (10 million if you allow a blank character). You'll need an efficient way to look them up, use a BDB or some sort of disk based hash.

The trie solution previously posted should work as well, you are just restricted more by what characters you can choose at each step of the search. It should be faster as well.

like image 45
dfb Avatar answered Sep 19 '22 20:09

dfb