Best data structure for crossword puzzle search

Tags:

I have a large database for solving crossword puzzles, consisting of a word and a description. My application allows searching for words of a specific length and characters on specific positions (this is done the hard way ... go through all words and check each). Plus a search by description (if necessary)

For instance find word _ _ A _ _ B (6 letter word, third character A and last B)

I would like to index the words in such way that the searching would be really fast. My first idea was to use a balanced tree structure, any other suggestion?

555

asked Feb 18 '10 13:02

Drejc

1 Answers

Okay, I am going to propose something weird, but coming from C++ I have been using Boost for a long time and I have come to see the MultiIndex library.

The idea of this library is to create one collection, but have many different ways to query it. It could model, in fact, a database.

So, let's put our words in a table, and put the necessary indexes in place:

Click to copy

word                     |length|c0|c1|c2| ... |c26|
-------------------------|------|--|--|--| ... |---|
Singapour                |9     |S |i |n | ... |0  |

Now the query will look like:

Click to copy

Select word From table Where length=9 And c2='n' And c8='u';

Easy enough isn't it ?

For maximum efficiency, the table should be partitioned on the length, and the indexes (one per cX column) should be local to the partition.

For an in-memory solution you would have one container per length, containing as many indexes as the length, each index being a hash table pointing to a sorted list (easier merge)

Here is a python description:

Click to copy

class Dictionary:
  def __init__(self, length):
    self.length = length
    self.words = set([])
    self.indexes = collections.defaultdict(set)

  def add(self, word):
    if len(word) != self.length:
      raise RuntimeException(word + ' is not ' + `self.length` + ' characters long')

    if word in self.words:
      raise RuntimeException(word + ' is already in the dictionary')

    self.words.add(word)

    for i in range(0,length):
      self.indexes[(i,word[i])].add(word)

  def search(self, list):
    """list: list of tuples (position,character)
    """
    def compare(lhs,rhs): return cmp(len(lhs),len(rhs))

    sets = [self.indexes[elem] for elem in list]
    sets.sort(compare)
    return reduce(intersection, sets)

I have voluntarily provided the length argument, to minimize the size of the hashes and thus make the search better. Also, the sets are sorted by length so that the computation of the intersection be better :)

Go ahead and test it against other solutions if you wish :)

100

answered Nov 04 '22 02:11

Matthieu M.

Related questions
                            
                                generate random numbers within a range with different probabilities
                            
                                Looping Algorithm
                            
                                efficient algorithm instead of looping
                            
                                Points cloud Triangulation algorithm
                            
                                Get an O(N) algorithm to find a product of a collection of numbers with a strange constraint
                            
                                Infix to postfix algorithm that takes care of unary operators
                            
                                Determine if a given weighted graph has unique MST
                            
                                Combine nearby polygons
                            
                                How do I efficiently remove_if only a single element from a forward_list?
                            
                                Minimum number of bits to represent a given `int`
                            
                                Efficient way to compare two arrays
                            
                                Computational complexity of Gram-Schmidt orthogonalization algorithm
                            
                                Fastest algorithm to find if a BigInteger is a prime number or not? [duplicate]
                            
                                Weird situation in prime number checking code
                            
                                How does Xor work in swapping values?
                            
                                Encode String to Base36
                            
                                Trying to optimize my code to either remove nested loop or make it more efficient
                            
                                Delete elements in a std::vector<std::string> which matches with the characters in another given std::string
                            
                                Radix Sort implemented in C++
                            
                                Programming experiments

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Best data structure for crossword puzzle search

Tags:

algorithm

indexing

b-tree

Drejc

People also ask

1 Answers

Matthieu M.

Recent Activity

Donate For Us