What I’m trying to do:
How I’m trying to do this:
My current plan:
Having a plan is no good if you can’t execute it, this is what I need help with:
Finally:
Any help is very much appreciated. I’m still a beginner with C# and MySQL, so please be gentle.
Thank you a lot!
First off, let's look at the constraints on the problem. You want to store a word list for a game in a data structure that efficiently supports the "anagram" problem: given a "rack" of n letters, what are all the words of n or fewer letters in the word list that can be made from that rack? The word list will be about 400K words, and so is probably about one to ten megs of string data when uncompressed.
A trie is the classic data structure used to solve this problem because it combines memory efficiency with search efficiency. With a word list of about 400K words of reasonable length you should be able to keep the trie in memory, as opposed to going with a b-tree sort of solution where you keep most of the tree on disk because it is too big to fit in memory all at once.
A trie is basically nothing more than a 26-ary tree (assuming you're using the Roman alphabet) where every node has a letter and one additional bit that says whether it is the end of a word.
So let's sketch the data structure:
class TrieNode
{
    public char Letter;
    public bool IsEndOfWord;           // true if a word ends at this node
    public List<TrieNode> Children;
    public TrieNode(char letter, bool isEndOfWord, List<TrieNode> children)
    {
        Letter = letter; IsEndOfWord = isEndOfWord; Children = children;
    }
}
This of course is just a sketch; you'd probably want to turn the public fields into proper properties, validate the constructor arguments, and whatnot. Also, maybe a flat list is not the best data structure; maybe some sort of dictionary is better. My advice is to get it working first, and then measure its performance, and if it is unacceptable, then experiment with making changes to improve its performance.
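If you do end up trying the dictionary idea, a node might look something like the sketch below. The names DictionaryTrieNode and GetOrAddChild are made up for illustration and are not part of the sketch above; whether this is actually faster than the flat list is something you would have to measure.

```csharp
using System.Collections.Generic;

// Hypothetical variant of the node: children are keyed by letter,
// so finding a child is a dictionary lookup instead of a list scan.
class DictionaryTrieNode
{
    public char Letter { get; }
    public bool IsEndOfWord { get; set; }
    public Dictionary<char, DictionaryTrieNode> Children { get; } =
        new Dictionary<char, DictionaryTrieNode>();

    public DictionaryTrieNode(char letter, bool isEndOfWord)
    {
        Letter = letter;
        IsEndOfWord = isEndOfWord;
    }

    // Returns the child for the given letter, creating it if it is missing.
    public DictionaryTrieNode GetOrAddChild(char letter)
    {
        if (!Children.TryGetValue(letter, out var child))
        {
            child = new DictionaryTrieNode(letter, false);
            Children.Add(letter, child);
        }
        return child;
    }
}
```

With this shape, adding "AA" is root.GetOrAddChild('A').GetOrAddChild('A').IsEndOfWord = true.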
You can start with an empty trie:
TrieNode root = new TrieNode('^', false, new List<TrieNode>());
That is, this is the "root" trie node that represents the beginning of a word.
How do you add the word "AA", the first word in the Scrabble dictionary? Well, first make a node for the first letter:
root.Children.Add(new TrieNode('A', false, new List<TrieNode>()));
OK, our trie is now
^
|
A
Now add a node for the second letter:
root.Children[0].Children.Add(new TrieNode('A', true, new List<TrieNode>()));
Our trie is now
^
|
A
|
A$ -- we notate the end of word flag with $
Great. Now suppose we want to add AB. We already have a node for "A", so add to it the "B$" node:
root.Children[0].Children.Add(new TrieNode('B', true, new List<TrieNode>()));
and now we have
   ^
   |
   A
  / \
 A$  B$
Keep on going like that. Of course, rather than writing "root.Children[0]..." you'll write a loop that walks the trie one letter at a time, checks whether the node you want already exists, and creates it if it does not.
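Here is one way that loop might look. It is a minimal sketch that assumes the flat-list TrieNode from above (repeated so the snippet compiles on its own); TrieBuilder and AddWord are hypothetical names.

```csharp
using System.Collections.Generic;

class TrieNode
{
    public char Letter;
    public bool IsEndOfWord;
    public List<TrieNode> Children;

    public TrieNode(char letter, bool isEndOfWord, List<TrieNode> children)
    {
        Letter = letter;
        IsEndOfWord = isEndOfWord;
        Children = children;
    }
}

static class TrieBuilder
{
    // Hypothetical helper: walks the trie one letter at a time,
    // creating any node that does not exist yet.
    public static void AddWord(TrieNode root, string word)
    {
        TrieNode current = root;
        foreach (char letter in word)
        {
            // Linear scan of the flat child list for a matching letter.
            TrieNode next = current.Children.Find(child => child.Letter == letter);
            if (next == null)
            {
                next = new TrieNode(letter, false, new List<TrieNode>());
                current.Children.Add(next);
            }
            current = next;
        }

        // Flag the last node as the end of a word (skipped for an empty string).
        if (current != root)
        {
            current.IsEndOfWord = true;
        }
    }
}
```

Calling TrieBuilder.AddWord(root, "AA") and then TrieBuilder.AddWord(root, "AB") on an empty root should produce the same three-node trie as the hand-built example.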
To store your trie on disk -- frankly, I would just store the word list as a plain text file and rebuild the trie when you need to. It shouldn't take more than 30 seconds or so, and then you can re-use the trie in memory. If you do want to store the trie in some format that is more like a trie, it shouldn't be hard to come up with a serialization format.
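Rebuilding from a plain text file could be as simple as the sketch below. It assumes one word per line and normalizes everything to upper case; both of those are my assumptions about the file, and TrieLoader, Build and LoadFromFile are hypothetical names.

```csharp
using System.Collections.Generic;
using System.IO;

class TrieNode
{
    public char Letter;
    public bool IsEndOfWord;
    public List<TrieNode> Children;

    public TrieNode(char letter, bool isEndOfWord, List<TrieNode> children)
    {
        Letter = letter;
        IsEndOfWord = isEndOfWord;
        Children = children;
    }
}

static class TrieLoader
{
    // Hypothetical helper: builds a trie from any sequence of words.
    public static TrieNode Build(IEnumerable<string> words)
    {
        var root = new TrieNode('^', false, new List<TrieNode>());
        foreach (string rawWord in words)
        {
            // Assumption: one word per entry; trim, upper-case, skip blanks.
            string word = rawWord.Trim().ToUpperInvariant();
            if (word.Length == 0) continue;

            TrieNode current = root;
            foreach (char letter in word)
            {
                TrieNode next = current.Children.Find(c => c.Letter == letter);
                if (next == null)
                {
                    next = new TrieNode(letter, false, new List<TrieNode>());
                    current.Children.Add(next);
                }
                current = next;
            }
            current.IsEndOfWord = true;
        }
        return root;
    }

    // File.ReadLines streams the file line by line instead of loading it whole.
    public static TrieNode LoadFromFile(string path)
    {
        return Build(File.ReadLines(path));
    }
}
```

You would call TrieLoader.LoadFromFile once at startup with the path to your word list and keep the returned root around for every search.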
To search the trie for matching a rack, the idea is to explore every part of the trie, but to prune out the areas where the rack cannot possibly match. If you haven't got any "A"s on the rack, there is no need to go down any "A" node. I sketched out the search algorithm in your previous question.
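To make the pruning idea concrete, here is a minimal sketch of such a search. It is not necessarily the same algorithm as the one from the previous question; it assumes the flat-list TrieNode from above, and RackSearch, FindWords and Explore are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Text;

class TrieNode
{
    public char Letter;
    public bool IsEndOfWord;
    public List<TrieNode> Children;

    public TrieNode(char letter, bool isEndOfWord, List<TrieNode> children)
    {
        Letter = letter;
        IsEndOfWord = isEndOfWord;
        Children = children;
    }
}

static class RackSearch
{
    // Returns every word in the trie that can be spelled with letters from the rack.
    public static List<string> FindWords(TrieNode root, string rack)
    {
        // Count how many of each letter the rack holds.
        var counts = new Dictionary<char, int>();
        foreach (char c in rack)
        {
            counts.TryGetValue(c, out int n);
            counts[c] = n + 1;
        }

        var results = new List<string>();
        Explore(root, counts, new StringBuilder(), results);
        return results;
    }

    static void Explore(TrieNode node, Dictionary<char, int> counts,
                        StringBuilder prefix, List<string> results)
    {
        foreach (TrieNode child in node.Children)
        {
            // Prune: skip any branch whose letter is not left on the rack.
            if (!counts.TryGetValue(child.Letter, out int available) || available == 0)
            {
                continue;
            }

            // Use the letter, descend, then put the letter back.
            counts[child.Letter] = available - 1;
            prefix.Append(child.Letter);
            if (child.IsEndOfWord)
            {
                results.Add(prefix.ToString());
            }
            Explore(child, counts, prefix, results);
            prefix.Length--;
            counts[child.Letter] = available;
        }
    }
}
```

With the three-node trie built earlier, a rack of "AB" finds only "AB", because the single A is used up before the second A of "AA" can be matched. This sketch does not handle blank tiles.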
I've got an implementation of a functional-style persistent trie that I've been meaning to blog about for a while but never got around to it. If I do eventually post that I'll update this question.