Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

IOS: Sqlite. Find record fast

I'm building a scrabble game, and I'm validating the words against a wordDictionary.

In my first try, I loaded the dictionary into an array and did a binary search to validate.

Now I've switch to sqlite instead, so I don't have to have the whole dict in memory, and decrease loading time.

I have two challenges:

  1. What is the most efficient way to query the database to see if a word is in there?

  2. How do I find all possible word for a set of letters... When I had the wordDictionary in an array, I could just loop through the whole thing and validate each and every words. It would be very slow to query every row (~700,000) and validate with sqlite.

like image 928
BlackMouse Avatar asked Dec 05 '25 06:12

BlackMouse


2 Answers

The "obvious" solution is to build an index. However, if your binary search in memory did not work, I'm not so sure that an index is going to solve the problem. It will occupy about the same amount of memory.

Wouldn't it be nice if you could search for possible matches, get a handful in one fetch from outside memory, and then do the comparisons quickly?

This is possible with a database. The idea is to create a "hash" function. Everything with the same hash value would be stored in the words table. Then that would be fetched into memory for a search.

You could do the search yourself once you get the set of words with the same hash, or this might work:

select word
from (select word
      from words
      where hash(word) = hash(YOURWORD)
     ) t
where t.word = YOURWORD

The point is to "trick" the SQL compiler into using the hash index first and then do the comparison.

A very simple hash function might be the first five letters. So, a word like "spies" would have only one entry. But, a word like "multi" would have many. Your table of words would have two columns, "word" and "hash". You would then have an index on hash . . . and for optimal performance, sort the words table by the hash. With the word list ordered, there is a good chance that all matching words will be on one or two pages, minimizing external I/O.

Unfortunately, SQLite does not have any built-in hash functions. You can build one yourself by doing something like adding up the character values in the string in pairs.

like image 59
Gordon Linoff Avatar answered Dec 07 '25 21:12

Gordon Linoff


I was already answering your previous question. I am not a database expert my I have some ideas how the letter algorithm could work.

  1. You will use the DB only to make your search faster, that is, to reduce the number of results and then you will check the DB results in memory.

  2. For every word in the DB you will also save a hash. That hash will be a number or a string.

  3. If you have a set of letters, e.g {a, b, b, c, x, t, u, v}, you will compute a set of hashes that can be created from the letters and for every possible hash you will ask the database for all the results.

  4. Test the results if they contain only the letters from your set (we already went through this in your previous question).

Some possible hash functions:

  1. Ordered vowels without repeating, e.g. hash(transformers) = "aeo". From the set above you would get possible hashes "", "a", "u", "au", "ua". Note there are few words without vowels, so the "" request shouldn't make problems.

  2. Ordered vowels with repeating, e.g. hash(request) = "ee". From set {w, h, y, a, a, x, e} you would get hashes "", "a", "e", "y", "aa", "ae", "ay", "ea", "ey", "ya", "ye", "aae", "aay". In general - more requests, fewer results.

  3. Ordered letters without repeating, e.g. hash(transfomers) = "aefmnorst". If N is the number of letters and we are ignoring words with length < 2, you will need at most (if no letters are repeating) (N 3) + (N 4) + (N 5) + .... + (N N), well if N is 20, that's several thousands or requests... let's just ignore this idea.

  4. Bitmask for ASCII letters (ignoring any non-ASCII alphabets). Bit at position 0 means the word contains "a", at position 1 contains "b" etc. If we create the same bitmask for our letters, we can select words such as (wordMask & ~lettersMask) == 0

like image 32
Sulthan Avatar answered Dec 07 '25 22:12

Sulthan