[Here follows a description of what the app should do under which constrains]
I want a data-structure that searches if a string
exists in a 250,000 word-list, while using only a fair amount of ram and keeping the time it takes to load this data-structure into ram small (let's say 0-8 seconds). The time it takes to find a word should also be quick (let's say 0 to 0.5 second), but ram usage is more important. It should also be possible to create multiple games (more on what this game is about at the title "use") without needing significant more memory.
It would also be highly valuable to know which words start with a string
, but not enough so to sacrifice load-time by many seconds.
It is for an Android offline game. Limited ram is available. The maximum amount of ram an Application can use according to this post is between 16-32mb ram depending on the device. My empty Android Application already uses about 17mb (using Memory Monitor in Android Studio). My android device caps the ram usage off at 26mb, leaving me at about 8mb of free space for my whole Activity
.
They all seem doomed in different ways.
Hashmap - Read all words into a hash-map object.
1.1 initialize speed: slow to read each word into the Hash-map with 23 seconds.
1.2 ram usage: uses significant amount of ram, although I forgot how much exactly.
1.3 search speed: Finding if a word existed in the list was quick of course.
1.4 narrowing down on possible words (optional): slow, needs to go through the whole hash-map and delete them one by one. Also because it's using deletion, multiple games won't be able to be played using the same instance of the hash-map. Too much memory would be taken when adding more games, making narrowing down on the possible words therefor impossible.
Trie - Implement a RadixTree & You can see my implementation here.
2.1 initialize speed: slow to read each word into the RadixTree with 47 seconds.
2.2 ram usage: uses significant amount of ram, so much that Android is suspending threads a couple of times.
2.3 search speed: Finding if a word existed in the list was quick.
2.4 narrowing down on possible words (optional): Ultra fast since only a reference to a node in the tree is needed to then find all possible words as its children. You can play a lot of games with narrowing down the possible words since an extra game requires only a reference to a node in the tree!
Scanner - Go through the word-file sequentially
3.1 initialize speed: none.
3.2 ram usage: none.
3.3 search speed: about 20 seconds.
3.4 narrowing down on possible words (optional): can't be done realistically.
simple code:
String word;
String wordToFind = "example";
boolean foundWord = false;
while (wordFile.hasNextLine()) {
word = wordFile.nextLine();
if(word.equals(wordToFind)) {
foundWord = true;
break;
}
}
test.close();
Long-binary-search-tree: Converting the word-list to a list of long
s then reading these in and doing a binary search on them.
1.1 initialize speed: probably the same as a hash-map or little less with about 20 seconds. However I hope calling Array.sort() does not take too much time, no idea as of yet.
1.2 ram usage: if you only account 12 letter words or lower with a 26 letter alphabet you need 5 bits (2^5= 32) to encode a string. An array of longs would need then 250,000*8 bits = around 2mb. Which is not too much.
1.3 search speed: Arrays.binarySearch()
1.4 narrowing down on possible words (optional): Narrowing down on possible words could be possible but I am not sure how. According to a comment on this post.
Hashmap with storage - Creating a hashfunction that maps a word to an index number of the word-list file. Then accessing the file at this specific location and look from here to find if a word exists. You can make use of the ordering of the alphabet to determine if you can still find the word since the word-list is in natural order.
2.1 initialize speed: not needed (since I need to put every word at the right index beforehand.)
2.2 ram usage: none.
2.3 search speed: fast.
2.4 narrowing down on possible words (optional): not possible.
I have been stuck at this for about a week now. So any new ideas are more than welcome. If any of my assumption above are incorrect I would also be pleased to hear about them.
I made this post this way so others could learn from them as well, either by seeing my mistakes or seeing what does work in the answers.
This sounds like an ideal use for a bloom filter. If you're willing to allow the risk of something being falsely considered a word, you can condense your wordlist into an amount of memory as small or as large as you're willing to make it.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With