Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

HashSet of Strings taking up too much memory, suggestions...?

Tags:

java

memory

I am currently storing a list of words (around 120,000) in a HashSet, for the purpose of using as a list to check enetered words against to see if they are spelt correctly, and just returning yes or no.

I was wondering if there is a way to do this which takes up less memory. Currently 120,000 words is around 12meg, the actual file the words are read from is around 900kb.

Any suggestions?

Thanks in advance

like image 787
Dori Avatar asked Jul 12 '10 11:07

Dori


4 Answers

You could use a prefix tree or trie: http://en.wikipedia.org/wiki/Trie

like image 121
sandris Avatar answered Sep 27 '22 20:09

sandris


Check out bloom filters or cuckoo hashing. Bloom filter or cuckoo hashing?

I am not sure if this is the answer for your question but worth looking into these alternatives. bloom filters are mainly used for spell checker kind of use cases.

like image 39
Aravind Yarram Avatar answered Sep 27 '22 18:09

Aravind Yarram


HashSet is probably not the right structure for this. Use Trie instead.

like image 23
Aaron Digulla Avatar answered Sep 27 '22 18:09

Aaron Digulla


This might be a bit late but using Google you can easily find the DAWG investigation and C code that I posted a while ago.

http://www.pathcom.com/~vadco/dawg.html

TWL06 - 178,691 words - fits into 494,676 Bytes

The downside of a compressed-shared-node structure is that it does not work as a hash function for the words in your list. That is to say, it will tell you if a word exists, but it will not return an index to related data for a word that does exist.

If you want the perfect and complete hash functionality, in a processor-cache sized structure, you are going to have to read, understand, and modify a data structure called the ADTDAWG. It will be slightly larger than a traditional DAWG, but it is faster and more useful.

http://www.pathcom.com/~vadco/adtdawg.html

All the very best,

JohnPaul Adamovsky

like image 28
JohnPaul Adamovsky Avatar answered Sep 27 '22 19:09

JohnPaul Adamovsky