HashSet of Strings taking up too much memory, suggestions...?

Question

I am currently storing a list of words (around 120,000) in a HashSet, for the purpose of using as a list to check enetered words against to see if they are spelt correctly, and just returning yes or no.

I was wondering if there is a way to do this which takes up less memory. Currently 120,000 words is around 12meg, the actual file the words are read from is around 900kb.

Any suggestions?

Thanks in advance

sandris · Accepted Answer

You could use a prefix tree or trie: http://en.wikipedia.org/wiki/Trie

Aravind Yarram · Answer

Check out bloom filters or cuckoo hashing. Bloom filter or cuckoo hashing?

I am not sure if this is the answer for your question but worth looking into these alternatives. bloom filters are mainly used for spell checker kind of use cases.

Aaron Digulla · Answer

HashSet is probably not the right structure for this. Use Trie instead.

JohnPaul Adamovsky · Answer

This might be a bit late but using Google you can easily find the DAWG investigation and C code that I posted a while ago.

http://www.pathcom.com/~vadco/dawg.html

TWL06 - 178,691 words - fits into 494,676 Bytes

The downside of a compressed-shared-node structure is that it does not work as a hash function for the words in your list. That is to say, it will tell you if a word exists, but it will not return an index to related data for a word that does exist.

If you want the perfect and complete hash functionality, in a processor-cache sized structure, you are going to have to read, understand, and modify a data structure called the ADTDAWG. It will be slightly larger than a traditional DAWG, but it is faster and more useful.

http://www.pathcom.com/~vadco/adtdawg.html

All the very best,

JohnPaul Adamovsky

HashSet of Strings taking up too much memory, suggestions...?

Tags:

java

memory

Dori

4 Answers

sandris

Aravind Yarram

Aaron Digulla

JohnPaul Adamovsky

Recent Activity

Donate For Us

HashSet of Strings taking up too much memory, suggestions...?

Tags:

java

memory

Dori

4 Answers

sandris

Aravind Yarram

Aaron Digulla

JohnPaul Adamovsky

Related questions

Recent Activity

Donate For Us