Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Memory efficient way to store strings

Assume I have millions strings. Each string has an int value. I want to retrieve this value by input string but I don't want to store all this strings because they take a lot of space. I can't use hash table because of it need to store all or at least many strings in memory. So what is good data structure for my case (I don't need to add or delete any strings, I am already have prepared data and read is only allowed operation)

like image 646
Neir0 Avatar asked Mar 29 '13 15:03

Neir0


People also ask

What is the storage of string?

Strings are stored on the heap area in a separate memory location known as String Constant pool. String constant pool: It is a separate block of memory where all the String variables are held. String str1 = "Hello"; directly, then JVM creates a String object with the given value in a String constant pool.


2 Answers

Use a trie to prevent storing common substrings..

like image 55
Fred Foo Avatar answered Nov 15 '22 09:11

Fred Foo


If you can can pre-process the word list take a look at perfect hashes, like CMPH. ( gperf is another, but seems optimized for smaller data sets. )

From the CMPH docs:

A perfect hash function maps a static set of n keys into a set of m integer numbers without collisions, where m is greater than or equal to n. If m is equal to n, the function is called minimal.

...

The CMPH Library encapsulates the newest and more efficient algorithms in an easy-to-use, production-quality, fast API. The library was designed to work with big entries that cannot fit in the main memory. It has been used successfully for constructing minimal perfect hash functions for sets with more than 100 million of keys, ...

like image 25
Paul Rubel Avatar answered Nov 15 '22 09:11

Paul Rubel