
In-memory search index for application takes up too much memory - any suggestions?

In our desktop application, we have implemented a simple search engine using an inverted index.

Unfortunately, some of our users' datasets can get very large, e.g. taking up ~1GB of memory before the inverted index has been created. The inverted index itself takes up a lot of memory, almost as much as the data being indexed (another 1GB of RAM).

Obviously this causes out-of-memory errors when the 2GB per-process memory limit on 32-bit Windows is hit, and users with lower-spec computers struggle to cope with the memory demand.

Our inverted index is stored as a:

Dictionary<string, List<ApplicationObject>>

It is built during the data load: as each object is processed, the ApplicationObject's key string and the words of its description are added to the inverted index.
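
For illustration only, the build step described above might look roughly like the sketch below. The `Key` and `Description` members of `ApplicationObject`, the `IndexBuilder` class, the whitespace tokenisation and the case-insensitive comparer are all assumptions, not details of the real application.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the real application type.
public sealed class ApplicationObject
{
    public string Key { get; set; }
    public string Description { get; set; }
}

public static class IndexBuilder
{
    // Maps each term to the list of objects whose key or description contains it.
    public static Dictionary<string, List<ApplicationObject>> Build(
        IEnumerable<ApplicationObject> objects)
    {
        var index = new Dictionary<string, List<ApplicationObject>>(
            StringComparer.OrdinalIgnoreCase);

        foreach (ApplicationObject obj in objects)
        {
            Add(index, obj.Key, obj);

            // Assumed tokenisation: split the description on single spaces.
            foreach (string word in obj.Description.Split(
                new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
            {
                Add(index, word, obj);
            }
        }
        return index;
    }

    // Appends obj to the posting list for term, creating the list on first use.
    // A word repeated within one description is added once per occurrence.
    private static void Add(Dictionary<string, List<ApplicationObject>> index,
                            string term, ApplicationObject obj)
    {
        List<ApplicationObject> postings;
        if (!index.TryGetValue(term, out postings))
        {
            postings = new List<ApplicationObject>();
            index.Add(term, postings);
        }
        postings.Add(obj);
    }
}
```

In a layout like this, every distinct term costs one string, one dictionary entry and one `List` with its own backing array, which is presumably where much of the extra memory goes.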

So, my question is: is it possible to store the search index more space-efficiently? Perhaps a different structure or strategy is needed? Alternatively, is it possible to create a kind of CompressedDictionary? As it stores lots of strings, I would expect it to be highly compressible.

RickL asked Oct 21 '08 14:10


2 Answers

If it's going to be 1GB... put it on disk. Use something like Berkeley DB. It will still be very fast.

Here is a project that provides a .NET interface to it:

http://sourceforge.net/projects/libdb-dotnet

bobwienholt answered Nov 07 '22 09:11


I see a few solutions:

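  1. If you have the ApplicationObjects in an array, store just each object's array index in the search index instead of the object itself; it might be smaller.
  2. You could use a bit of C++/CLI to store the dictionary, using UTF-8. .NET strings are UTF-16, so mostly-ASCII text takes roughly half the space in UTF-8.
  3. Don't bother storing all the different strings; use a trie, which stores a prefix shared by several words only once.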
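
As a rough sketch of how points 1 and 3 could combine, the class below is a trie whose terminal nodes hold positions in an object array rather than the objects themselves. `TrieIndex`, `Add` and `Find` are hypothetical names, and the per-node `Dictionary<char, Node>` is chosen for brevity only: it is not compact, so any real memory saving would depend on a tighter node layout (for example sorted child arrays or a compressed/radix trie) and should be measured.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch: a trie keyed by word. Nodes where a word ends hold
// indices into an external array of objects instead of object references.
public sealed class TrieIndex
{
    private sealed class Node
    {
        // Allocated lazily so that leaf nodes carry no child table.
        public Dictionary<char, Node> Children;
        // Positions in the application's object array; null if no word ends here.
        public List<int> Postings;
    }

    private readonly Node root = new Node();

    // Records that the object at position objectIndex contains the given word.
    public void Add(string word, int objectIndex)
    {
        Node node = root;
        foreach (char c in word)
        {
            if (node.Children == null)
                node.Children = new Dictionary<char, Node>();

            Node next;
            if (!node.Children.TryGetValue(c, out next))
            {
                next = new Node();
                node.Children.Add(c, next);
            }
            node = next;
        }

        if (node.Postings == null)
            node.Postings = new List<int>();
        node.Postings.Add(objectIndex);
    }

    // Returns the object indices stored for an exact word match (empty if none).
    public IReadOnlyList<int> Find(string word)
    {
        Node node = root;
        foreach (char c in word)
        {
            if (node.Children == null || !node.Children.TryGetValue(c, out node))
                return Array.Empty<int>();
        }
        return node.Postings ?? (IReadOnlyList<int>)Array.Empty<int>();
    }
}
```

A lookup such as `trie.Find("cat")` returns indices that the caller then resolves against its own array of objects; words with a common prefix, such as "cat" and "car", share the nodes for that prefix.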
MSalters answered Nov 07 '22 09:11