 

Need to know pros and cons of using RAMDirectory

I need to improve the performance of my Lucene search query. Can I use RAMDirectory? Does it improve performance? Is there any index size limit for it? I would appreciate it if someone could list the pros and cons of using a RAMDirectory.

Thanks.

user43498 asked Oct 17 '09 14:10


3 Answers

I compared FSDirectory and RAMDirectory.

  • Index size: 1.4 GB
  • CentOS, 5 GB of memory

Searching 1000 keywords gave the following average/min/max response times (ms):

  • FSDirectory
    • first run: 351/7/2611
    • second run: 47/7/837
    • third run (after restarting the app): 53/7/2343
  • RAMDirectory
    • first run: 38/7/1133
    • second run: 34/7/189
    • third run (after restarting the app): 38/7/959
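The figures above are per-query wall-clock latencies summarized as average/min/max. A minimal JDK-only sketch of such a harness follows; the `search` callback is a hypothetical placeholder for whatever runs one Lucene query (for example against an FSDirectory- or RAMDirectory-backed searcher), and the name `LatencyBench` is illustrative only.

```java
import java.util.List;
import java.util.Locale;
import java.util.function.Consumer;

/** Runs each keyword through a search callback and records average/min/max latency. */
final class LatencyBench {

    /** The three numbers reported per run: average, minimum and maximum in ms. */
    static final class Stats {
        final double avgMs;
        final long minMs;
        final long maxMs;

        Stats(double avgMs, long minMs, long maxMs) {
            this.avgMs = avgMs;
            this.minMs = minMs;
            this.maxMs = maxMs;
        }

        @Override
        public String toString() {
            return String.format(Locale.ROOT, "%.0f/%d/%d", avgMs, minMs, maxMs);
        }
    }

    /** `search` stands in for code that executes one Lucene query (hypothetical). */
    static Stats measure(List<String> keywords, Consumer<String> search) {
        if (keywords.isEmpty()) {
            throw new IllegalArgumentException("need at least one keyword");
        }
        long totalNanos = 0;
        long minNanos = Long.MAX_VALUE;
        long maxNanos = 0;
        for (String keyword : keywords) {
            long start = System.nanoTime();
            search.accept(keyword);                      // one query per keyword
            long elapsed = System.nanoTime() - start;
            totalNanos += elapsed;
            minNanos = Math.min(minNanos, elapsed);
            maxNanos = Math.max(maxNanos, elapsed);
        }
        // Average as a fractional ms value; min/max truncated to whole ms.
        return new Stats(totalNanos / 1e6 / keywords.size(),
                         minNanos / 1_000_000, maxNanos / 1_000_000);
    }
}
```

Running the same keyword list several times, and again after restarting the JVM, reproduces the first/second/third-run pattern, which helps separate cold-cache effects from steady-state speed.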

So you can see that RAMDirectory is faster than FSDirectory, but once the OS file cache has warmed up, the gap is much smaller. What are the disadvantages of RAMDirectory? In my test:

  • It uses much more memory: loading the 1.4 GB index into memory took about 2 GB, while FSDirectory used only about 700 MB. That also means longer full GC pauses.
  • It takes more time to load, especially when the index is large, because the data has to be copied from the files into memory when the index is opened. Requests are therefore blocked for longer when the app restarts.
  • It is not very practical to keep two indexes in memory at the same time. Our app switches to a new index every few hours, and we want the new index to warm up while the old one is still serving requests in the same Tomcat instance.
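The third point is where RAMDirectory hurts most: while the new index warms up, both copies are held in memory, which with the figures above is roughly 2 × 2 GB on a 5 GB machine. A minimal sketch of the swap pattern itself, assuming a hypothetical `IndexHandle` interface standing in for an open searcher: request threads read the current handle through an `AtomicReference`, and the new handle is published only after warm-up queries have run on it.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical stand-in for an open index plus its searcher. */
interface IndexHandle {
    List<String> search(String query);
}

/** Holds the live index and swaps in a freshly warmed one without blocking readers. */
final class IndexHolder {
    private final AtomicReference<IndexHandle> current;

    IndexHolder(IndexHandle initial) {
        current = new AtomicReference<>(initial);
    }

    /** Request threads call this; each call sees either the old or the new index. */
    List<String> search(String query) {
        return current.get().search(query);
    }

    /**
     * Runs warm-up queries against the new index while the old one keeps serving,
     * then publishes it atomically. Returns the previous handle instead of closing
     * it, because in-flight requests may still be using it.
     */
    IndexHandle warmAndSwap(IndexHandle fresh, List<String> warmupQueries) {
        for (String q : warmupQueries) {
            fresh.search(q);                 // results discarded; only the warm-up matters
        }
        return current.getAndSet(fresh);
    }
}
```

Closing the returned handle safely needs reference counting so that it is released only after the last in-flight request finishes; later Lucene releases include a `SearcherManager` class aimed at exactly this.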
leef answered Nov 14 '22 13:11


A RAMDirectory is faster, but doesn't get written to the disk. It only exists as long as your program is running, and has to be created from scratch every time your program runs.

If your index is small enough to fit comfortably into RAM, and you don't update it frequently, you can maintain an index on the disk and then create a RAMDirectory from it using the RAMDirectory(Directory dir) constructor. Querying that should then be faster than querying the one on disk, once you've paid the penalty of loading it up. But do measure the difference: if the index can fit into memory as a RAMDirectory, then it can fit in the disk cache as well, so you might not see much difference.
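Conceptually, building a RAMDirectory from an on-disk Directory copies every index file into heap memory, which is the loading penalty mentioned above. The JDK-only sketch below models that step with a `Map<String, byte[]>` and adds a rough heap pre-check; it is not Lucene's implementation, and `InMemoryIndexCopy`, `likelyFitsInHeap` and `overheadFactor` are hypothetical names. In the first answer's test a 1.4 GB index needed about 2 GB, a factor of roughly 1.4, which is one way to choose `overheadFactor`.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Toy model of "copy the on-disk index into RAM"; not Lucene's RAMDirectory. */
final class InMemoryIndexCopy {

    /** Lists the regular files directly inside the index directory. */
    private static List<Path> indexFiles(Path indexDir) throws IOException {
        try (Stream<Path> paths = Files.list(indexDir)) {
            return paths.filter(Files::isRegularFile).collect(Collectors.toList());
        }
    }

    /** Total on-disk size of the index files, in bytes. */
    static long indexSizeBytes(Path indexDir) throws IOException {
        long total = 0;
        for (Path p : indexFiles(indexDir)) {
            total += Files.size(p);
        }
        return total;
    }

    /**
     * Rough pre-check: would indexBytes * overheadFactor fit in the heap the JVM
     * can still grow into? overheadFactor is an assumption you measure yourself.
     */
    static boolean likelyFitsInHeap(long indexBytes, double overheadFactor) {
        Runtime rt = Runtime.getRuntime();
        long used = rt.totalMemory() - rt.freeMemory();
        long headroom = rt.maxMemory() - used;
        return indexBytes * overheadFactor < headroom;
    }

    /** Copies every index file into a byte[]: the one-time cost paid when opening. */
    static Map<String, byte[]> loadAll(Path indexDir) throws IOException {
        Map<String, byte[]> copy = new HashMap<>();
        for (Path p : indexFiles(indexDir)) {
            // readAllBytes cannot handle files larger than about 2 GB; fine for a toy.
            copy.put(p.getFileName().toString(), Files.readAllBytes(p));
        }
        return copy;
    }
}
```

Timing `loadAll` on a real index directory gives a feel for how long requests would be blocked on startup, and the returned map shows why the heap must hold the whole index.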

RichieHindle answered Nov 14 '22 14:11


You should profile the use of RAMDirectory. At least on Linux, using RAMDirectory is not any faster than using the default FSDirectory, due to the way the OS buffers I/O.
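One quick way to observe that OS buffering is to read an index file twice and time both passes: if the file was not already cached, the second pass is typically served from the page cache and is much faster. A minimal JDK-only sketch (`PageCacheProbe` is a hypothetical name); the timings depend on the machine and on whether the file was cached beforehand.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

/** Times consecutive full reads of one file to expose OS page-cache effects. */
final class PageCacheProbe {

    /** Reads the whole file through a 64 KiB buffer and returns the bytes read. */
    static long readFully(Path file) throws IOException {
        byte[] buf = new byte[1 << 16];
        long total = 0;
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            while ((n = in.read(buf)) != -1) {
                total += n;
            }
        }
        return total;
    }

    /** Returns the elapsed milliseconds of two back-to-back full reads. */
    static long[] timeTwoPasses(Path file) throws IOException {
        long[] ms = new long[2];
        for (int pass = 0; pass < 2; pass++) {
            long start = System.nanoTime();
            readFully(file);
            ms[pass] = (System.nanoTime() - start) / 1_000_000;
        }
        return ms;
    }
}
```

Pointing it at a large segment file right after a reboot, and then again, shows the same "warm-up" effect the first answer saw between its first and second FSDirectory runs.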

bajafresh4life answered Nov 14 '22 12:11