
How to: Increase Lucene .net Indexing Speed

I am trying to create a Lucene index of around 2 million records. Indexing takes around 9 hours. Could you please suggest how to improve performance?

Gokul asked Jun 27 '09 03:06



2 Answers

I wrote a terrible post on how to parallelize a Lucene Index. It's truly terribly written, but you'll find it here (there's some sample code you might want to look at).

Anyhow, the main idea is that you chunk up your data into sizable pieces, and then work on each of those pieces on a separate thread. When each of the pieces is done, you merge them all into a single index.

With the approach described above, I'm able to index 4+ million records in approx. 2 hours.
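To make the shape of that approach concrete, here is a minimal sketch in Python. It is not Lucene.NET code and not the sample code from the post: it uses a toy in-memory inverted index, and every name in it (`chunked`, `build_partial_index`, `merge_indexes`, `build_index_parallel`) is made up for illustration. In Lucene itself the per-chunk indexes would be real on-disk indexes, and the merge would be done by the library's `IndexWriter` rather than by hand.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def chunked(records, size):
    """Yield consecutive slices of `records`, each with at most `size` items."""
    for start in range(0, len(records), size):
        yield records[start:start + size]


def build_partial_index(chunk):
    """Index one chunk on its own: term -> set of record ids."""
    index = defaultdict(set)
    for record_id, text in chunk:
        for term in text.lower().split():
            index[term].add(record_id)
    return index


def merge_indexes(partials):
    """Combine the per-chunk indexes into a single index."""
    merged = defaultdict(set)
    for partial in partials:
        for term, ids in partial.items():
            merged[term] |= ids
    return merged


def build_index_parallel(records, chunk_size=2, workers=4):
    """Index every chunk on a worker thread, then merge the results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(build_partial_index, chunked(records, chunk_size)))
    return merge_indexes(partials)


# Hypothetical sample data: (record id, text) pairs.
records = [
    (1, "fast lucene indexing"),
    (2, "slow database query"),
    (3, "lucene merge factor"),
    (4, "parallel indexing threads"),
]
index = build_index_parallel(records)
print(sorted(index["lucene"]))    # [1, 3]
print(sorted(index["indexing"]))  # [1, 4]
```

The point of the sketch is the structure (independent chunks, one worker per chunk, a single merge at the end), not the timings; the chunk size and thread count are placeholders you would tune for your own data and hardware.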

Hope this gives you an idea of where to go from here.

Esteban Araya answered Oct 23 '22 15:10



Apart from the writing side (merge factor) and the computation side (parallelizing), slow indexing is sometimes due to the simplest of reasons: slow input. Many people build a Lucene index from a database. Sometimes the query that feeds the indexer is too complicated and slow to return all the (2 million?) records quickly. Try running just the query and writing the results to disk, with no indexing at all. If that still takes on the order of 5-9 hours, you've found the place to optimize (the SQL).
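A rough way to run that check is sketched below in Python. The in-memory SQLite table, its schema, and the `dump_query_to_disk` helper are all stand-ins invented for this example; you would point the same loop at your real database and your real query.

```python
import csv
import os
import sqlite3
import tempfile
import time


def dump_query_to_disk(conn, sql, out_path):
    """Run `sql`, write every row to a CSV file, and return (row_count, seconds)."""
    started = time.perf_counter()
    row_count = 0
    with open(out_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        for row in conn.execute(sql):
            writer.writerow(row)
            row_count += 1
    return row_count, time.perf_counter() - started


# Stand-in data source: an in-memory SQLite table with a hypothetical schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany(
    "INSERT INTO records (id, body) VALUES (?, ?)",
    [(i, f"record body {i}") for i in range(1000)],
)

out_path = os.path.join(tempfile.mkdtemp(), "records.csv")
row_count, seconds = dump_query_to_disk(conn, "SELECT id, body FROM records", out_path)
print(f"read and wrote {row_count} rows in {seconds:.3f}s with no indexing involved")
```

If this step alone accounts for most of the total time, tuning the indexer will not help much; the query is the bottleneck.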

dlamblin answered Oct 23 '22 14:10
