How to: Increase Lucene.NET Indexing Speed

Question

I am trying to create a Lucene index of around 2 million records. Indexing takes around 9 hours. Could you please suggest how to improve performance?

Esteban Araya · Accepted Answer

I wrote a post on how to parallelize building a Lucene index. It's truly terribly written, but you'll find it here (there's some sample code you might want to look at).

Anyhow, the main idea is that you split your data into sizable chunks and then index each chunk on a separate thread, each into its own index. When all of the chunks are done, you merge them into a single index.

With the approach described above, I'm able to index 4+ million records in approx. 2 hours.
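To make the chunk, index, and merge steps concrete, here is a minimal sketch in Python. It is not the code from the post mentioned above and it does not use Lucene at all: a toy in-memory inverted index (term to document ids) stands in for the real thing, and the names `chunked`, `build_partial_index`, `merge_indexes`, and `parallel_index` are made up for illustration. In Lucene.NET each worker would presumably write to its own index directory with its own `IndexWriter`, and the merge step would use one of the writer's `AddIndexes`-style methods, whose exact name and signature depend on the Lucene.NET version.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def chunked(records, size):
    """Yield consecutive slices of `records`, each with at most `size` items."""
    for start in range(0, len(records), size):
        yield records[start:start + size]


def build_partial_index(chunk):
    """Index one chunk in isolation: map each term to the ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in chunk:
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def merge_indexes(partials):
    """Combine the per-chunk indexes into a single index."""
    merged = defaultdict(set)
    for partial in partials:
        for term, doc_ids in partial.items():
            merged[term] |= doc_ids
    return merged


def parallel_index(records, chunk_size, workers=4):
    """Chunk the records, index each chunk on a worker thread, then merge."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(build_partial_index, chunked(records, chunk_size)))
    return merge_indexes(partials)


# Hypothetical sample data: (id, text) pairs standing in for real records.
records = [(1, "fast lucene index"), (2, "slow sql query"),
           (3, "lucene merge factor"), (4, "parallel index build")]
index = parallel_index(records, chunk_size=2)
```

This only shows the shape of the approach. Python threads will not speed up CPU-bound work like this toy indexer, whereas the threads in a .NET indexing job can run in parallel, so any gain depends on the real workload and on how expensive the final merge is.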

Hope this gives you an idea of where to go from here.

dlamblin · Answer

Apart from the writing side (merge factor) and the computation side (parallelizing), slow indexing is sometimes due to the simplest of reasons: slow input. Many people build a Lucene index from data in a database. Sometimes the query that fetches this data is too complicated and slow to return all the (2 million?) records quickly. Try running just the query and writing the results to disk. If that alone still takes on the order of 5-9 hours, you've found the place to optimize (the SQL).
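One way to run that check is to time the query on its own, with no indexing involved. The sketch below assumes the data can be read through a Python DB-API connection. An in-memory SQLite table is used purely as a stand-in for the real database, and `time_export`, the table name, and the output file name are all hypothetical.

```python
import csv
import os
import sqlite3
import tempfile
import time


def time_export(connection, query, out_path):
    """Run `query`, stream every row to a CSV file, return (row_count, seconds)."""
    started = time.perf_counter()
    rows = 0
    with open(out_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        for row in connection.execute(query):
            writer.writerow(row)
            rows += 1
    return rows, time.perf_counter() - started


# Stand-in data source (assumption): a small in-memory SQLite table.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, body TEXT)")
connection.executemany("INSERT INTO records (body) VALUES (?)",
                       [("record %d" % i,) for i in range(1000)])

out_path = os.path.join(tempfile.gettempdir(), "records_dump.csv")
row_count, seconds = time_export(connection, "SELECT id, body FROM records", out_path)
print(f"{row_count} rows exported in {seconds:.3f}s")
```

If the export of the full data set takes hours by itself, the bottleneck is the query or the database rather than Lucene, and tuning the indexer will not help much.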

Tags:

indexing

lucene.net

Gokul
