
Solr indexing is taking way too long

Tags: indexing, solr

We need to create our index in Solr and it is taking way too long. We have about 800k records and, at the current indexing rate, it looks like it will take 15 to 20 days. We only need a one-time index for now. Any suggestions?

user991851 asked Oct 12 '11 16:10


2 Answers

From my experience, indexing big chunks of data can take a while. The index I'm working on has 2m items (size: 10G). A full index takes about 40 hours using the DB.

There are some factors that might be slowing you down:

  • Memory. One thing is having memory on the box, and the other is allowing Solr to use it. Give Solr as much as you can afford while indexing (you can easily change that later)
  • Garbage collector. With the default one we had a lot of problems (after 20-30h indexing was interrupted and we had to start from the beginning)
  • Make Solr cache results from the DB
  • Check all queries and how expensive they are
  • Index in smaller batches. Indexing 300k items in one go would take much longer than indexing them in 3 batches of 100k
  • Having lots of big full-text stored fields does not help (if you don't need to store something, don't store it)
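
The "index in smaller batches" point could be sketched roughly like this in Python. This is only an illustration: `batched`, `index_in_batches` and the `send_batch` callback are hypothetical names, and `send_batch` stands in for whatever actually posts documents to your Solr instance.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(records: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most batch_size records."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    it = iter(records)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def index_in_batches(
    records: Iterable[T],
    batch_size: int,
    send_batch: Callable[[List[T]], None],
) -> int:
    """Send records in batches and return how many records were sent.

    send_batch is a placeholder (an assumption of this sketch) for the
    code that actually submits one batch to Solr.
    """
    total = 0
    for batch in batched(records, batch_size):
        send_batch(batch)
        total += len(batch)
    return total
```

Because `batched` reads from an iterator, the full record set never has to sit in memory at once, which also fits the memory point above.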
Fuxi answered Sep 18 '22 17:09


I wrote a system to index about 300,000 records and after some performance tests, I configured SOLR to commit every 5 minutes. Look at the solrconfig.xml. There are several directives related to committing changes but you should not be committing after each record update. Either commit after every 100-200 records or commit every 5 minutes. This is especially important during a reindex of all data.

I chose 5 minutes because it is a reasonable setting for ongoing sync as well, since we poll our db for changes every minute. We tell users that it takes 5 minutes or so for changes to flow through to SOLR, and so far everyone is happy with that.
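
For illustration, here is a minimal client-side sketch of the "commit every N records or every 5 minutes" rule, whichever comes first. It is not the configuration I used: in solrconfig.xml the equivalent knobs are the `maxDocs` and `maxTime` settings of the `<autoCommit>` block. The `CommitPolicy` class and its method names are made up for this example.

```python
import time
from typing import Callable


class CommitPolicy:
    """Decide when to commit: after max_docs pending records or
    max_seconds since the last commit, whichever comes first."""

    def __init__(
        self,
        max_docs: int = 200,
        max_seconds: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_docs = max_docs
        self.max_seconds = max_seconds
        self._clock = clock  # injectable so the policy can be tested
        self._pending = 0
        self._last_commit = clock()

    def record_added(self, count: int = 1) -> bool:
        """Note that count records were added; return True if a commit is due."""
        self._pending += count
        return self.should_commit()

    def should_commit(self) -> bool:
        # Nothing pending means there is nothing to commit.
        if self._pending == 0:
            return False
        elapsed = self._clock() - self._last_commit
        return self._pending >= self.max_docs or elapsed >= self.max_seconds

    def committed(self) -> None:
        """Call after a commit has been issued to reset the counters."""
        self._pending = 0
        self._last_commit = self._clock()
```

The indexing loop would call `record_added()` after each update and issue a commit only when it returns True, followed by `committed()`.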

Michael Dillon answered Sep 19 '22 17:09