Speeding up Solr Indexing

2 Answers

When you index a document, several steps are performed:

the document is analyzed,
data is put in the RAM buffer,
when the RAM buffer is full, data is flushed to a new segment on disk,
if there are more than ${mergeFactor} segments, segments are merged.

The first two steps will be run in as many threads as you have clients sending data to Solr, so if you want Solr to run three threads for these steps, all you need is to send data to Solr from three threads.
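To illustrate the point about client-side threads, here is a minimal sketch of fanning documents out over a fixed number of worker threads. It deliberately does not use SolrJ: `ParallelIndexer`, `indexAll`, and the `sender` callback are hypothetical names, and `sender` stands in for whatever code actually posts one document to Solr.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;

// Hypothetical helper, not part of Solr or SolrJ.
final class ParallelIndexer {

    /**
     * Sends every document through `sender` using `threadCount` client threads.
     * `sender` is an assumed callback that posts one document to Solr;
     * it must be safe to call from several threads at once.
     */
    static void indexAll(List<String> documents, int threadCount, Consumer<String> sender) {
        ExecutorService pool = Executors.newFixedThreadPool(threadCount);
        try {
            List<CompletableFuture<Void>> workers = new ArrayList<>();
            for (int t = 0; t < threadCount; t++) {
                final int offset = t;
                workers.add(CompletableFuture.runAsync(() -> {
                    // Worker `offset` handles documents offset, offset + threadCount, ...
                    for (int i = offset; i < documents.size(); i += threadCount) {
                        sender.accept(documents.get(i));
                    }
                }, pool));
            }
            // Wait for all workers; a failure in any of them is rethrown here.
            CompletableFuture.allOf(workers.toArray(new CompletableFuture[0])).join();
        } finally {
            pool.shutdown();
        }
    }
}
```

With three client threads, Solr would analyze and buffer up to three documents concurrently, which is the behaviour described above.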

You can configure the number of threads to use for the fourth step if you use a ConcurrentMergeScheduler (http://lucene.apache.org/java/3_0_1/api/core/org/apache/lucene/index/ConcurrentMergeScheduler.html). However, there is no way to configure the maximum number of threads from the Solr configuration files, so you need to write a custom class that calls setMaxThreadCount in its constructor.

In my experience, the main ways to improve indexing speed with Solr are:

buying faster hardware (especially I/O),
sending data to Solr from several threads (as many threads as cores is a good start),
using the Javabin format,
using faster analyzers.

Although StreamingUpdateSolrServer looks interesting for improving indexing performance, it doesn't support the Javabin format. Since Javabin parsing is much faster than XML parsing, I got better performance by sending bulk updates (800 in my case, but with rather small documents) using CommonsHttpSolrServer and the Javabin format.
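As a rough sketch of the bulk-update idea, the helper below splits a list of documents into fixed-size batches so that each batch can go out in a single update request. `UpdateBatcher` and `partition` are hypothetical names, and reading "800" as documents per request is an assumption; the actual call that sends a batch (for example through CommonsHttpSolrServer) is left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Solr or SolrJ.
final class UpdateBatcher {

    /** Splits `documents` into consecutive batches of at most `batchSize` items. */
    static <T> List<List<T>> partition(List<T> documents, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < documents.size(); start += batchSize) {
            int end = Math.min(start + batchSize, documents.size());
            // Copy the view so each batch is independent of the source list.
            batches.add(new ArrayList<>(documents.subList(start, end)));
        }
        return batches;
    }
}
```

The right batch size depends on document size and available memory, so 800 is best treated as a starting point to measure against rather than a recommendation.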

You can read http://wiki.apache.org/lucene-java/ImproveIndexingSpeed for further information.

answered Sep 28 '22 10:09

jpountz

This article describes an approach to scaling indexing with SolrCloud, Hadoop and Behemoth. It applies to Solr 4.0, which hadn't been released when this question was originally posted.

answered Sep 28 '22 09:09

ted.strauss

Tags:

solr

lucene

phanips
