
Solr slow while Indexing

I have more than 100 CSV files, with 10,000 rows each, which I am indexing. I then query the index for similar spellings (spell checking). While doing this, indexing is very slow.

I have found some good solutions

  1. Master-slave, where the master indexes and the slave is used for queries. See: How to index records in Solr faster (and not impact ColdFusion web server)? Two JVM?

  2. Using trie ranges: http://www.lucidimagination.com/blog/2009/05/13/exploring-lucene-and-solrs-trierange-capabilities/

I know these two solutions are different. I would like some comments on which should have higher priority. Does the second solution fit my problem? And are there more solutions to my spell-check problem?

Thanks in advance

Arjit asked Apr 18 '12 11:04


2 Answers

Indexing will usually make queries slow. If you have fast disks, indexing will use 100% of CPU, otherwise, it will use 100% of disk bandwidth. Either way, queries will be slow.

A master/slave configuration is the standard solution for this. The slave servers are dedicated to search queries. The only time they slow down is after a replication, when new Searchers with new caches are created.

A master/slave configuration may not make indexing much faster, but it will avoid slow query performance. There has been work on making indexing multithreaded, so you may want to test multiple indexing tasks at once. This won't help if the bottleneck is disk IO, only if it is using 100% of one CPU.
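Running several indexing tasks at once can be sketched as follows. This is only an illustration under assumptions that are not in the question: the core URL (`SOLR_CORE_URL`), the directory of CSV files, and a Solr version whose `/update` handler accepts CSV with `Content-Type: application/csv` (older releases used a separate CSV handler path). The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: upload many CSV files to Solr in parallel, then commit once.
# Assumptions (not from the original post): the core URL below, and a Solr
# version whose /update handler accepts CSV via Content-Type: application/csv.

SOLR_CORE_URL="${SOLR_CORE_URL:-http://localhost:8983/solr/mycore}"

# Build the update URL; the argument is "true" or "false" for commit.
update_url() {
  printf '%s/update?commit=%s' "$SOLR_CORE_URL" "$1"
}

# Post every *.csv under directory $1 using up to $2 concurrent uploads.
index_csv_dir() {
  local dir="$1" workers="${2:-4}"
  # No commit per file: a single commit is issued after all uploads.
  find "$dir" -name '*.csv' -print0 |
    xargs -0 -P "$workers" -I{} \
      curl -sS "$(update_url false)" \
        -H 'Content-Type: application/csv' --data-binary @{}
  # One commit once xargs has waited for every upload to finish.
  curl -sS "$(update_url true)"
}

# Example (hypothetical path): index_csv_dir ./csv 4
```

If CPU is the bottleneck, raising the worker count should help up to roughly the number of cores; if disk IO is the bottleneck, it probably will not, so measure both cases.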

Trie fields are great for range queries. I doubt they will have much effect on indexing speed.

Finally, you may want to tune your spelling suggestion options. Spelling suggestion can be a lot of work, and you may be able to get good results with different, less expensive, options.
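As a rough sketch of what a more restrained spelling request could look like, the helper below builds a query URL that asks for only a few suggestions and turns collation off. `spellcheck.count` and `spellcheck.collate` are standard Solr spellcheck parameters, but everything else is an assumption: the base URL, the use of the `/select` handler (your spellcheck component may be wired to a different handler), and the idea that these particular settings are the cheaper ones for your index. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: build a spellcheck query URL with a small, fixed set of options.
# Assumptions: the spellcheck component is attached to /select on this core,
# and the term passed in is already URL-safe (no encoding is done here).

spellcheck_url() {
  local base="$1" term="$2"
  # rows=0: only the suggestions are wanted, not the matching documents.
  # spellcheck.count=3: ask for few suggestions per term.
  # spellcheck.collate=false: skip building collated queries.
  printf '%s/select?q=%s&rows=0&spellcheck=true&spellcheck.count=3&spellcheck.collate=false' \
    "$base" "$term"
}

# Example (hypothetical core):
#   curl -sS "$(spellcheck_url http://localhost:8983/solr/mycore speling)"
```

Compare query times with and without each option while indexing is running before settling on a configuration.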

Walter Underwood answered Oct 06 '22 04:10


You can often achieve good query performance while doing bulk indexing without resorting to a Blue/Green setup.

Here are some pointers to making this happen:

  • If you are inserting large batches of documents, use https://github.com/lucidworks/spark-solr if possible. This has a robust bulk-import mechanism that is relatively easy to use. Do not write your own Solr bulk-import code if you do not have to.
  • If you must use SolrJ, make sure you are submitting in batches. See the add(Collection<SolrInputDocument>) method. If you overwhelm Solr with insert HTTP requests, it can slow query speeds drastically.
  • Do not commit too often. Commits are expensive!
  • Use auto-warming. This is a Solr feature that pre-populates the caches of a new searcher every time one is created (after commits). This is critical for getting fast queries right after commits, because every searcher that serves queries then has warm caches.
  • If you have a lot of shards (say, over 20), consider a "Staggered Commit" approach where you commit one shard at a time. Here is a shell (bash) example of this with 16 Solr servers that have 64 total shards. Committing to only 1 shard at a time keeps the entire collection from committing at the same time, which lessens the impact of commits.
while :
do
  curl -v 'http://solr-0.solr:8983/solr/MyCollection_shard8_0_0_replica_n127/update?update.distrib=FROMLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=blah&commit_end_point=true&version=2&expungeDeletes=false'
  curl -v 'http://solr-0.solr:8983/solr/MyCollection_shard8_0_1_replica_n128/update?update.distrib=FROMLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=blah&commit_end_point=true&version=2&expungeDeletes=false'
  curl -v 'http://solr-0.solr:8983/solr/MyCollection_shard8_1_0_replica_n131/update?update.distrib=FROMLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=blah&commit_end_point=true&version=2&expungeDeletes=false'
  curl -v 'http://solr-0.solr:8983/solr/MyCollection_shard8_1_1_replica_n132/update?update.distrib=FROMLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=blah&commit_end_point=true&version=2&expungeDeletes=false'
  
  curl -v 'http://solr-1.solr:8983/solr/MyCollection_shard4_0_0_replica_n95/update?update.distrib=FROMLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=blah&commit_end_point=true&version=2&expungeDeletes=false'
  curl -v 'http://solr-1.solr:8983/solr/MyCollection_shard4_0_1_replica_n96/update?update.distrib=FROMLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=blah&commit_end_point=true&version=2&expungeDeletes=false'
  curl -v 'http://solr-1.solr:8983/solr/MyCollection_shard4_1_0_replica_n99/update?update.distrib=FROMLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=blah&commit_end_point=true&version=2&expungeDeletes=false'
  curl -v 'http://solr-1.solr:8983/solr/MyCollection_shard4_1_1_replica_n100/update?update.distrib=FROMLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=blah&commit_end_point=true&version=2&expungeDeletes=false'

  curl -v 'http://solr-2.solr:8983/solr/MyCollection_shard3_0_0_replica_n87/update?update.distrib=FROMLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=blah&commit_end_point=true&version=2&expungeDeletes=false'
  curl -v 'http://solr-2.solr:8983/solr/MyCollection_shard3_0_1_replica_n88/update?update.distrib=FROMLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=blah&commit_end_point=true&version=2&expungeDeletes=false'
  curl -v 'http://solr-2.solr:8983/solr/MyCollection_shard3_1_0_replica_n91/update?update.distrib=FROMLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=blah&commit_end_point=true&version=2&expungeDeletes=false'
  curl -v 'http://solr-2.solr:8983/solr/MyCollection_shard3_1_1_replica_n92/update?update.distrib=FROMLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=blah&commit_end_point=true&version=2&expungeDeletes=false'

  curl -v 'http://solr-3.solr:8983/solr/MyCollection_shard7_0_0_replica_n119/update?update.distrib=FROMLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=blah&commit_end_point=true&version=2&expungeDeletes=false'
  curl -v 'http://solr-3.solr:8983/solr/MyCollection_shard7_0_1_replica_n120/update?update.distrib=FROMLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=blah&commit_end_point=true&version=2&expungeDeletes=false'
  curl -v 'http://solr-3.solr:8983/solr/MyCollection_shard7_1_0_replica_n123/update?update.distrib=FROMLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=blah&commit_end_point=true&version=2&expungeDeletes=false'
  curl -v 'http://solr-3.solr:8983/solr/MyCollection_shard7_1_1_replica_n124/update?update.distrib=FROMLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=blah&commit_end_point=true&version=2&expungeDeletes=false'

  curl -v 'http://solr-4.solr:8983/solr/MyCollection_shard2_0_0_replica_n79/update?update.distrib=FROMLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=blah&commit_end_point=true&version=2&expungeDeletes=false'
  curl -v 'http://solr-4.solr:8983/solr/MyCollection_shard2_0_1_replica_n80/update?update.distrib=FROMLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=blah&commit_end_point=true&version=2&expungeDeletes=false'
  curl -v 'http://solr-4.solr:8983/solr/MyCollection_shard2_1_0_replica_n83/update?update.distrib=FROMLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=blah&commit_end_point=true&version=2&expungeDeletes=false'
  curl -v 'http://solr-4.solr:8983/solr/MyCollection_shard2_1_1_replica_n84/update?update.distrib=FROMLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=blah&commit_end_point=true&version=2&expungeDeletes=false'

  curl -v 'http://solr-5.solr:8983/solr/MyCollection_shard1_0_0_replica_n71/update?update.distrib=FROMLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=blah&commit_end_point=true&version=2&expungeDeletes=false'
  curl -v 'http://solr-5.solr:8983/solr/MyCollection_shard1_0_1_replica_n72/update?update.distrib=FROMLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=blah&commit_end_point=true&version=2&expungeDeletes=false'
  curl -v 'http://solr-5.solr:8983/solr/MyCollection_shard1_1_0_replica_n75/update?update.distrib=FROMLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=blah&commit_end_point=true&version=2&expungeDeletes=false'
  curl -v 'http://solr-5.solr:8983/solr/MyCollection_shard1_1_1_replica_n76/update?update.distrib=FROMLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=blah&commit_end_point=true&version=2&expungeDeletes=false'

  curl -v 'http://solr-6.solr:8983/solr/MyCollection_shard5_0_0_replica_n159/update?update.distrib=FROMLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=blah&commit_end_point=true&version=2&expungeDeletes=false'
  curl -v 'http://solr-6.solr:8983/solr/MyCollection_shard5_0_1_replica_n161/update?update.distrib=FROMLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=blah&commit_end_point=true&version=2&expungeDeletes=false'
  curl -v 'http://solr-6.solr:8983/solr/MyCollection_shard5_1_0_replica_n163/update?update.distrib=FROMLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=blah&commit_end_point=true&version=2&expungeDeletes=false'
  curl -v 'http://solr-6.solr:8983/solr/MyCollection_shard5_1_1_replica_n165/update?update.distrib=FROMLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=blah&commit_end_point=true&version=2&expungeDeletes=false'

  curl -v 'http://solr-7.solr:8983/solr/MyCollection_shard6_0_0_replica_n151/update?update.distrib=FROMLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=blah&commit_end_point=true&version=2&expungeDeletes=false'
  curl -v 'http://solr-7.solr:8983/solr/MyCollection_shard6_0_1_replica_n153/update?update.distrib=FROMLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=blah&commit_end_point=true&version=2&expungeDeletes=false'
  curl -v 'http://solr-7.solr:8983/solr/MyCollection_shard6_1_0_replica_n155/update?update.distrib=FROMLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=blah&commit_end_point=true&version=2&expungeDeletes=false'
  curl -v 'http://solr-7.solr:8983/solr/MyCollection_shard6_1_1_replica_n157/update?update.distrib=FROMLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=blah&commit_end_point=true&version=2&expungeDeletes=false'

  curl -v 'http://solr-8.solr:8983/solr/MyCollection_shard8_0_0_replica_n135/update?update.distrib=FROMLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=blah&commit_end_point=true&version=2&expungeDeletes=false'
  curl -v 'http://solr-8.solr:8983/solr/MyCollection_shard8_0_1_replica_n137/update?update.distrib=FROMLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=blah&commit_end_point=true&version=2&expungeDeletes=false'
  curl -v 'http://solr-8.solr:8983/solr/MyCollection_shard8_1_0_replica_n139/update?update.distrib=FROMLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=blah&commit_end_point=true&version=2&expungeDeletes=false'
  curl -v 'http://solr-8.solr:8983/solr/MyCollection_shard8_1_1_replica_n141/update?update.distrib=FROMLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=blah&commit_end_point=true&version=2&expungeDeletes=false'

  curl -v 'http://solr-9.solr:8983/solr/MyCollection_shard4_0_0_replica_n143/update?update.distrib=FROMLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=blah&commit_end_point=true&version=2&expungeDeletes=false'
  curl -v 'http://solr-9.solr:8983/solr/MyCollection_shard4_0_1_replica_n145/update?update.distrib=FROMLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=blah&commit_end_point=true&version=2&expungeDeletes=false'
  curl -v 'http://solr-9.solr:8983/solr/MyCollection_shard4_1_0_replica_n147/update?update.distrib=FROMLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=blah&commit_end_point=true&version=2&expungeDeletes=false'
  curl -v 'http://solr-9.solr:8983/solr/MyCollection_shard4_1_1_replica_n149/update?update.distrib=FROMLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=blah&commit_end_point=true&version=2&expungeDeletes=false'

  curl -v 'http://solr-10.solr:8983/solr/MyCollection_shard6_0_0_replica_n111/update?update.distrib=FROMLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=blah&commit_end_point=true&version=2&expungeDeletes=false'
  curl -v 'http://solr-10.solr:8983/solr/MyCollection_shard6_0_1_replica_n112/update?update.distrib=FROMLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=blah&commit_end_point=true&version=2&expungeDeletes=false'
  curl -v 'http://solr-10.solr:8983/solr/MyCollection_shard6_1_0_replica_n115/update?update.distrib=FROMLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=blah&commit_end_point=true&version=2&expungeDeletes=false'
  curl -v 'http://solr-10.solr:8983/solr/MyCollection_shard6_1_1_replica_n116/update?update.distrib=FROMLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=blah&commit_end_point=true&version=2&expungeDeletes=false'

  curl -v 'http://solr-11.solr:8983/solr/MyCollection_shard5_0_0_replica_n103/update?update.distrib=FROMLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=blah&commit_end_point=true&version=2&expungeDeletes=false'
  curl -v 'http://solr-11.solr:8983/solr/MyCollection_shard5_0_1_replica_n104/update?update.distrib=FROMLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=blah&commit_end_point=true&version=2&expungeDeletes=false'
  curl -v 'http://solr-11.solr:8983/solr/MyCollection_shard5_1_0_replica_n107/update?update.distrib=FROMLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=blah&commit_end_point=true&version=2&expungeDeletes=false'
  curl -v 'http://solr-11.solr:8983/solr/MyCollection_shard5_1_1_replica_n108/update?update.distrib=FROMLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=blah&commit_end_point=true&version=2&expungeDeletes=false'

  curl -v 'http://solr-12.solr:8983/solr/MyCollection_shard2_0_0_replica_n167/update?update.distrib=FROMLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=blah&commit_end_point=true&version=2&expungeDeletes=false'
  curl -v 'http://solr-12.solr:8983/solr/MyCollection_shard2_0_1_replica_n169/update?update.distrib=FROMLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=blah&commit_end_point=true&version=2&expungeDeletes=false'
  curl -v 'http://solr-12.solr:8983/solr/MyCollection_shard2_1_0_replica_n171/update?update.distrib=FROMLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=blah&commit_end_point=true&version=2&expungeDeletes=false'
  curl -v 'http://solr-12.solr:8983/solr/MyCollection_shard2_1_1_replica_n173/update?update.distrib=FROMLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=blah&commit_end_point=true&version=2&expungeDeletes=false'

  curl -v 'http://solr-13.solr:8983/solr/MyCollection_shard1_0_0_replica_n175/update?update.distrib=FROMLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=blah&commit_end_point=true&version=2&expungeDeletes=false'
  curl -v 'http://solr-13.solr:8983/solr/MyCollection_shard1_0_1_replica_n177/update?update.distrib=FROMLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=blah&commit_end_point=true&version=2&expungeDeletes=false'
  curl -v 'http://solr-13.solr:8983/solr/MyCollection_shard1_1_0_replica_n179/update?update.distrib=FROMLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=blah&commit_end_point=true&version=2&expungeDeletes=false'
  curl -v 'http://solr-13.solr:8983/solr/MyCollection_shard1_1_1_replica_n181/update?update.distrib=FROMLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=blah&commit_end_point=true&version=2&expungeDeletes=false'

  curl -v 'http://solr-14.solr:8983/solr/MyCollection_shard3_0_0_replica_n183/update?update.distrib=FROMLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=blah&commit_end_point=true&version=2&expungeDeletes=false'
  curl -v 'http://solr-14.solr:8983/solr/MyCollection_shard3_0_1_replica_n185/update?update.distrib=FROMLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=blah&commit_end_point=true&version=2&expungeDeletes=false'
  curl -v 'http://solr-14.solr:8983/solr/MyCollection_shard3_1_0_replica_n187/update?update.distrib=FROMLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=blah&commit_end_point=true&version=2&expungeDeletes=false'
  curl -v 'http://solr-14.solr:8983/solr/MyCollection_shard3_1_1_replica_n189/update?update.distrib=FROMLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=blah&commit_end_point=true&version=2&expungeDeletes=false'

  curl -v 'http://solr-15.solr:8983/solr/MyCollection_shard7_0_0_replica_n191/update?update.distrib=FROMLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=blah&commit_end_point=true&version=2&expungeDeletes=false'
  curl -v 'http://solr-15.solr:8983/solr/MyCollection_shard7_0_1_replica_n193/update?update.distrib=FROMLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=blah&commit_end_point=true&version=2&expungeDeletes=false'
  curl -v 'http://solr-15.solr:8983/solr/MyCollection_shard7_1_0_replica_n195/update?update.distrib=FROMLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=blah&commit_end_point=true&version=2&expungeDeletes=false'
  curl -v 'http://solr-15.solr:8983/solr/MyCollection_shard7_1_1_replica_n197/update?update.distrib=FROMLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=blah&commit_end_point=true&version=2&expungeDeletes=false'

  # 15 minute commit timer
  sleep 900
done
  • Make sure your filter cache is big enough. This keeps filtering from eating up CPU and disk IO, so queries are affected much less while indexing is running.
  • Consider doing a Solr optimize periodically. After an optimize, more of the index is likely to fit in memory, which means less disk IO for queries. With less disk IO from queries, indexing new documents competes less for the disk.
  • If Solr is on shared hardware, make sure that the other applications running are not eating up CPU and disk resources. For example, if you are parsing documents with Apache Tika, make sure Tika is running on another host. Solr is best left by itself.
  • Make sure you have enough memory left over on the VM for the file system cache. For example, on a machine with 32G of RAM, do not set your max heap size too large: -Xmx28G does not leave enough memory for the operating system's file system cache. Use empirical analysis to determine how much heap you actually need. For example, -Xmx12G would leave more than half of the memory to the FS cache. The more your queries are served from caches, the less indexing will affect them.
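The staggered commit script above repeats one curl line per core. A more compact variant, sketched here, reads the cores from a file and commits them one at a time. The file name and its "host core" line format are assumptions made for illustration, as are the function names; the query string is copied unchanged from the script above so that each commit stays local to the core it is sent to.

```shell
#!/usr/bin/env bash
# Sketch: staggered commit driven by a list of cores instead of hardcoded lines.
# Assumption: a text file with one "host core" pair per line, for example
#   solr-0.solr MyCollection_shard8_0_0_replica_n127

# Build the per-core commit URL (parameters taken from the script above).
commit_url() {
  local host="$1" core="$2"
  printf 'http://%s:8983/solr/%s/update?update.distrib=FROMLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=blah&commit_end_point=true&version=2&expungeDeletes=false' \
    "$host" "$core"
}

# Commit each listed core in turn; waitSearcher=true makes every request
# return only after that core's new searcher is open.
staggered_commit() {
  local list="$1" host core
  while read -r host core; do
    [ -z "$host" ] && continue   # skip blank lines
    curl -sS "$(commit_url "$host" "$core")" < /dev/null
  done < "$list"
}

# Example (hypothetical file), repeating every 15 minutes:
#   while :; do staggered_commit cores.txt; sleep 900; done
```

Keeping the core list in a file makes it easier to update when replicas move, since replica names such as `_replica_n127` are not predictable.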
Nicholas DiPiazza answered Oct 06 '22 04:10