
How should I configure my Solr filterCache, firstSearcher and newSearcher?

Question 1: I'm trying to optimize my searchers in my solrconfig.xml, and there are two different searchers that can get warmed. My understanding is that firstSearcher only fires on server startup, while a newSearcher is created whenever a new searcher is needed (e.g. after a commit). It seems to me that we would want the same fqs and facets specified in each. In what case would you want them to differ?
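For reference, here's a stripped-down sketch of the two listeners I'm talking about in solrconfig.xml (the q, fq, and facet.field values below are just placeholders, not my real config):

    <listener event="firstSearcher" class="solr.QuerySenderListener">
      <arr name="queries">
        <lst>
          <str name="q">*:*</str>
          <str name="fq">category:books</str>
          <str name="facet">true</str>
          <str name="facet.field">category</str>
        </lst>
      </arr>
    </listener>

    <listener event="newSearcher" class="solr.QuerySenderListener">
      <arr name="queries">
        <!-- same warming queries as firstSearcher? that's the question -->
        <lst>
          <str name="q">*:*</str>
          <str name="fq">category:books</str>
          <str name="facet">true</str>
          <str name="facet.field">category</str>
        </lst>
      </arr>
    </listener>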

Question 2: Is there any way I can determine the effect of adding an individual fq or facet on searcher startup time? I know that I can brute-force measure the startup time of a searcher with fqs/facets vs. one without, but that's not very granular. Assuming there's a cost/benefit to weigh for an individual fq/facet, I'd like to be able to measure it so I can decide which things are worth warming and which are not.
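The closest thing I've found is the aggregate warmupTime the searcher reports in the admin stats, e.g. (collection1 is just a placeholder core name, and the exact output depends on the Solr version):

    curl 'http://localhost:8983/solr/collection1/admin/mbeans?stats=true&wt=json&indent=true' \
      | grep -i warmupTime

but that's a single total for the whole searcher, not a per-fq or per-facet breakdown.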

Question 3: How can I effectively size my filterCache? I have a specific set of fqs that I know are likely to be hit, about 500 of them, so it seems like I would set it to 500. However, Solr seems to use the filterCache for query results that have to be faceted. Since 90% of my queries are faceted, it seems like we'd have to use the number of queries expected as the basis of the cache size. Does that sound right?
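Concretely, this is the knob I'm trying to size (the numbers here are just an example, not a recommendation):

    <filterCache class="solr.FastLRUCache"
                 size="512"
                 initialSize="512"
                 autowarmCount="0"/>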

asked Feb 06 '13 by Andy Lester

1 Answer

  1. Your understanding is correct. However, a newSearcher can be autowarmed from the previous searcher, so that's one difference (see the cache sketch after this list). Another is that since newSearcher happens per commit, if you're doing frequent commits you may want it to do considerably less work than if you're starting cold.

  2. I'm not aware of a great way. Warming queries are run serially and, at least with firstSearcher, show up in the access log, so you can see exactly how long each one takes. Whether a given query set leaves things "warm enough" is pretty much trial and error, though.

  3. The biggest thing to remember about filterCache size is that each entry is essentially a bitset with one bit per document, i.e. around (number of documents in your index)/8 bytes. So if you set the size to 500 and you have 100M docs in your index, you'll need 6.25G of heap just to hold it. Generally the recommendation is to size your heap as small as possible to leave more memory for the OS disk cache, but this is an exception. As far as facet queries putting eviction pressure on your cache goes, I have the same problem, and I'm not aware of any solution. See https://issues.apache.org/jira/browse/SOLR-8171.
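To illustrate the autowarming mentioned in #1: the cache settings control how much of the old searcher's state gets replayed into the new one, so a nonzero autowarmCount can stand in for a long explicit newSearcher query list. The values below are purely illustrative:

    <!-- re-execute the 128 most recently used filters and 32 query results
         from the previous searcher whenever a newSearcher is opened -->
    <filterCache class="solr.FastLRUCache"
                 size="512"
                 initialSize="512"
                 autowarmCount="128"/>
    <queryResultCache class="solr.LRUCache"
                      size="512"
                      initialSize="512"
                      autowarmCount="32"/>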

answered Oct 21 '22 by randomstatistic