
Index linear growth - Performance degradation

We have 4 shards with a 14GB index on each of them. Each shard has a master and 3 slaves (each with 32GB RAM).

We expect the index size to double or triple in the near future. So we thought of merging our indexes so that each shard holds a 28GB index, and we also increased the RAM on each slave to 48GB.

We made these changes locally and tested by sending the same 10K realistic queries to a server with the 14GB index and to a server with the 28GB index. We found that:

  1. For server with 14GB index (48GB RAM): search time was 480ms, number of index hits: 3.8G

  2. For server with 28GB index (48GB RAM): search time was 900ms, number of index hits: 7.2G

So we saw that having the whole index in RAM doesn't help sustain performance in terms of search time. Search time grew linearly, roughly doubling when the index size was doubled.
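As a quick sanity check on the two measurements above, here is a minimal Python sketch. It uses only the figures already reported (no new data) and compares how much each one grew between the 14GB and the 28GB run.

```python
# Figures reported above for the 14GB and 28GB index (both on 48GB RAM).
small = {"index_gb": 14, "search_ms": 480, "hits_g": 3.8}
large = {"index_gb": 28, "search_ms": 900, "hits_g": 7.2}

def growth(key):
    """Ratio of the 28GB measurement to the 14GB measurement."""
    return large[key] / small[key]

size_ratio = growth("index_gb")
time_ratio = growth("search_ms")
hits_ratio = growth("hits_g")

print(f"index size x{size_ratio:.2f}, search time x{time_ratio:.2f}, hits x{hits_ratio:.2f}")
```

Search time and the number of index hits grow by nearly the same factor (about 1.9), which suggests the time is going into processing hits rather than into reading the index from disk.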

We were thinking of keeping the 4-shard configuration, but it now looks like we have to add another shard, or another slave to each shard.

Is there any other way that we can configure our servers so that the performance isn't affected even when index size doubles or triples?

feroz.kh asked Feb 20 '23 12:02



1 Answer

I'd hate to say it depends, but it... depends.

The total size of your index on each shard is 14GB, which by itself doesn't mean much of anything to Solr. To get a real feel for performance, ask how unique the indexed terms are. A 14GB index containing the single word "cat" over and over again will be really quick.

Also, have you confirmed that you need the following features? Disabling them can boost performance considerably:

Schema

Stored Fields

Do you need stored fields? Removing them can greatly increase performance (you can safely have an entire index without any stored fields and rely completely on facets, pivots, and other features in Solr to drive a UX).

omitNorms

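You can, in some instances, set this flag to true to reduce memory use in general and increase performance. Norms are only needed for length normalization and index-time boosts, so fields that use neither can omit them.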

omitTermFreqAndPositions

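Can be set to true on fields that don't need term frequencies and positions, which reduces memory use in general and increases performance. Note that phrase queries will not work on a field that omits positions.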
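To make the three schema settings above concrete, here is a minimal Python sketch that renders a single field definition for schema.xml. The field name "category" and the type name "string" are assumptions for illustration only; whether each flag is safe depends on how the field is actually queried.

```python
import xml.etree.ElementTree as ET

def field_xml(name, field_type, **flags):
    """Render one <field/> element for schema.xml; booleans become "true"/"false"."""
    attrs = {"name": name, "type": field_type}
    for key, value in flags.items():
        attrs[key] = "true" if value else "false"
    return ET.tostring(ET.Element("field", attrs), encoding="unicode")

# Hypothetical field: searchable and facetable, but never returned in results,
# with norms, term frequencies and positions all left out of the index.
category_field = field_xml(
    "category", "string",
    indexed=True,
    stored=False,
    omitNorms=True,
    omitTermFreqAndPositions=True,
)
print(category_field)
```

Changing any of these attributes on an existing field requires reindexing before the change takes effect.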

System

Optimize Core/Index (Segment Count)

Index optimization is important when dealing with larger index sizes. Ensure each core is optimized and that the segment count shown for the core is 1. What I found is that this plays a more important role as the index size increases (it ties into OS-level file caching and the fact that it's easier to read one large file than multiple small files). And yes, that is on a core with 171 million+ documents.
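One way to trigger an optimize is through the core's update handler, which accepts the optimize and maxSegments parameters. The sketch below is a rough Python illustration; the host, port and core name are hypothetical, and the request is only built here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

def optimize_url(base_url, core, max_segments=1):
    """Build the update-handler URL that asks Solr to merge a core down to max_segments."""
    query = urlencode({"optimize": "true", "maxSegments": max_segments})
    return f"{base_url.rstrip('/')}/{core}/update?{query}"

def optimize_core(base_url, core, max_segments=1, timeout=3600):
    """Send the optimize request; this can run for a long time on a large index."""
    with urlopen(optimize_url(base_url, core, max_segments), timeout=timeout) as response:
        return response.status

# Hypothetical host and core name.
url = optimize_url("http://localhost:8983/solr", "core0")
print(url)
```

Calling optimize_core(...) would actually issue the request. An optimize rewrites the whole index, so it is best run on the master during a quiet period and then replicated to the slaves.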

Term Index Interval/Frequency

Configuring the term index interval (by default 256) may be required if you have one or more fields that contain very unique values (for example GUIDs/UUIDs or unique IDs in general). Typically, the lower the term index interval (TIF), the more memory you need; the higher the TIF, the less memory you need, but the more disk seeks you may incur.
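The trade-off can be put in rough numbers with a small Python sketch. It assumes a simplified model in which every Nth term of a field is kept in memory, and it assumes one unique ID term per document, using the 171 million+ documents figure mentioned above. Real memory use also depends on term length and the Lucene version.

```python
import math

def in_memory_terms(unique_terms, term_index_interval):
    """Rough count of terms held in RAM when every Nth term is kept in the term index."""
    return math.ceil(unique_terms / term_index_interval)

unique_ids = 171_000_000  # assumption: one unique ID term per document
for interval in (128, 256, 1024):
    print(interval, in_memory_terms(unique_ids, interval))
```

Doubling the interval halves the number of terms kept in memory, at the cost of scanning up to twice as many terms on disk for each lookup.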

Allocation of too much RAM

Solr works best with a good split between the OS-level disk cache and the RAM used for faceting. You'd be surprised: you can actually get better performance by tweaking other parameters that lower the required RAM usage and free up resources for the disk cache.
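A back-of-the-envelope Python sketch of that split follows. The 48GB of RAM and the 28GB index come from the question; the JVM heap sizes and the 2GB reserved for other processes are hypothetical values chosen only to illustrate the arithmetic.

```python
def page_cache_gb(total_ram_gb, jvm_heap_gb, other_gb=2):
    """RAM left for the OS disk cache after the JVM heap and other processes."""
    return max(total_ram_gb - jvm_heap_gb - other_gb, 0)

def index_fits_in_cache(index_gb, total_ram_gb, jvm_heap_gb, other_gb=2):
    """True if the whole index could sit in the OS disk cache."""
    return page_cache_gb(total_ram_gb, jvm_heap_gb, other_gb) >= index_gb

# 48GB RAM and a 28GB index are from the question; heap sizes are hypothetical.
for heap in (8, 16, 24):
    print(heap, page_cache_gb(48, heap), index_fits_in_cache(28, 48, heap))
```

Under these assumptions, a 24GB heap leaves too little room to cache a 28GB index, while a smaller heap leaves the whole index cacheable.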

terrance.a.snyder answered Feb 27 '23 23:02
