I'm trying to improve query performance. It takes an average of about 3 seconds for simple queries which don't even touch a nested document, and it's sometimes longer.
curl "http://searchbox:9200/global/user/_search?n=0&sort=influence:asc&q=user.name:Bill%20Smith"
Even without the sort it takes seconds. Here are the details of the cluster:
1.4TB index size.
210m documents that aren't nested (About 10kb each)
500m documents in total. (nested documents are small: 2-5 fields).
About 128 segments per node.
3 nodes, m2.4xlarge (-Xmx set to 40g, machine memory is 60g)
3 shards.
Index is on amazon EBS volumes.
Replication 0 (have tried replication 2 with only little improvement)
I don't see any noticeable spikes in CPU/memory etc. Any ideas how this could be improved?
Garry's points about heap space are true, but it's probably not heap space that's the issue here.
With your current configuration, you'll have less than 60GB of page cache available, for a 1.5 TB index. With less than 4.2% of your index in page cache, there's a high probability you'll be needing to hit disk for most of your searches.
You probably want to add more memory to your cluster, and you'll want to think carefully about the number of shards as well. Just sticking to the default can cause skewed distribution. If you had five shards in this case, you'd have two machines with 40% of the data each, and a third with just 20%. In either case, you'll always be waiting for the slowest machine or disk when doing distributed searches. This article on Elasticsearch in Production goes a bit more in depth on determining the right amount of memory.
For this exact search example, you can probably use filters, though. You're sorting, thus ignoring the score calculated by the query. With a filter, it'll be cached after the first run, and subsequent searches will be quick.
Ok, a few things here:
One thing to think about though, sharding might increase search performance but it also has a massive effect on index time. The more shards the longer it takes to index a document...
You also have quite a bit of data stored, maybe you should look at Custom Routing too.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With