I would appreciate it if someone could suggest the optimal number of shards per Elasticsearch node for good performance, or a recommended way to arrive at the number of shards one should use, given the number of cores and the memory footprint.
The default value of index.number_of_routing_shards is designed to allow you to split an index by factors of 2, up to a maximum of 1024 shards. In Elasticsearch 7.0.0 and later versions, this setting also affects how documents are distributed across shards.
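As a rough illustration, here is what a split by a factor of 2 might look like with the official Python client (a sketch assuming elasticsearch-py 8.x; the index names are hypothetical). Note that the source index has to be made read-only before it can be split:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# A split requires the source index to be read-only first.
es.indices.put_settings(
    index="logs-v1",
    settings={"index.blocks.write": True},
)

# Split into twice as many primary shards. The target count must be a
# factor of index.number_of_routing_shards; the default allows factors
# of 2 up to 1024.
es.indices.split(
    index="logs-v1",
    target="logs-v2",
    settings={"index.number_of_shards": 2},
)
```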
Once you set the number of shards for an index in Elasticsearch, you cannot change it. You will need to create a new index with the desired number of shards and, depending on your use case, you may then want to transfer the data to the new index.
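For example, that re-sharding workflow might look like this with the official Python client (a sketch assuming elasticsearch-py 8.x; the index names and shard counts are placeholders):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Create a new index with the desired number of shards.
es.indices.create(
    index="products-v2",
    settings={"number_of_shards": 3, "number_of_replicas": 1},
)

# Copy the documents over. For large indices, consider
# wait_for_completion=False and tracking the returned task instead.
es.reindex(
    source={"index": "products-v1"},
    dest={"index": "products-v2"},
    wait_for_completion=True,
)
```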
Primary vs. replica shards: by default, Elasticsearch creates 5 primary shards and one replica for each index (in versions before 7.0; from 7.0 onward the default is 1 primary shard and 1 replica). That means each such index is split into 5 chunks, and each chunk has one copy, for high availability.
Though there is technically no limit to how much data you can store on a single shard, Elasticsearch recommends a soft upper limit of 50 GB per shard, which you can use as a general guideline that signals when it's time to start a new index.
Aim for shard sizes between 10 GB and 50 GB. Shards larger than 50 GB may make a cluster less likely to recover from failure. When a node fails, Elasticsearch rebalances the node's shards across the data tier's remaining nodes. Shards larger than 50 GB can be harder to move across a network and may tax node resources.
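To see where you stand relative to that guideline, you can list shard sizes with the cat shards API. A minimal sketch with the Python client (the 50 GB threshold mirrors the guideline above):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# List every shard with its store size rendered in GB.
shards = es.cat.shards(format="json", bytes="gb")

for shard in shards:
    size = shard.get("store")  # None for unassigned shards
    if size is not None and float(size) > 50:
        print(f"{shard['index']} shard {shard['shard']} is {size} GB; "
              "consider starting a new index")
```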
You can also limit the number of shards a node can hold, regardless of the index, with cluster.routing.allocation.total_shards_per_node: (Dynamic) Maximum number of primary and replica shards allocated to each node. Defaults to -1 (unlimited). Elasticsearch checks this setting during shard allocation.
For example, if node C fails, Elasticsearch reallocates its shard to node B, because reallocating the shard to node A would exceed node A's shard limit. These settings impose a hard limit, which can result in some shards not being allocated.
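A sketch of setting that limit dynamically with the Python client (the value 4 is an arbitrary example):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Cap the number of primary and replica shards per node, cluster-wide.
# -1 (the default) means unlimited. This is a hard limit, so shards
# that would exceed it on every node will stay unassigned.
es.cluster.put_settings(
    persistent={"cluster.routing.allocation.total_shards_per_node": 4}
)
```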
I'm late to the party, but I just wanted to point out a couple of things:
The main point is that shards have an inherent cost to both indexing and querying. Each shard is actually a separate Lucene index. When you run a query, Elasticsearch must run that query against each shard, and then compile the individual shard results together to come up with a final result to send back. The benefit to sharding is that the index can be distributed across the nodes in a cluster for higher availability. In other words, it's a trade-off.
Finally, it should be noted that any more than 1 shard per node will introduce I/O considerations. Since each shard must be indexed and queried individually, a node with 2 or more shards would require 2 or more separate I/O operations, which can't be run at the same time. If you have SSDs on your nodes then the actual cost of this can be reduced, since all the I/O happens much quicker. Still, it's something to be aware of.
That, then, begs the question: why would you want more than one shard per node? The answer to that is planned scalability. The number of shards in an index is fixed. The only way to add more shards later is to recreate the index and reindex all the data. Depending on the size of your index, that may or may not be a big deal. At the time of writing, Stack Overflow's index is 203 GB (see: https://stackexchange.com/performance). Recreating all of that data would be a big deal, so resharding would be a nightmare. If you have 3 nodes and a total of 6 shards, you can scale out to up to 6 nodes at a later point easily, without resharding.
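As a sketch of that planned-scalability setup with the Python client (the index name and counts are illustrative, not a recommendation for your data):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# 6 primaries on 3 nodes = 2 shards per node today, with room to
# scale out to 6 nodes later without reindexing anything.
es.indices.create(
    index="posts",
    settings={"number_of_shards": 6, "number_of_replicas": 1},
)
```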
There are three situations to consider before sharding:
Situation 1) You want to use Elasticsearch with failover and high availability. Then you go for sharding. In this case, you need to select the number of shards according to the number of nodes (Elasticsearch instances) you want to use in production.
Suppose you want to run 3 nodes in production. Then choose 1 primary shard and 2 replicas for every index, so each node holds a complete copy of the data. Choosing more shards than you need only adds overhead.
Situation 2) Your current server can hold the current data, but the data may grow dynamically, and in the future you may run out of disk space, or your server may no longer be able to handle that much data. In that case, you need to configure more shards, such as 2 or 3 per index (it's up to your requirements), so the index can later be spread across more nodes. But there shouldn't be any replicas.
Situation 3) This is the combination of situations 1 and 2, so you need to combine both configurations. If your data grows dynamically and you also need high availability and failover, then configure an index with 2 shards and 1 replica. Then you can spread the data among nodes and get optimal performance!
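A minimal sketch of all three configurations with the official Python client (index names are placeholders):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Situation 1: 3 nodes, high availability only.
# 1 primary + 2 replicas = a full copy on every node.
es.indices.create(
    index="ha-index",
    settings={"number_of_shards": 1, "number_of_replicas": 2},
)

# Situation 2: growth only, no failover.
# Several primaries, no replicas, so data can spread across nodes.
es.indices.create(
    index="growth-index",
    settings={"number_of_shards": 3, "number_of_replicas": 0},
)

# Situation 3: growth plus failover.
# 2 primaries + 1 replica of each.
es.indices.create(
    index="combined-index",
    settings={"number_of_shards": 2, "number_of_replicas": 1},
)
```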
Note: each query is processed on every shard, and the results from all shards are then merged, map-reduce style, before the final result is returned. That merge is an expensive process, so the minimum number of shards that meets your needs gives optimal performance.
If you are using only one node in production, then one primary shard is the optimal number of shards for each index.
Hope it helps!