How to set up an ElasticSearch cluster with auto-scaling on Amazon EC2?

Tags:

elasticsearch

There is a great tutorial, "elasticsearch on ec2", about configuring ES on Amazon EC2. I studied it and applied all of its recommendations.

Now I have an AMI and can run any number of nodes in the cluster from this AMI. Auto-discovery is configured and the nodes join the cluster as they should.

The question is: how do I configure the cluster so that I can automatically launch and terminate nodes depending on cluster load?

For example, I want to have only 1 node running when there is no load and 12 nodes running at peak load. But wait, if I terminate 11 nodes in the cluster, what happens to the shards and replicas? How do I make sure I don't lose any data if I terminate 11 of the 12 nodes?

I might want to configure an S3 gateway for this, but all the gateways except local are deprecated.

There is an article in the manual about shard allocation. Maybe I'm missing something very basic, but I must admit I failed to figure out whether it is possible to configure one node to always hold a copy of every shard. My goal is to make sure that if this were the only node running in the cluster, we still wouldn't lose any data.

The only solution I can imagine now is to configure the index to have 12 shards and 12 replicas. Then, when up to 12 nodes are launched, every node would have a copy of every shard. But I don't like this solution because I would have to reconfigure the cluster if I wanted more than 12 nodes at peak load.


asked Aug 02 '13 07:08

Victor Smirnov

2 Answers

Auto scaling doesn't make a lot of sense with ElasticSearch.

Shard moving and re-allocation is not a light process, especially if you have a lot of data. It stresses IO and network, and can degrade the performance of ElasticSearch badly. (If you want to limit the effect, you should throttle cluster recovery using settings like cluster.routing.allocation.cluster_concurrent_rebalance, indices.recovery.concurrent_streams, and indices.recovery.max_size_per_sec. This will limit the impact but will also slow rebalancing and recovery.)
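As a rough sketch of what applying those throttles could look like, the snippet below only builds the JSON body for the cluster update settings API (`PUT /_cluster/settings`). The setting names are the ones listed above; the values are illustrative assumptions rather than recommendations, and whether each setting can be changed at runtime depends on your ElasticSearch version.

```python
import json


def recovery_throttle_settings(concurrent_rebalance=1,
                               concurrent_streams=2,
                               max_size_per_sec="20mb"):
    """Build a request body for PUT /_cluster/settings.

    The default values here are placeholders chosen for illustration only.
    """
    return {
        # "transient" settings do not survive a full cluster restart.
        "transient": {
            "cluster.routing.allocation.cluster_concurrent_rebalance": concurrent_rebalance,
            "indices.recovery.concurrent_streams": concurrent_streams,
            "indices.recovery.max_size_per_sec": max_size_per_sec,
        }
    }


throttle_body = recovery_throttle_settings()
print(json.dumps(throttle_body, indent=2))
```

The printed body can then be sent to a node with any HTTP client. Lower values reduce the load on the cluster but make rebalancing take longer, which is exactly the trade-off described above.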

Also, if you care about your data you don't want to have only 1 node ever. You need your data to be replicated, so you will need at least 2 nodes (or more if you feel safer with a higher replication level).

Another thing to remember is that while you can change the number of replicas, you can't change the number of shards. This is configured when you create your index and cannot be changed (if you want more shards you need to create another index and reindex all your data). So your number of shards should take into account the data size and the cluster size, considering the highest number of nodes you want but also your minimal setup (can fewer nodes hold all the shards and serve the estimated traffic?).
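To make the distinction concrete, here is a minimal sketch of the two request bodies involved. The first would go to the create index API (`PUT /<index>`), where the shard count is fixed; the second would go to the update index settings API (`PUT /<index>/_settings`), which can change the replica count later. The helper names and the numbers are hypothetical, used only for illustration.

```python
import json


def create_index_body(number_of_shards, number_of_replicas):
    """Body for PUT /<index>: the shard count is fixed at creation time."""
    return {
        "settings": {
            "number_of_shards": number_of_shards,
            "number_of_replicas": number_of_replicas,
        }
    }


def update_replicas_body(number_of_replicas):
    """Body for PUT /<index>/_settings: only the replica count is changed."""
    return {"index": {"number_of_replicas": number_of_replicas}}


create_body = create_index_body(number_of_shards=6, number_of_replicas=1)
update_body = update_replicas_body(number_of_replicas=2)

print(json.dumps(create_body, indent=2))
print(json.dumps(update_body, indent=2))
```

Note that the second body has no shard count in it at all, since that part of the index layout cannot be updated in place.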

So theoretically, if you want to have 2 nodes at low load and 12 nodes at peak, you can set your index to have 6 shards with 1 replica. At low load you have 2 nodes that hold 6 shards each, and at peak you have 12 nodes that hold 1 shard each.
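The arithmetic behind that layout can be checked with a small sketch. The function name is hypothetical, and it assumes an even spread of shard copies across nodes, which is what the cluster aims for but does not strictly guarantee at every moment.

```python
def shard_copies_per_node(primary_shards, replicas, nodes):
    """Return (total shard copies, copies per node, leftover copies).

    Each primary shard has `replicas` additional copies, so the total is
    primary_shards * (1 + replicas). Leftover copies mean some nodes would
    hold one copy more than others.
    """
    total = primary_shards * (1 + replicas)
    per_node, leftover = divmod(total, nodes)
    return total, per_node, leftover


for nodes in (2, 12):
    total, per_node, leftover = shard_copies_per_node(6, 1, nodes)
    print(f"{nodes} nodes: {total} copies, {per_node} per node, {leftover} left over")
```

Because a replica is never allocated on the same node as its primary, the 2-node case means each node holds one copy of all 6 shards, that is, the full data set.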

But again, I strongly suggest rethinking this and testing the impact of shard moving on your cluster performance.


answered Sep 26 '22 07:09

Rotem Hermon

In cases where the elasticity of your application is driven by a variable query load, you could set up ES nodes configured not to store any data (node.data = false, http.enabled = true) and then put them in an auto-scaling group. These nodes could offload all the HTTP handling and result merging from your main data nodes (freeing them up for more indexing and searching).

Since these nodes wouldn't have shards allocated to them, bringing them up and down dynamically shouldn't be a problem, and auto-discovery should allow them to join the cluster.
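A minimal sketch of the configuration such a node might be given, rendered as elasticsearch.yml lines. The two node settings are the ones named in this answer and may be spelled differently in other ElasticSearch versions; the helper name and the cluster name "my-cluster" are hypothetical placeholders.

```python
def non_data_node_config(cluster_name):
    """Render elasticsearch.yml lines for a node that holds no shards."""
    settings = {
        # Must match the data nodes so auto-discovery joins the same cluster.
        "cluster.name": cluster_name,
        # No shards are allocated to this node.
        "node.data": "false",
        # The node still accepts HTTP requests from clients.
        "http.enabled": "true",
    }
    return "\n".join(f"{key}: {value}" for key, value in settings.items())


config_text = non_data_node_config("my-cluster")
print(config_text)
```

An AMI for the auto-scaling group could carry this file, while the data nodes keep their own configuration and stay outside the group.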

answered Sep 27 '22 07:09

edparris

