I'm trying to think of ways to scale our Elasticsearch setup. Do people run multiple client nodes on an Elasticsearch cluster and put them behind a load balancer/reverse proxy like Nginx? Other ideas would be great.
But to run multiple nodes on the same host you need a different elasticsearch.yml for every node, with separate data and log folders; there is no way to use the same elasticsearch.yml to run multiple nodes at the same time.
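As a rough sketch of what that looks like (assuming a pre-7.x style configuration; the cluster name, node names, paths, and ports below are placeholders, and how you point each node at its own config directory varies by version):

    # node-1/elasticsearch.yml (hypothetical names and paths)
    cluster.name: my-cluster
    node.name: node-1
    path.data: /var/lib/elasticsearch/node-1
    path.logs: /var/log/elasticsearch/node-1
    http.port: 9200
    transport.tcp.port: 9300

    # node-2/elasticsearch.yml
    cluster.name: my-cluster
    node.name: node-2
    path.data: /var/lib/elasticsearch/node-2
    path.logs: /var/log/elasticsearch/node-2
    http.port: 9201
    transport.tcp.port: 9301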
There is one special case, however: if your usage is light and only requires a single node, then the quorum is 1. For any other use, you need a minimum of 3 master-eligible nodes in order to avoid any split-brain situation.
We strongly recommend using dedicated master nodes for production clusters with more than 3 nodes.
So I'd start by recapping the three different kinds of nodes you can configure in Elasticsearch (a minimal config sketch for each follows the list):
Data Node - node.data set to true and node.master set to false - these are the core nodes of an Elasticsearch cluster, where the data is stored.
Dedicated Master Node - node.data is set to false and node.master is set to true - these are responsible for managing the cluster state.
Client Node - node.data is set to false and node.master is set to false - these respond to client data requests, querying for results from the data nodes and gathering the data to return to the client.
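As a quick reference, this is roughly how those three roles translate into elasticsearch.yml, assuming a version that still uses the node.master/node.data settings (newer releases express this with node.roles instead):

    # Data node
    node.master: false
    node.data: true

    # Dedicated master node
    node.master: true
    node.data: false

    # Client (coordinating-only) node
    node.master: false
    node.data: false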
By splitting the functions into 3 different base node types, you have a great degree of granularity and control in managing the scale of your cluster. Because each node type handles a more isolated set of responsibilities, you are better able to tune each one and to scale it appropriately.
For data nodes, it's a function of handling indexing and query responses, along with making certain you have enough storage allocated to each node. You'll want to monitor storage usage and disk throughput for each node, along with CPU and memory usage. You want to avoid configurations where you run out of disk or saturate disk throughput while still having substantial excess CPU and memory, or the reverse, where memory and CPU max out but you have lots of disk available. The best way to determine this is through some benchmarking of typical indexing and querying activities.
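One related guardrail worth knowing about, assuming a version that supports disk-based shard allocation thresholds, is to keep the disk watermarks explicit in elasticsearch.yml so a node stops taking new shards before its disk fills up (the percentages here are just illustrative):

    # Disk-based shard allocation thresholds (illustrative values)
    cluster.routing.allocation.disk.threshold_enabled: true
    cluster.routing.allocation.disk.watermark.low: "85%"
    cluster.routing.allocation.disk.watermark.high: "90%"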
For master nodes, you should always have at least 3 and should always have an odd number. The quorum should be set to N/2 + 1, where N is the number of master-eligible nodes; this way you don't run into split-brain issues with your cluster. Dedicated master nodes tend not to be heavily loaded, so they can be quite small.
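As a concrete sketch, with N = 3 master-eligible nodes the quorum works out to 3/2 + 1 = 2 (integer division). On versions that use zen discovery you would set this on each master-eligible node roughly like below; 7.x and later manage the quorum automatically:

    # elasticsearch.yml on each dedicated master node (pre-7.x zen discovery)
    node.master: true
    node.data: false
    discovery.zen.minimum_master_nodes: 2   # N/2 + 1 with N = 3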
For client nodes you can indeed put them behind a load balancer, or use DNS entries to point to them. They are easily scaled up and down by just adding more to the cluster, and should be added both for redundancy and as you see CPU and memory usage climb. There is not much need for a lot of disk.
No matter what your configuration, in addition to benchmarking likely loads ahead of time, I'd strongly advise close monitoring of CPU, memory, and disk - ES is easy to start rolling out, but it does need watching as you scale into larger numbers of transactions and more nodes. Dealing with a yellow or red status cluster due to node failures from memory or disk exhaustion is not a lot of fun.
I'd take a close read of this article for some background:
http://elastic.co/guide/en/elasticsearch/reference/current/modules-node.html
Plus this series of articles:
http://elastic.co/guide/en/elasticsearch/guide/current/distributed-cluster.html