
ElasticSearch: Jest vs Rest vs TransportClient vs NodeClient

I have gone through the official documentation at https://www.elastic.co/blog/found-interfacing-elasticsearch-picking-client

But it does not give any benchmarks or performance numbers to help choose among the clients. I am also finding it non-trivial to set up a TransportClient or a NodeClient, because the documentation for them is sparse and has few, if any, examples.

So if someone has already benchmarked these clients, I would really appreciate seeing the results. I would rather focus on tuning an established client than on evaluating which client to choose.

Our application is write-heavy, and we plan to run a 50-shard, 50-replica ES cluster for it.

user2250246 asked Mar 08 '16 19:03


1 Answer

All those clients are fine for querying, and they all have their pros and cons (the list below is not exhaustive):

  • A Node client provides a single hop into the cluster, but since it is also part of the cluster it can induce too much chatter within the cluster
  • A Transport client is not part of the cluster, hence requires a two-hop round trip, and communicates with a single node at a time in round-robin fashion (from the list provided during its construction)
  • Jest is basically the missing client for the ES REST interface
  • If you feel you don't need everything Jest has to offer and simply want to interact with a few endpoints, you might as well create your own REST client using Spring RestTemplate, Apache HttpClient, etc.
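To make the round-robin behaviour mentioned for the Transport client concrete, here is a minimal Python sketch of the general idea: a fixed list of node addresses is supplied up front, and each request goes to the next address in turn. This is only an illustration of the pattern, not how TransportClient is actually implemented; the class name `RoundRobinNodes` and the addresses are hypothetical.

```python
from itertools import cycle


class RoundRobinNodes:
    """Hands out node addresses one at a time, cycling through a fixed list."""

    def __init__(self, nodes):
        if not nodes:
            raise ValueError("at least one node address is required")
        # cycle() repeats the list endlessly in its original order.
        self._nodes = cycle(list(nodes))

    def next_node(self):
        # Each call returns the next address, wrapping around at the end.
        return next(self._nodes)


# Hypothetical addresses, for illustration only.
selector = RoundRobinNodes(["es1:9300", "es2:9300", "es3:9300"])
picked = [selector.next_node() for _ in range(4)]
```

A hand-rolled REST client could use the same kind of selector to spread its HTTP calls across several nodes.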

If you're going to have a write-heavy application, I suggest you don't use any of those clients at all. The main reason is that they are all synchronous in nature: if any component of your architecture or the network were to fail for some reason, you would lose data, and that might not be an option for you.

If you have plenty of data to ingest, you normally go the asynchronous way, i.e. you store your data in a temporary (yet durable) queue (Kafka, Redis, JMS, etc.) and then let another process stream it to ES. There are many ways to do that, but a very simple one is to use Logstash.
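The shape of that asynchronous approach can be sketched in a few lines of Python. This is a rough illustration under loud assumptions: the standard-library `queue.Queue` stands in for the durable queue (it is in-memory only, so it is NOT durable the way Kafka or JMS would be), and `send_batch` is a hypothetical callback standing in for whatever actually writes to ES.

```python
import queue


def drain_batch(buffer, max_items):
    """Remove up to max_items entries from the buffer without blocking."""
    batch = []
    while len(batch) < max_items:
        try:
            batch.append(buffer.get_nowait())
        except queue.Empty:
            break
    return batch


def forward_all(buffer, send_batch, batch_size=500):
    """Drain the buffer in batches and hand each batch to send_batch.

    send_batch is a hypothetical callback (e.g. something that posts to ES).
    Returns the total number of items forwarded.
    """
    sent = 0
    while True:
        batch = drain_batch(buffer, batch_size)
        if not batch:
            return sent
        send_batch(batch)
        sent += len(batch)


# The application only enqueues; it never talks to ES directly.
buffer = queue.Queue()
for i in range(5):
    buffer.put({"id": i})

# Here the "sender" just records the batches it receives.
batches = []
total = forward_all(buffer, batches.append, batch_size=2)
```

The point of the split is that the producer side stays fast and simple, while batching, retries and back-pressure live in the forwarding process.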

Whether you decide to store your data in Kafka, JMS or Redis, you can then let Logstash consume your data and stream it to ES, i.e. you let Logstash worry about the heavy write part, which it does very well. That can be achieved very easily with:

  • a kafka or redis or stomp input
  • a few filters to massage your data
  • an elasticsearch output to forward the resulting data to ES via the bulk endpoint.
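For readers unfamiliar with the bulk endpoint mentioned in the last item: it accepts newline-delimited JSON, where each document is preceded by an action line and the body ends with a newline. The Python sketch below only builds such a body so the format is visible; it does not send anything. The function name `build_bulk_body` and the index name `events` are hypothetical, and older Elasticsearch releases may also expect a `_type` in the action line or in the URL.

```python
import json


def build_bulk_body(index, docs):
    """Build a newline-delimited bulk body with one 'index' action per doc."""
    lines = []
    for doc in docs:
        # Action line: tells ES what to do with the document that follows.
        lines.append(json.dumps({"index": {"_index": index}}))
        # Source line: the document itself, on a single line.
        lines.append(json.dumps(doc))
    # The bulk format requires the body to end with a newline.
    return "\n".join(lines) + "\n"


body = build_bulk_body("events", [{"msg": "a"}, {"msg": "b"}])
```

Sending many documents per request this way is what lets a forwarder such as Logstash sustain heavy write loads with far fewer round trips than one request per document.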

With that kind of well-tuned setup, you can handle very heavy write loads without worrying about which client to use or how to tune it. The question is still open for querying, but since the write part is paramount in your case, you need to make it solid. The only serious way to do that is to go asynchronous and let a well-developed and tested ETL tool (such as Logstash or fluentd) do it for you.

UPDATE

It is worth noting that as of ES 5.0, there will be a new Java REST client available.

Val answered Oct 02 '22 07:10