This is probably a very easy (and dumb) question for other Elasticsearch devs: what's the difference between TransportClient and NodeBuilder?
I'm connecting to a remote Elasticsearch server from a Java webapp. So far I have been using TransportClient, but I was wondering whether NodeBuilder can be used too, or whether NodeBuilder should be used only for embedded clients.
If either of the two can be used to connect to remote ES servers, which one is better in terms of memory and performance?
If anyone can point me to an example of a NodeBuilder connecting to a remote ES server, that would be great, because I haven't had any luck finding one.
Thanks.
NodeBuilder can also be used to connect to a cluster.
import static org.elasticsearch.node.NodeBuilder.nodeBuilder;
Node node = nodeBuilder().clusterName("yourcluster").client(true).node();
Client client = node.client();
It will join the cluster as another node and will be aware of the whole topology. Using nodes, you can use multicast to discover other running nodes.
Personally, I prefer TransportClient over NodeClient because the other cluster nodes receive no unnecessary notifications when a TransportClient stops. When a NodeClient stops, every node in the cluster has to be told about it, even though the client holds no data and there is nothing for them to manage.
Also, I have seen in debug mode that NodeClient starts more threads than TransportClient, so I think TransportClient has a smaller memory footprint.
By the way, if you are using Spring, you can use the spring-elasticsearch factories for that. If not, you can always have a look at its source code to see how I manage NodeClient vs TransportClient.
Hope this helps.
EDIT 2016-03-09: NodeClient should not be used. If there is a need for that, people should create a client node (launch an elasticsearch node with node.data: false and node.master: false) and use a TransportClient to connect to it locally.
If I understood the documentation correctly, it's beneficial to use Node Client, at least if you have shards:
The benefit of using the [Node] Client is the fact that operations are automatically routed to the node(s) the operations need to be executed on, without performing a “double hop”. For example, the index operation will automatically be executed on the shard that it will end up existing at.
vs
It [Transport client] does not join the cluster, but simply gets one or more initial transport addresses and communicates with them in round robin fashion on each action (though most actions will probably be “two hop” operations).
As I interpret this, if you start a node (preferably with client set to true) that joins the cluster and then use the Client from that node, your requests go directly to the correct node in the cluster.
With TransportClient, you connect to an arbitrary node, which then passes the request on to the correct node (whether it redirects or forwards it, I'm not sure), hence the "two hops".
Using Node Client should be more efficient in terms of network traffic and load on nodes.
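To make the "double hop" argument concrete, here is a toy model in plain Java. It is not Elasticsearch code and uses no Elasticsearch classes: the class HopSketch, its method names, and the rule that a document's owning node is its id hash modulo the node count are all assumptions made up for illustration (real clusters route to shards, not directly to nodes). It only counts how many network hops each client style would need under that simplified rule.

```java
import java.util.Arrays;
import java.util.List;

// Toy model only: counts hops, does not talk to any cluster.
class HopSketch {

    // Hypothetical routing rule: one shard per node, owner chosen by hashing the id.
    static int ownerOf(String docId, int nodeCount) {
        return Math.floorMod(docId.hashCode(), nodeCount);
    }

    // A node client has joined the cluster and knows the topology,
    // so every request goes straight to the owning node: one hop each.
    static int nodeClientHops(List<String> docIds, int nodeCount) {
        return docIds.size();
    }

    // A transport client contacts its known nodes in round-robin order.
    // If the contacted node is not the owner, the request needs a second hop.
    static int transportClientHops(List<String> docIds, int nodeCount) {
        int hops = 0;
        int next = 0;
        for (String id : docIds) {
            int contacted = next;
            next = (next + 1) % nodeCount;
            hops += (contacted == ownerOf(id, nodeCount)) ? 1 : 2;
        }
        return hops;
    }

    public static void main(String[] args) {
        List<String> ids = Arrays.asList("a", "b", "c", "d", "e", "f");
        System.out.println("node client hops:      " + nodeClientHops(ids, 3));
        System.out.println("transport client hops: " + transportClientHops(ids, 3));
    }
}
```

In this model the transport client never needs fewer hops than the node client, and the two are equal only when the round-robin pick happens to land on the owner (always the case with a single node). Whether the extra hop matters in practice depends on the workload, and it has to be weighed against the cluster-membership overhead described in the other answer.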