We have a big problem with our ES cluster. One of our nodes is always at 99% CPU. For some reason it runs about three times as many threads for the elasticsearch process as a normal node does. I have attached two htop screenshots, one from the overloaded node and one from a normal node. Please advise!
Thank you!
Overloaded Node
Normal Node
UPDATE
Cluster architecture:
11 nodes, 2 dedicated masters, 9 data nodes.
Nodes Hardware Properties
Masters:
Slaves:
Documents in cluster: ~200 million
Index conf:
Each index is split in 10 shards (5 primary, 5 replica)
Queries:
Search RT: ~250/s, Index RT: ~6K/s
OS
Ubuntu 12.04.4 LTS
JAVA
java version "1.7.0_60" Java(TM) SE Runtime Environment (build 1.7.0_60-b19) Java HotSpot(TM) 64-Bit Server VM (build 24.60-b09, mixed mode)
Figured it out.
[2014-07-07 13:38:42,521][DEBUG][index.search.slowlog.query] [n013.my_cluster] [my_index][3] took[2s], took_millis[2066], types[my_type], stats[], search_type[QUERY_THEN_FETCH], total_shards[5], source[{"size":20,"from":0,"sort":{"_score":"desc"},"query":{"filtered":{"query":{"query_string":{"query":"my eight words space separated query","fields":["description","tags"],"default_operator":"OR"}},"filter":{"and":[{"range":{"ats":{"lte":1404730800}}},{"terms":{"aid":[1,2,4]}}]},"_cache":false}}}], extra_source[]
The problem was inside "filter": {"and": ...}. It looks like this kind of filter is heavier for ES than a bool filter. So whenever you want to combine several filters, use a bool filter (must, must_not and should) instead.
Ref: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-bool-filter.html
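To make the rewrite concrete, here is a minimal sketch that turns the "and" filter from the slow log entry above into an equivalent bool filter with must clauses. The helper name and_filter_to_bool is hypothetical (it is not part of any Elasticsearch client), and the sketch only handles the list form of "and" shown in the log.

```python
import json


def and_filter_to_bool(filter_clause):
    """Rewrite {"and": [...]} as {"bool": {"must": [...]}}.

    Hypothetical helper for illustration; any other filter is returned as-is.
    Nested "and" filters are rewritten recursively.
    """
    if isinstance(filter_clause, dict) and isinstance(filter_clause.get("and"), list):
        return {"bool": {"must": [and_filter_to_bool(c) for c in filter_clause["and"]]}}
    return filter_clause


# The filter part of the query from the slow log entry.
slow_filter = {
    "and": [
        {"range": {"ats": {"lte": 1404730800}}},
        {"terms": {"aid": [1, 2, 4]}},
    ]
}

bool_filter = and_filter_to_bool(slow_filter)
print(json.dumps(bool_filter, indent=2))
```

The result goes in the same place as the original "filter" value inside the filtered query; the query_string part stays unchanged.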
Cheers!
Based on the sparse info at hand, I have a couple of guesses about what the problem could be:
Shards are not well balanced and you are having hot spotting. Ensure that your most heavily used indexes are sharded in such a way that each machine can do its share of work. Also, look into the index level "index.routing.allocation.total_shards_per_node" to try to force an equal balance.
Perhaps on the search side, you are specifying that the search should always go to the "primary" shard. The primary designation isn't something that gets balanced, so the first node to come up holds the primary shards and the nodes that come up afterwards hold only replicas.
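For the first guess, a rough sketch of how a value for index.routing.allocation.total_shards_per_node could be picked, using the numbers from the question (5 primaries, 1 replica each, 9 data nodes). The function name and the settings_body variable are assumptions made for illustration; the resulting dictionary is only the body you would send when updating the index settings, not a call into any client library.

```python
import math


def shards_per_node_cap(primaries, replicas_per_primary, data_nodes):
    """Smallest per-node cap that still lets every shard copy be allocated."""
    total_shard_copies = primaries * (1 + replicas_per_primary)
    return math.ceil(total_shard_copies / data_nodes)


# Numbers taken from the question: 5 primaries + 5 replicas on 9 data nodes.
cap = shards_per_node_cap(primaries=5, replicas_per_primary=1, data_nodes=9)

# Hypothetical index settings body carrying that cap.
settings_body = {"index.routing.allocation.total_shards_per_node": cap}
print(settings_body)
```

Keep in mind that this is a hard limit: if it is set too tightly and a node drops out, some shard copies may stay unassigned until the node returns.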