
Elasticsearch too many running threads

We have a big problem with our ES cluster. One of our nodes is always at 99% CPU. For some reason, its elasticsearch process runs about three times as many threads as the same process on a normal node. I have attached two htop screenshots, one from the overloaded node and one from a normal node. Please advise!

Thank you!

Overloaded node: [htop screenshot]

Normal node: [htop screenshot]

UPDATE

  1. Cluster architecture:

    11 nodes: 2 dedicated masters and 9 data nodes.

  2. Nodes Hardware Properties

    Masters:

    • CPU: 8x Intel(R) Xeon(R) CPU E5-1620 v2 @ 3.70GHz
    • Memory: 32GB
    • Disk: 120GB

    Data nodes:

    • CPU: 12x Intel(R) Xeon(R) CPU E5-1650 v2 @ 3.50GHz
    • Memory: 64GB
    • Disk: 2.7TB

  3. Documents in cluster:

    ~200 million

  4. Index conf:

    Each index is split into 10 shards (5 primary, 5 replica)

  5. Queries:

    Search RT: ~ 250/s, Index RT: ~ 6K/s

  6. OS

    Ubuntu 12.04.4 LTS

  7. Java

    java version "1.7.0_60"
    Java(TM) SE Runtime Environment (build 1.7.0_60-b19)
    Java HotSpot(TM) 64-Bit Server VM (build 24.60-b09, mixed mode)
Andrei Stalbe asked Jul 04 '14 12:07


2 Answers

Figured it out.

[2014-07-07 13:38:42,521][DEBUG][index.search.slowlog.query] [n013.my_cluster] [my_index][3] took[2s], took_millis[2066], types[my_type], stats[], search_type[QUERY_THEN_FETCH], total_shards[5], source[{"size":20,"from":0,"sort":{"_score":"desc"},"query":{"filtered":{"query":{"query_string":{"query":"my eight words space separated query","fields":["description","tags"],"default_operator":"OR"}},"filter":{"and":[{"range":{"ats":{"lte":1404730800}}},{"terms":{"aid":[1,2,4]}}]},"_cache":false}}}], extra_source[]

The problem was in "filter": {"and": ...}. It looks like this kind of filter is heavier for ES than the equivalent bool filter. So whenever you want to apply filters, use bool filters (must, must_not and should).

Ref: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-bool-filter.html
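For illustration, here is a rough sketch of that rewrite applied to the query from the slow log above. The helper name and_to_bool is made up for this example, and it assumes every clause of the "and" filter can simply move into bool/must; only the filter part changes, the rest of the query is copied from the log as-is.

```python
import json

# The "and" filter exactly as it appears in the slow log entry.
and_filter = {
    "and": [
        {"range": {"ats": {"lte": 1404730800}}},
        {"terms": {"aid": [1, 2, 4]}},
    ]
}


def and_to_bool(filter_clause):
    """Hypothetical helper: turn {"and": [...]} into {"bool": {"must": [...]}}.

    Anything that is not a list-style "and" filter is returned unchanged.
    """
    clauses = filter_clause.get("and")
    if not isinstance(clauses, list):
        return filter_clause
    return {"bool": {"must": list(clauses)}}


# Same search body as in the log, with only the filter swapped.
query = {
    "size": 20,
    "from": 0,
    "sort": {"_score": "desc"},
    "query": {
        "filtered": {
            "query": {
                "query_string": {
                    "query": "my eight words space separated query",
                    "fields": ["description", "tags"],
                    "default_operator": "OR",
                }
            },
            "filter": and_to_bool(and_filter),
            "_cache": False,
        }
    },
}

print(json.dumps(query, indent=2))
```

The printed JSON can be sent as the search body in place of the original one. Whether it helps as much on another cluster is something to check in the slow log rather than assume.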

Cheers!

Andrei Stalbe answered Sep 20 '22 00:09


Based on the sparse info at hand, I have a couple of guesses as to what the problem could be:

  • Shards are not well balanced and you have a hot spot. Ensure that your most heavily used indexes are sharded in such a way that each machine can do its share of the work. Also, look into the index-level setting "index.routing.allocation.total_shards_per_node" to try to force an equal balance.

  • Perhaps on the search side you are specifying that searches should always go to the "primary" shard. The primary designation isn't something that gets balanced, so basically the first node up holds the primary shards and the nodes that come up afterwards hold only replicas.
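As a rough sketch of the first suggestion, the snippet below works out a value for "index.routing.allocation.total_shards_per_node" from the numbers in the question (5 primaries with 1 replica each, 9 data nodes) and prints the settings body. The function name is made up for this example, and rounding up is only one possible choice of limit.

```python
import json
import math


def shards_per_node_limit(primaries, replicas_per_primary, data_nodes):
    """Hypothetical helper: smallest per-node limit that still fits all shards."""
    total_shards = primaries * (1 + replicas_per_primary)
    return math.ceil(total_shards / data_nodes)


# Numbers from the question: 5 primaries + 5 replicas spread over 9 data nodes.
limit = shards_per_node_limit(primaries=5, replicas_per_primary=1, data_nodes=9)

# Body for an index settings update, e.g. PUT /my_index/_settings
# ("my_index" is the placeholder index name from the slow log).
settings_body = {"index.routing.allocation.total_shards_per_node": limit}

print(json.dumps(settings_body))
```

Keep in mind this is a hard limit: if it is set too tightly and a node drops out, some shards may stay unassigned until the node comes back.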

ppearcy answered Sep 22 '22 00:09