Elasticsearch garbage collection warnings (JvmGcMonitorService)

I am running a set of integration tests via Docker.

The tests create various indexes, index docs, query docs, delete docs, etc.

Most of the time they pass, but occasionally a few tests fail due to a SocketTimeout when connecting to Elasticsearch.

When a failure occurs, I see these GC warnings in the logs:

2021-08-16T19:27:04.032753121Z 19:27:04.028 elasticsearch> [19:27:04.025Z #205.021 WARN  -            -   ] org.elasticsearch.monitor.jvm.JvmGcMonitorService: [a65fc811a8a8] [gc][young][359][4] duration [7.5s], collections [1]/[8.3s], total [7.5s]/[7.7s], memory [310.9mb]->[83.2mb]/[989.8mb], all_pools {[young] [269.5mb]->[4.1mb]/[273mb]}{[survivor] [34.1mb]->[34.1mb]/[34.1mb]}{[old] [7.2mb]->[44.9mb]/[682.6mb]}
2021-08-16T19:27:04.033614353Z 19:27:04.028 elasticsearch> [19:27:04.027Z #205.021 WARN  -            -   ] org.elasticsearch.monitor.jvm.JvmGcMonitorService: [a65fc811a8a8] [gc][359] overhead, spent [7.5s] collecting in the last [8.3s]
...
2021-08-16T19:28:46.770105932Z 19:28:46.768 elasticsearch> [19:28:46.767Z #205.021 WARN  -            -   ] org.elasticsearch.monitor.jvm.JvmGcMonitorService: [a65fc811a8a8] [gc][young][459][5] duration [2.2s], collections [1]/[2.7s], total [2.2s]/[9.9s], memory [350.3mb]->[77.9mb]/[989.8mb], all_pools {[young] [271.2mb]->[1.8mb]/[273mb]}{[survivor] [34.1mb]->[13.8mb]/[34.1mb]}{[old] [44.9mb]->[63mb]/[682.6mb]}
2021-08-16T19:28:46.773394585Z 19:28:46.771 elasticsearch> [19:28:46.768Z #205.021 WARN  -            -   ] org.elasticsearch.monitor.jvm.JvmGcMonitorService: [a65fc811a8a8] [gc][459] overhead, spent [2.2s] collecting in the last [2.7s]

I am just wondering if there is anything (other than retrying on failure) I can tweak or tune to resolve these issues.

Some of my tests perform the following 'delete all' request, which removes every index (and therefore every doc):

// "_all" targets every index, so this wipes all indices (and their docs) in the cluster
RestHighLevelClient client = new RestHighLevelClient(RestClient.builder(new HttpHost(
            elasticHost, Integer.parseInt(elasticPort))));
client.indices().delete(new DeleteIndexRequest("_all"), RequestOptions.DEFAULT);

I wonder if this request is causing Elasticsearch to run a long/full garbage collection? Note that my test suite is quite small: it creates maybe 10-15 indices and indexes no more than about 100 docs across them, i.e. I am not creating hundreds or thousands of docs.
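
For what it's worth, the only client-side knob I know of (besides retrying) is the request timeout. Below is a minimal sketch of how the client above could be built with a longer socket timeout, so that a multi-second GC pause on the node doesn't immediately surface as a SocketTimeoutException; the 90-second value is just an illustration, not something I have settled on:

import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestClientBuilder;
import org.elasticsearch.client.RestHighLevelClient;

// Raise the client timeouts (defaults are 1s connect / 30s socket in the 7.x client)
RestClientBuilder builder = RestClient.builder(new HttpHost(
            elasticHost, Integer.parseInt(elasticPort)))
        .setRequestConfigCallback(requestConfig -> requestConfig
            .setConnectTimeout(5000)      // milliseconds
            .setSocketTimeout(90000));    // milliseconds
RestHighLevelClient client = new RestHighLevelClient(builder);

That only hides the symptom though; I would still like to understand the GC pauses themselves.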

Here are some details of the state of Elasticsearch. Note this is after the test suite has completed, so some of the tests will have performed cleanup (some indices will have been deleted, etc.), and some of this may not be useful. If there is any more information I can provide, please let me know. Also note these stats are from a successful run of the test suite; I don't have stats from a failed run yet.

_cluster/stats?pretty&human

{
  "_nodes" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "cluster_name" : "docker-cluster",
  "cluster_uuid" : "jYnioXskQvi23hDVekWGeA",
  "timestamp" : 1635420632189,
  "status" : "green",
  "indices" : {
    "count" : 16,
    "shards" : {
      "total" : 32,
      "primaries" : 32,
      "replication" : 0.0,
      "index" : {
        "shards" : {
          "min" : 2,
          "max" : 2,
          "avg" : 2.0
        },
        "primaries" : {
          "min" : 2,
          "max" : 2,
          "avg" : 2.0
        },
        "replication" : {
          "min" : 0.0,
          "max" : 0.0,
          "avg" : 0.0
        }
      }
    },
    "docs" : {
      "count" : 50,
      "deleted" : 8
    },
    "store" : {
      "size" : "436.5kb",
      "size_in_bytes" : 447012,
      "reserved" : "0b",
      "reserved_in_bytes" : 0
    },
    "fielddata" : {
      "memory_size" : "7.4kb",
      "memory_size_in_bytes" : 7592,
      "evictions" : 0
    },
    "query_cache" : {
      "memory_size" : "0b",
      "memory_size_in_bytes" : 0,
      "total_count" : 0,
      "hit_count" : 0,
      "miss_count" : 0,
      "cache_size" : 0,
      "cache_count" : 0,
      "evictions" : 0
    },
    "completion" : {
      "size" : "0b",
      "size_in_bytes" : 0
    },
    "segments" : {
      "count" : 40,
      "memory" : "157.4kb",
      "memory_in_bytes" : 161232,
      "terms_memory" : "119.9kb",
      "terms_memory_in_bytes" : 122848,
      "stored_fields_memory" : "19kb",
      "stored_fields_memory_in_bytes" : 19520,
      "term_vectors_memory" : "0b",
      "term_vectors_memory_in_bytes" : 0,
      "norms_memory" : "4.1kb",
      "norms_memory_in_bytes" : 4288,
      "points_memory" : "0b",
      "points_memory_in_bytes" : 0,
      "doc_values_memory" : "14.2kb",
      "doc_values_memory_in_bytes" : 14576,
      "index_writer_memory" : "0b",
      "index_writer_memory_in_bytes" : 0,
      "version_map_memory" : "0b",
      "version_map_memory_in_bytes" : 0,
      "fixed_bit_set" : "0b",
      "fixed_bit_set_memory_in_bytes" : 0,
      "max_unsafe_auto_id_timestamp" : -1,
      "file_sizes" : { }
    },
    "mappings" : {
      "field_types" : [
        {
          "name" : "boolean",
          "count" : 9,
          "index_count" : 2
        },
        {
          "name" : "date",
          "count" : 22,
          "index_count" : 1
        },
        {
          "name" : "double",
          "count" : 2,
          "index_count" : 1
        },
        {
          "name" : "integer",
          "count" : 10,
          "index_count" : 1
        },
        {
          "name" : "join",
          "count" : 1,
          "index_count" : 1
        },
        {
          "name" : "keyword",
          "count" : 169,
          "index_count" : 9
        },
        {
          "name" : "long",
          "count" : 20,
          "index_count" : 6
        },
        {
          "name" : "object",
          "count" : 23,
          "index_count" : 5
        },
        {
          "name" : "text",
          "count" : 61,
          "index_count" : 6
        }
      ]
    },
    "analysis" : {
      "char_filter_types" : [
        {
          "name" : "html_strip",
          "count" : 1,
          "index_count" : 1
        }
      ],
      "tokenizer_types" : [
        {
          "name" : "path_hierarchy",
          "count" : 1,
          "index_count" : 1
        }
      ],
      "filter_types" : [
        {
          "name" : "length",
          "count" : 1,
          "index_count" : 1
        },
        {
          "name" : "limit",
          "count" : 1,
          "index_count" : 1
        },
        {
          "name" : "pattern_capture",
          "count" : 1,
          "index_count" : 1
        }
      ],
      "analyzer_types" : [
        {
          "name" : "custom",
          "count" : 2,
          "index_count" : 1
        },
        {
          "name" : "pattern",
          "count" : 1,
          "index_count" : 1
        },
        {
          "name" : "standard",
          "count" : 1,
          "index_count" : 1
        }
      ],
      "built_in_char_filters" : [ ],
      "built_in_tokenizers" : [
        {
          "name" : "icu_tokenizer",
          "count" : 1,
          "index_count" : 1
        }
      ],
      "built_in_filters" : [
        {
          "name" : "icu_normalizer",
          "count" : 2,
          "index_count" : 1
        }
      ],
      "built_in_analyzers" : [ ]
    }
  },
  "nodes" : {
    "count" : {
      "total" : 1,
      "coordinating_only" : 0,
      "data" : 1,
      "ingest" : 1,
      "master" : 1,
      "remote_cluster_client" : 1
    },
    "versions" : [
      "7.10.2"
    ],
    "os" : {
      "available_processors" : 12,
      "allocated_processors" : 12,
      "names" : [
        {
          "name" : "Linux",
          "count" : 1
        }
      ],
      "pretty_names" : [
        {
          "pretty_name" : "openSUSE Leap 15.3",
          "count" : 1
        }
      ],
      "mem" : {
        "total" : "24.9gb",
        "total_in_bytes" : 26751438848,
        "free" : "17.3gb",
        "free_in_bytes" : 18664411136,
        "used" : "7.5gb",
        "used_in_bytes" : 8087027712,
        "free_percent" : 70,
        "used_percent" : 30
      }
    },
    "process" : {
      "cpu" : {
        "percent" : 0
      },
      "open_file_descriptors" : {
        "min" : 275,
        "max" : 275,
        "avg" : 275
      }
    },
    "jvm" : {
      "max_uptime" : "6.6m",
      "max_uptime_in_millis" : 396405,
      "versions" : [
        {
          "version" : "11.0.12",
          "vm_name" : "OpenJDK 64-Bit Server VM",
          "vm_version" : "11.0.12+7-suse-3.59.1-x8664",
          "vm_vendor" : "Oracle Corporation",
          "bundled_jdk" : true,
          "using_bundled_jdk" : false,
          "count" : 1
        }
      ],
      "mem" : {
        "heap_used" : "292.8mb",
        "heap_used_in_bytes" : 307096056,
        "heap_max" : "989.8mb",
        "heap_max_in_bytes" : 1037959168
      },
      "threads" : 68
    },
    "fs" : {
      "total" : "250.9gb",
      "total_in_bytes" : 269490393088,
      "free" : "243.9gb",
      "free_in_bytes" : 261967937536,
      "available" : "231.1gb",
      "available_in_bytes" : 248207265792
    },
    "plugins" : [
      {
        "name" : "analysis-icu",
        "version" : "7.10.2",
        "elasticsearch_version" : "7.10.2",
        "java_version" : "1.8",
        "description" : "The ICU Analysis plugin integrates the Lucene ICU module into Elasticsearch, adding ICU-related analysis components.",
        "classname" : "org.elasticsearch.plugin.analysis.icu.AnalysisICUPlugin",
        "extended_plugins" : [ ],
        "has_native_controller" : false
      }
    ],
    "network_types" : {
      "transport_types" : {
        "netty4" : 1
      },
      "http_types" : {
        "netty4" : 1
      }
    },
    "discovery_types" : {
      "single-node" : 1
    },
    "packaging_types" : [
      {
        "flavor" : "oss",
        "type" : "docker",
        "count" : 1
      }
    ],
    "ingest" : {
      "number_of_pipelines" : 0,
      "processor_stats" : { }
    }
  }
}

_nodes/stats?pretty&human

Too big to paste, copied here: https://pastebin.com/EBghcNuQ

Here is an extract from the above pastebin. It does look as though the young generation is almost maxed out; even so, should it be taking so long (7.5 seconds in the warning at the top of this post) to collect?

"pools" : {
    "young" : {
      "used" : "219.7mb",
      "used_in_bytes" : 230396648,
      "max" : "273mb",
      "max_in_bytes" : 286326784,
      "peak_used" : "273mb",
      "peak_used_in_bytes" : 286326784,
      "peak_max" : "273mb",
      "peak_max_in_bytes" : 286326784
    },
    "survivor" : {
      "used" : "9.9mb",
      "used_in_bytes" : 10391856,
      "max" : "34.1mb",
      "max_in_bytes" : 35782656,
      "peak_used" : "34.1mb",
      "peak_used_in_bytes" : 35782656,
      "peak_max" : "34.1mb",
      "peak_max_in_bytes" : 35782656
    },
    "old" : {
      "used" : "63.2mb",
      "used_in_bytes" : 66307552,
      "max" : "682.6mb",
      "max_in_bytes" : 715849728,
      "peak_used" : "63.2mb",
      "peak_used_in_bytes" : 66307552,
      "peak_max" : "682.6mb",
      "peak_max_in_bytes" : 715849728
    }
  }
},
"threads" : {
  "count" : 68,
  "peak_count" : 68
},
"gc" : {
  "collectors" : {
    "young" : {
      "collection_count" : 10,
      "collection_time" : "115ms",
      "collection_time_in_millis" : 115
    },
    "old" : {
      "collection_count" : 2,
      "collection_time" : "73ms",
      "collection_time_in_millis" : 73
    }
  }
},

_cluster/settings?pretty

{
  "persistent" : { },
  "transient" : { }
}

_cat/nodes?h=heap*&v

heap.current heap.percent heap.max
     117.5mb           11  989.8mb

_cat/nodes?v

ip         heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
172.17.0.5           12          30   0    0.07    0.05     0.09 dimr      *      aabedf9ca3ca

_cat/fielddata?v

id                     host       ip         node         field size
_eJ-A7pFSoKMfuDKs63SLg 172.17.0.5 172.17.0.5 aabedf9ca3ca _id    1kb

_cat/shards

workspace-archive-cleanup-job_item-000001 1 p STARTED 1  4.1kb 172.17.0.5 aabedf9ca3ca
workspace-archive-cleanup-job_item-000001 0 p STARTED 5   19kb 172.17.0.5 aabedf9ca3ca
assigndeviceids_item-000001               1 p STARTED 0   208b 172.17.0.5 aabedf9ca3ca
assigndeviceids_item-000001               0 p STARTED 4 29.3kb 172.17.0.5 aabedf9ca3ca

_cat/allocation?v

shards disk.indices disk.used disk.avail disk.total disk.percent host       ip         node
     4       52.7kb    19.7gb    231.2gb    250.9gb            7 172.17.0.5 172.17.0.5 aabedf9ca3ca

JVM options:

elastic+     7  0.5  5.2 7594004 1360460 ?     Sl   06:14   1:00 /usr/lib64/jvm/java-11-openjdk-11/bin/java
-Xshare:auto
-Des.networkaddress.cache.ttl=60
-Des.networkaddress.cache.negative.ttl=10
-XX:+AlwaysPreTouch
-Xss1m
-Djava.awt.headless=true
-Dfile.encoding=UTF-8
-Djna.nosys=true
-XX:-OmitStackTraceInFastThrow
-Dio.netty.noUnsafe=true
-Dio.netty.noKeySetOptimization=true
-Dio.netty.recycler.maxCapacityPerThread=0
-Dio.netty.allocator.numDirectArenas=0
-Dlog4j.shutdownHookEnabled=false
-Dlog4j2.disable.jmx=true
-Djava.locale.providers=SPI,COMPAT
-Xms1g
-Xmx1g
-XX:+UseConcMarkSweepGC
-XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly
-Djava.io.tmpdir=/tmp/elasticsearch-9557497152372782786
-XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=data
-XX:ErrorFile=logs/hs_err_pid%p.log -Xlog:gc*,gc+age=trace,safepoint:file=logs/gc.log:utctime,pid,tags:filecount=32,filesize=64m -Des.cgroups.hierarchy.override=/
-Dcom.sun.management.jmxremote
-Dcom.sun.management.jmxremote.port=9010
-Dcom.sun.management.jmxremote.rmi.port=9010
-Djava.rmi.server.hostname=127.0.0.1
-Dcom.sun.management.jmxremote.local.only=false
-Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.ssl=false
-XX:MaxDirectMemorySize=536870912
-Des.path.home=/usr/share/elasticsearch
-Des.path.conf=/usr/share/elasticsearch/config
-Des.distribution.flavor=oss
-Des.distribution.type=docker
-Des.bundled_jdk=true 
-cp /usr/share/elasticsearch/lib/* org.elasticsearch.bootstrap.Elasticsearch -Ediscovery.type=single-node

Extract from log up until first GC warning:

elasticsearch> OpenJDK 64-Bit Server VM warning: Option UseConcMarkSweepGC was deprecated in version 9.0 and will likely be removed in a future release.
elasticsearch> [19:20:31.086Z #205.001 INFO  -            -   ] org.elasticsearch.node.Node: [a65fc811a8a8] version[7.10.2], pid[8], build[oss/docker/747e1cc71def077253878a59143c1f785afa92b9/2021-01-13T00:42:12.435326Z], OS[Linux/3.10.0-862.6.3.el7.x86_64/amd64], JVM[Oracle Corporation/OpenJDK 64-Bit Server VM/11.0.11/11.0.11+9-suse-lp152.2.12.1-x8664]
elasticsearch> [19:20:31.097Z #205.001 INFO  -            -   ] org.elasticsearch.node.Node: [a65fc811a8a8] JVM home [/usr/lib64/jvm/java-11-openjdk-11], using bundled JDK [false]
elasticsearch> [19:20:31.097Z #205.001 INFO  -            -   ] org.elasticsearch.node.Node: [a65fc811a8a8] JVM arguments [-Xshare:auto, -Des.networkaddress.cache.ttl=60, -Des.networkaddress.cache.negative.ttl=10, -XX:+AlwaysPreTouch, -Xss1m, -Djava.awt.headless=true, -Dfile.encoding=UTF-8, -Djna.nosys=true, -XX:-OmitStackTraceInFastThrow, -Dio.netty.noUnsafe=true, -Dio.netty.noKeySetOptimization=true, -Dio.netty.recycler.maxCapacityPerThread=0, -Dio.netty.allocator.numDirectArenas=0, -Dlog4j.shutdownHookEnabled=false, -Dlog4j2.disable.jmx=true, -Djava.locale.providers=SPI,COMPAT, -Xms1g, -Xmx1g, -XX:+UseConcMarkSweepGC, -XX:CMSInitiatingOccupancyFraction=75, -XX:+UseCMSInitiatingOccupancyOnly, -Djava.io.tmpdir=/tmp/elasticsearch-11638220902719581930, -XX:+HeapDumpOnOutOfMemoryError, -XX:HeapDumpPath=data, -XX:ErrorFile=logs/hs_err_pid%p.log, -Xlog:gc*,gc+age=trace,safepoint:file=logs/gc.log:utctime,pid,tags:filecount=32,filesize=64m, -Des.cgroups.hierarchy.override=/, -Dcom.sun.management.jmxremote, -Dcom.sun.management.jmxremote.port=9010, -Dcom.sun.management.jmxremote.rmi.port=9010, -Djava.rmi.server.hostname=127.0.0.1, -Dcom.sun.management.jmxremote.local.only=false, -Dcom.sun.management.jmxremote.authenticate=false, -Dcom.sun.management.jmxremote.ssl=false, -XX:MaxDirectMemorySize=536870912, -Des.path.home=/usr/share/elasticsearch, -Des.path.conf=/usr/share/elasticsearch/config, -Des.distribution.flavor=oss, -Des.distribution.type=docker, -Des.bundled_jdk=true]
elasticsearch> [19:20:43.124Z #205.001 INFO  -            -   ] org.elasticsearch.plugins.PluginsService: [a65fc811a8a8] loaded module [aggs-matrix-stats]
elasticsearch> [19:20:43.125Z #205.001 INFO  -            -   ] org.elasticsearch.plugins.PluginsService: [a65fc811a8a8] loaded module [analysis-common]
elasticsearch> [19:20:43.125Z #205.001 INFO  -            -   ] org.elasticsearch.plugins.PluginsService: [a65fc811a8a8] loaded module [geo]
elasticsearch> [19:20:43.125Z #205.001 INFO  -            -   ] org.elasticsearch.plugins.PluginsService: [a65fc811a8a8] loaded module [ingest-common]
elasticsearch> [19:20:43.126Z #205.001 INFO  -            -   ] org.elasticsearch.plugins.PluginsService: [a65fc811a8a8] loaded module [ingest-geoip]
elasticsearch> [19:20:43.126Z #205.001 INFO  -            -   ] org.elasticsearch.plugins.PluginsService: [a65fc811a8a8] loaded module [ingest-user-agent]
elasticsearch> [19:20:43.126Z #205.001 INFO  -            -   ] org.elasticsearch.plugins.PluginsService: [a65fc811a8a8] loaded module [kibana]
elasticsearch> [19:20:43.127Z #205.001 INFO  -            -   ] org.elasticsearch.plugins.PluginsService: [a65fc811a8a8] loaded module [lang-expression]
elasticsearch> [19:20:43.127Z #205.001 INFO  -            -   ] org.elasticsearch.plugins.PluginsService: [a65fc811a8a8] loaded module [lang-mustache]
elasticsearch> [19:20:43.127Z #205.001 INFO  -            -   ] org.elasticsearch.plugins.PluginsService: [a65fc811a8a8] loaded module [lang-painless]
elasticsearch> [19:20:43.127Z #205.001 INFO  -            -   ] org.elasticsearch.plugins.PluginsService: [a65fc811a8a8] loaded module [mapper-extras]
elasticsearch> [19:20:43.128Z #205.001 INFO  -            -   ] org.elasticsearch.plugins.PluginsService: [a65fc811a8a8] loaded module [parent-join]
elasticsearch> [19:20:43.128Z #205.001 INFO  -            -   ] org.elasticsearch.plugins.PluginsService: [a65fc811a8a8] loaded module [percolator]
elasticsearch> [19:20:43.128Z #205.001 INFO  -            -   ] org.elasticsearch.plugins.PluginsService: [a65fc811a8a8] loaded module [rank-eval]
elasticsearch> [19:20:43.129Z #205.001 INFO  -            -   ] org.elasticsearch.plugins.PluginsService: [a65fc811a8a8] loaded module [reindex]
elasticsearch> [19:20:43.129Z #205.001 INFO  -            -   ] org.elasticsearch.plugins.PluginsService: [a65fc811a8a8] loaded module [repository-url]
elasticsearch> [19:20:43.129Z #205.001 INFO  -            -   ] org.elasticsearch.plugins.PluginsService: [a65fc811a8a8] loaded module [transport-netty4]
elasticsearch> [19:20:43.130Z #205.001 INFO  -            -   ] org.elasticsearch.plugins.PluginsService: [a65fc811a8a8] loaded plugin [analysis-icu]
elasticsearch> [19:20:43.761Z #205.001 INFO  -            -   ] org.elasticsearch.env.NodeEnvironment: [a65fc811a8a8] using [1] data paths, mounts [[/ (rootfs)]], net usable_space [9.4gb], net total_space [9.9gb], types [rootfs]
elasticsearch> [19:20:43.762Z #205.001 INFO  -            -   ] org.elasticsearch.env.NodeEnvironment: [a65fc811a8a8] heap size [989.8mb], compressed ordinary object pointers [true]
elasticsearch> [19:20:43.962Z #205.001 INFO  -            -   ] org.elasticsearch.node.Node: [a65fc811a8a8] node name [a65fc811a8a8], node ID [zinhYoeoQ_q-fGulfFN42g], cluster name [docker-cluster], roles [master, remote_cluster_client, data, ingest]
elasticsearch> [19:20:53.399Z #205.001 INFO  -            -   ] org.elasticsearch.transport.NettyAllocator: [a65fc811a8a8] creating NettyAllocator with the following configs: [name=unpooled, suggested_max_allocation_size=1mb, factors={es.unsafe.use_unpooled_allocator=null, g1gc_enabled=false, g1gc_region_size=0b, heap_size=989.8mb}]
elasticsearch> [19:20:53.580Z #205.001 INFO  -            -   ] org.elasticsearch.discovery.DiscoveryModule: [a65fc811a8a8] using discovery type [single-node] and seed hosts providers [settings]
elasticsearch> [19:20:54.126Z #205.001 WARN  -            -   ] org.elasticsearch.gateway.DanglingIndicesState: [a65fc811a8a8] gateway.auto_import_dangling_indices is disabled, dangling indices will not be automatically detected or imported and must be managed manually
elasticsearch> [19:20:54.356Z #205.001 INFO  -            -   ] org.elasticsearch.node.Node: [a65fc811a8a8] initialized
elasticsearch> [19:20:54.357Z #205.001 INFO  -            -   ] org.elasticsearch.node.Node: [a65fc811a8a8] starting ...
elasticsearch> [19:20:54.614Z #205.001 INFO  -            -   ] org.elasticsearch.transport.TransportService: [a65fc811a8a8] publish_address {172.17.0.21:9300}, bound_addresses {0.0.0.0:9300}
elasticsearch> [19:20:55.452Z #205.001 INFO  -            -   ] org.elasticsearch.cluster.coordination.Coordinator: [a65fc811a8a8] setting initial configuration to VotingConfiguration{zinhYoeoQ_q-fGulfFN42g}
elasticsearch> [19:20:55.729Z #205.028 INFO  -            -   ] org.elasticsearch.cluster.service.MasterService: [a65fc811a8a8] elected-as-master ([1] nodes joined)[{a65fc811a8a8}{zinhYoeoQ_q-fGulfFN42g}{p4QxXvFvQgWh3Pi-VTmk4g}{172.17.0.21}{172.17.0.21:9300}{dimr} elect leader, _BECOME_MASTER_TASK_, _FINISH_ELECTION_], term: 1, version: 1, delta: master node changed {previous [], current [{a65fc811a8a8}{zinhYoeoQ_q-fGulfFN42g}{p4QxXvFvQgWh3Pi-VTmk4g}{172.17.0.21}{172.17.0.21:9300}{dimr}]}
elasticsearch> [19:20:55.809Z #205.027 INFO  -            -   ] org.elasticsearch.cluster.coordination.CoordinationState: [a65fc811a8a8] cluster UUID set to [FDeYMVECTAGBjQSMxOtUGw]
elasticsearch> [19:20:55.859Z #205.024 INFO  -            -   ] org.elasticsearch.cluster.service.ClusterApplierService: [a65fc811a8a8] master node changed {previous [], current [{a65fc811a8a8}{zinhYoeoQ_q-fGulfFN42g}{p4QxXvFvQgWh3Pi-VTmk4g}{172.17.0.21}{172.17.0.21:9300}{dimr}]}, term: 1, version: 1, reason: Publication{term=1, version=1}
elasticsearch> [19:20:55.913Z #205.001 INFO  -            -   ] org.elasticsearch.http.AbstractHttpServerTransport: [a65fc811a8a8] publish_address {172.17.0.21:9200}, bound_addresses {0.0.0.0:9200}
elasticsearch> [19:20:55.913Z #205.001 INFO  -            -   ] org.elasticsearch.node.Node: [a65fc811a8a8] started
elasticsearch> [19:20:55.955Z #205.028 INFO  -            -   ] org.elasticsearch.gateway.GatewayService: [a65fc811a8a8] recovered [0] indices into cluster_state
elasticsearch> [19:26:02.442Z #205.028 INFO  -            -   ] org.elasticsearch.cluster.metadata.MetadataCreateIndexService: [a65fc811a8a8] [hudzh_item-000001] creating index, cause [api], templates [], shards [2]/[0]
elasticsearch> [19:26:03.639Z #205.028 WARN  -            -   ] org.elasticsearch.cluster.service.MasterService: [a65fc811a8a8] took [17.5s], which is over [10s], to compute cluster state update for [create-index [hudzh_item-000001], cause [api]]
elasticsearch> [19:26:15.364Z #205.028 INFO  -            -   ] org.elasticsearch.cluster.metadata.MetadataCreateIndexService: [a65fc811a8a8] [assigndeviceids_item-000001] creating index, cause [api], templates [], shards [2]/[0]
elasticsearch> [19:26:19.387Z #205.028 INFO  -            -   ] org.elasticsearch.cluster.routing.allocation.AllocationService: [a65fc811a8a8] Cluster health status changed from [YELLOW] to [GREEN] (reason: [shards started [[assigndeviceids_item-000001][0], [assigndeviceids_item-000001][1]]]).
elasticsearch> [19:26:20.559Z #205.028 INFO  -            -   ] org.elasticsearch.cluster.metadata.MetadataMappingService: [a65fc811a8a8] [assigndeviceids_item-000001/-yvbBBiRR_maejjkGxNNlA] create_mapping [_doc]
elasticsearch> [19:26:27.288Z #205.028 INFO  -            -   ] org.elasticsearch.cluster.metadata.MetadataMappingService: [a65fc811a8a8] [assigndeviceids_item-000001/-yvbBBiRR_maejjkGxNNlA] update_mapping [_doc]
elasticsearch> [19:27:04.025Z #205.021 WARN  -            -   ] org.elasticsearch.monitor.jvm.JvmGcMonitorService: [a65fc811a8a8] [gc][young][359][4] duration [7.5s], collections [1]/[8.3s], total [7.5s]/[7.7s], memory [310.9mb]->[83.2mb]/[989.8mb], all_pools {[young] [269.5mb]->[4.1mb]/[273mb]}{[survivor] [34.1mb]->[34.1mb]/[34.1mb]}{[old] [7.2mb]->[44.9mb]/[682.6mb]}
elasticsearch> [19:27:04.027Z #205.021 WARN  -            -   ] org.elasticsearch.monitor.jvm.JvmGcMonitorService: [a65fc811a8a8] [gc][359] overhead, spent [7.5s] collecting in the last [8.3s]

Note the first line of the above log:

elasticsearch> OpenJDK 64-Bit Server VM warning: Option UseConcMarkSweepGC was deprecated in version 9.0 and will likely be removed in a future release.

Maybe related:

https://github.com/elastic/elasticsearch/issues/43911
https://github.com/elastic/elasticsearch/issues/36828

Note I did a comparison between the test environment where the failures occur and other envs where I don't see any problems.

All envs use the same default CMS garbage collector. On my other envs, though, Elasticsearch is assigned 8GB of heap, whereas in my test env it gets the default 1GB.

I would have thought 1GB was enough, though, because as mentioned above my tests are not particularly heavy on Elasticsearch in terms of the number of indices and docs created.

UPDATE:

I tried increasing the heap from 1GB to 2GB; however, I still saw the GC warnings.

I also added some logs to each test class to print out some stats at the start of each test and did some comparisons.

During a run where tests failed and I saw the GC warnings, I found that between the start of one test and the start of the next there were 3 young garbage collections, which took a combined total of 6 seconds, rather than the usual 50 milliseconds or so.

However, this only occurs in maybe 1 out of 5 test runs, and when it does, one or more tests fail. Normally each young GC takes about 50ms or less. The order the tests run in is the same in both the passing and failing cases.

Interestingly, this did seem to coincide with a 'delete all' operation that some of the tests run; the same log showed 78 docs being deleted at that point.

Could deleting those 78 docs have triggered a few young GCs? I'm just not sure why it would only happen on some occasions.

Also, here is something I noticed from a section of the cluster stats: a particular test creates a lot of indices (24) and associated shards, and the next test then performs a 'delete all' request. Perhaps this could be the cause of the GC warnings:

final DeleteIndexRequest deleteAll = new DeleteIndexRequest("_all");
client.indices().delete(deleteAll, RequestOptions.DEFAULT);

Asked Oct 26 '21 by rmf


1 Answer

I would start with this question: is this an issue with Elasticsearch, Java, or your environment (including your test code)?

To try to determine that, I would propose the following steps:

  1. See what you are running for the JVM with Elasticsearch. Based on your log, you are using (a quick way to confirm the active flags is noted just after this list):
  • -XX:+UseConcMarkSweepGC
  • -XX:CMSInitiatingOccupancyFraction=75
  • -XX:+UseCMSInitiatingOccupancyOnly
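
To confirm the flags the running node was actually started with (rather than reading them from the startup log), the nodes info API should report them, e.g.:

_nodes/jvm?pretty

The active options appear under each node's jvm.input_arguments.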

For this issue that you mentioned:

elasticsearch> OpenJDK 64-Bit Server VM warning: Option UseConcMarkSweepGC was deprecated in version 9.0 and will likely be removed in a future release.

You can try each one of these to see if it solves your problem:

  • Switching to the G1 GC algorithm. It became the default in Java 9. It's easier to tune and has fewer arguments than CMS.
  • Switching to the ZGC algorithm. It is available (as an experimental feature) in Java 11, your version. Its goal is to keep GC pause times under 10ms.

No guarantees, but I would think that trying this should eliminate your JVM configuration as the root cause (a sketch of how to switch the container to G1 follows below).
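
As a concrete sketch of the G1 switch for the official 7.10 Docker image (an assumption on my part; your ps output suggests some custom JVM options already, so adjust for your setup), you can drop a small override file into the container's config/jvm.options.d/ directory, e.g. a hypothetical gc.options containing:

-XX:-UseConcMarkSweepGC
-XX:-UseCMSInitiatingOccupancyOnly
-XX:+UseG1GC

Files in jvm.options.d are applied after the bundled jvm.options (which is what enables CMS on JDK 8-13 in the 7.x defaults), so the minus flags switch CMS back off before G1 is selected. The same flags could also be appended via the ES_JAVA_OPTS environment variable. If you want to try ZGC on Java 11 instead, it additionally needs -XX:+UnlockExperimentalVMOptions -XX:+UseZGC, since it is still experimental in that release.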

  2. Run VisualVM and take a heap dump and/or a thread dump after running your tests. Is it possible that something in your tests is causing a heap or threading issue? This could show you. VisualVM is no longer packaged with Java in your version, but it can be downloaded from the VisualVM website.

  3. Run GCeasy to see what analysis it shows.
    It has a limited free version with the following features:

  • 5 uploads/user/month
  • 5 API calls/user/month
  • 10 MB file size/upload
  • Runs in their cloud
  • Data stored in their cloud
  • Standard support
  • Few graphs

Analyzing GC logs is their specialty. I would think the free version would point to your issue, or at least give hints you can combine with the information from the other steps listed here.
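
One practical note on getting the data: the JVM options above already include an -Xlog:gc* directive that writes to logs/gc.log, so GC logging appears to be enabled. Assuming the default Docker layout (es.path.home=/usr/share/elasticsearch), the log should be retrievable with something like:

docker cp <container-name>:/usr/share/elasticsearch/logs/gc.log .

and can then be uploaded for analysis.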

  4. Normally I would agree that 1GB is enough memory, but I would bump it up to 2GB, as that is what we ran in our production environment, and it may fix things.
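
If it helps, with the official image the heap can be set via the ES_JAVA_OPTS environment variable when starting the container (a sketch assuming the oss/docker packaging shown in your cluster stats):

docker run -e ES_JAVA_OPTS="-Xms2g -Xmx2g" -e discovery.type=single-node docker.elastic.co/elasticsearch/elasticsearch-oss:7.10.2

This is the documented way to override the default 1GB heap for the Docker image.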

Answered Sep 28 '22 by James Drinkard