I am attempting to set up a basic three-node Elasticsearch 7.7.0 dev cluster with Docker Compose, following the official documentation here, but I cannot get all three nodes running at the same time. After running docker-compose up, at least one of the containers exits, either right away or shortly after, like this:
Starting es01 ... done
Starting es03 ... done
Starting es02 ... done
Attaching to es03, es02, es01
es01 exited with code 137
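Exit code 137 is 128 + SIGKILL, so the process is being killed rather than crashing on its own. One way to check whether Docker recorded an out-of-memory kill is docker inspect (a sketch, using the es01 container name from my compose file below):
docker inspect --format 'OOMKilled={{.State.OOMKilled}} ExitCode={{.State.ExitCode}}' es01
# OOMKilled=true means the container hit a memory limit; note that on
# Docker Desktop the VM-wide OOM killer can also kill the JVM without setting it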
If I try to bring it back up by running docker-compose start es01
(or whichever one happened to exit; which one dies seems random), it produces these errors, which repeat until I kill all the containers:
es03 | {"type": "server", "timestamp": "2020-05-25T15:52:37,214Z", "level": "WARN", "component": "o.e.c.c.ClusterFormationFailureHelper", "cluster.name": "es-docker-cluster", "node.name": "es03", "message": "master not discovered or elected yet, an election requires 2 nodes with ids [9RNLWSMbTmetFFh3Tm1q0g, 3CbHK5iBQAqt7poPRsMmaw], have discovered [{es03}{3CbHK5iBQAqt7poPRsMmaw}{AGMLafOfRyyMVeEuFuaTOA}{172.25.0.2}{172.25.0.2:9300}{dilmrt}{ml.machine_memory=2085416960, xpack.installed=true, transform.node=true, ml.max_open_jobs=20}, {es02}{9RNLWSMbTmetFFh3Tm1q0g}{bnuXTgAqQyCz04G7Cva1sA}{172.25.0.4}{172.25.0.4:9300}{dilmrt}{ml.machine_memory=2085416960, ml.max_open_jobs=20, xpack.installed=true, transform.node=true}] which is a quorum; discovery will continue using [172.25.0.4:9300] from hosts providers and [{es03}{3CbHK5iBQAqt7poPRsMmaw}{AGMLafOfRyyMVeEuFuaTOA}{172.25.0.2}{172.25.0.2:9300}{dilmrt}{ml.machine_memory=2085416960, xpack.installed=true, transform.node=true, ml.max_open_jobs=20}] from last-known cluster state; node term 0, last-accepted version 0 in term 0" }
es02 | {"type": "server", "timestamp": "2020-05-25T15:52:37,936Z", "level": "WARN", "component": "o.e.d.SeedHostsResolver", "cluster.name": "es-docker-cluster", "node.name": "es02", "message": "failed to resolve host [es01]", "cluster.uuid": "e16AK3EnQ-28Xky2afhx4A", "node.id": "9RNLWSMbTmetFFh3Tm1q0g"
If I don't attempt to bring the exited container back up, the other two seem to work fine; if I go to http://localhost:9200/_cluster/health I see:
{"cluster_name":"es-docker-cluster","status":"green","timed_out":false,"number_of_nodes":2,"number_of_data_nodes":2,"active_primary_shards":0,"active_shards":0,"relocating_shards":0,"initializing_shards":0,"unassigned_shards":0,"delayed_unassigned_shards":0,"number_of_pending_tasks":0,"number_of_in_flight_fetch":0,"task_max_waiting_in_queue_millis":0,"active_shards_percent_as_number":100.0}
I am using Docker Desktop 2.2.0.5 on macOS Catalina 10.15.4, with Docker Compose 1.25.4.
This is my docker-compose.yml file for reference:
version: '2.2'
services:
  es01:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.7.0
    container_name: es01
    environment:
      - node.name=es01
      - cluster.name=es-docker-cluster
      - discovery.seed_hosts=es02,es03
      - cluster.initial_master_nodes=es01,es02,es03
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
    ulimits:
      memlock:
        soft: -1
        hard: -1
    volumes:
      - data01:/usr/share/elasticsearch/data
    ports:
      - 9200:9200
    networks:
      - elastic
  es02:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.7.0
    container_name: es02
    environment:
      - node.name=es02
      - cluster.name=es-docker-cluster
      - discovery.seed_hosts=es01,es03
      - cluster.initial_master_nodes=es01,es02,es03
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
    ulimits:
      memlock:
        soft: -1
        hard: -1
    volumes:
      - data02:/usr/share/elasticsearch/data
    networks:
      - elastic
  es03:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.7.0
    container_name: es03
    environment:
      - node.name=es03
      - cluster.name=es-docker-cluster
      - discovery.seed_hosts=es01,es02
      - cluster.initial_master_nodes=es01,es02,es03
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
    ulimits:
      memlock:
        soft: -1
        hard: -1
    volumes:
      - data03:/usr/share/elasticsearch/data
    networks:
      - elastic

volumes:
  data01:
    driver: local
    name: data01
  data02:
    driver: local
    name: data02
  data03:
    driver: local
    name: data03

networks:
  elastic:
    driver: bridge
    name: elastic
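As a side note, the cluster health API can block until a target status is reached, which makes it easy to script a startup check (a sketch using the standard wait_for_status parameter):
docker-compose up -d
# wait up to 60s for all three nodes to form a green cluster
curl -s 'http://localhost:9200/_cluster/health?wait_for_status=green&timeout=60s'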
My Docker Desktop memory setting was at 2 GB; once I raised it above 4 GB, the issue stopped.
It turns out that exit code 137 in Docker containers (128 + SIGKILL) is commonly caused by memory allocation issues!
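You can confirm how much memory the Docker Desktop VM actually has, since docker info exposes the total in bytes (a quick sketch):
docker info --format '{{.MemTotal}}'
# with the 2 GB setting this prints roughly 2085416960, which matches the
# ml.machine_memory value in the logs above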
For me, exit code 137 was caused by giving Elasticsearch so much memory that it tried to reserve more than my system's available memory for the replica. The primary took up 6 GB in docker stats, so Elasticsearch was reserving another 6 GB for a replica (but never allocating it, because it was on the same node).
I limited the heap size in docker-compose.yml:
    environment:
      - ES_JAVA_OPTS=-Xms750m -Xmx750m
This ran everything just as fast in 1 GB of memory as it formerly did in 6 GB, without crashing.
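To verify the containers stay within budget after lowering the heap, docker stats gives a one-shot snapshot of per-container memory use (container names as in the question's compose file):
docker stats --no-stream es01 es02 es03
# with a 750m heap, each node's MEM USAGE should settle near 1 GB
# (heap plus JVM and Lucene overhead) instead of 6 GB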