
First time with MongoDB + Docker - Set up from docker compose

I'd like to try a project I found on GitHub, so I installed MongoDB on macOS, and now I'm trying to understand how to set it up correctly through the docker-compose file in the directory. This is the compose file:

version: '3'
services:
# replica set 1
  mongors1n1:
    container_name: mongors1n1
    image: mongo
    command: mongod --shardsvr --replSet mongors1 --dbpath /data/db --port 27017
    ports:
      - 27017:27017
    expose:
      - "27017"
    volumes:
      - ~/mongo_cluster/data1:/data/db

  mongors1n2:
    container_name: mongors1n2
    image: mongo
    command: mongod --shardsvr --replSet mongors1 --dbpath /data/db --port 27017
    ports:
      - 27027:27017
    expose:
      - "27017"
    volumes:
      - ~/mongo_cluster/data2:/data/db

  mongors1n3:
    container_name: mongors1n3
    image: mongo
    command: mongod --shardsvr --replSet mongors1 --dbpath /data/db --port 27017
    ports:
      - 27037:27017
    expose:
      - "27017"
    volumes:
      - ~/mongo_cluster/data3:/data/db

# replica set 2
  mongors2n1:
    container_name: mongors2n1
    image: mongo
    command: mongod --shardsvr --replSet mongors2 --dbpath /data/db --port 27017
    ports:
      - 27047:27017
    expose:
      - "27017"
    volumes:
      - ~/mongo_cluster/data4:/data/db

  mongors2n2:
    container_name: mongors2n2
    image: mongo
    command: mongod --shardsvr --replSet mongors2 --dbpath /data/db --port 27017
    ports:
      - 27057:27017
    expose:
      - "27017"
    volumes:
      - ~/mongo_cluster/data5:/data/db

  mongors2n3:
    container_name: mongors2n3
    image: mongo
    command: mongod --shardsvr --replSet mongors2 --dbpath /data/db --port 27017
    ports:
      - 27067:27017
    expose:
      - "27017"
    volumes:
      - ~/mongo_cluster/data6:/data/db

  # mongo config server
  mongocfg1:
    container_name: mongocfg1
    image: mongo
    command: mongod --configsvr --replSet mongors1conf --dbpath /data/db --port 27017
    expose:
      - "27017"
    volumes:
      - ~/mongo_cluster/config1:/data/db

  mongocfg2:
    container_name: mongocfg2
    image: mongo
    command: mongod --configsvr --replSet mongors1conf --dbpath /data/db --port 27017
    expose:
      - "27017"
    volumes:
      - ~/mongo_cluster/config2:/data/db

  mongocfg3:
    container_name: mongocfg3
    image: mongo
    command: mongod --configsvr --replSet mongors1conf --dbpath /data/db --port 27017
    expose:
      - "27017"
    volumes:
      - ~/mongo_cluster/config3:/data/db

# mongos router
  mongos1:
    container_name: mongos1
    image: mongo
    depends_on:
      - mongocfg1
      - mongocfg2
    command: mongos --configdb mongors1conf/mongocfg1:27017,mongocfg2:27017,mongocfg3:27017 --port 27017
    ports:
      - 27019:27017
    expose:
      - "27017"

  mongos2:
    container_name: mongos2
    image: mongo
    depends_on:
      - mongocfg1
      - mongocfg2
    command: mongos --configdb mongors1conf/mongocfg1:27017,mongocfg2:27017,mongocfg3:27017 --port 27017
    ports:
      - 27020:27017
    expose:
      - "27017"


# TODO after running docker-compose
# conf = rs.config();
# conf.members[0].priority = 2;
# rs.reconfig(conf);

And this is the script to bring it up and create the shards:

#!/bin/sh
docker-compose up
# configure our config servers replica set
docker exec -it mongocfg1 bash -c "echo 'rs.initiate({_id: \"mongors1conf\",configsvr: true, members: [{ _id : 0, host : \"mongocfg1\" },{ _id : 1, host : \"mongocfg2\" }, { _id : 2, host : \"mongocfg3\" }]})' | mongo"

# building replica shard
docker exec -it mongors1n1 bash -c "echo 'rs.initiate({_id : \"mongors1\", members: [{ _id : 0, host : \"mongors1n1\" },{ _id : 1, host : \"mongors1n2\" },{ _id : 2, host : \"mongors1n3\" }]})' | mongo"
docker exec -it mongors2n1 bash -c "echo 'rs.initiate({_id : \"mongors2\", members: [{ _id : 0, host : \"mongors2n1\" },{ _id : 1, host : \"mongors2n2\" },{ _id : 2, host : \"mongors2n3\" }]})' | mongo"


# we add shard to the routers
docker exec -it mongos1 bash -c "echo 'sh.addShard(\"mongors1/mongors1n1\")' | mongo "
docker exec -it mongos1 bash -c "echo 'sh.addShard(\"mongors2/mongors2n1\")' | mongo "

If I try to run the script directly, I get these errors:

mongos1 | {"t":{"$date":"2021-07-25T09:03:56.101+00:00"},"s":"I", "c":"-", "id":4333222, "ctx":"ReplicaSetMonitor-TaskExecutor","msg":"RSM received error response","attr":{"host":"mongocfg3:27017","error":"HostUnreachable: Error connecting to mongocfg3:27017 (172.18.0.2:27017) :: caused by :: Connection refused","replicaSet":"mongors1conf","response":"{}"}}

mongos1 | {"t":{"$date":"2021-07-25T09:03:56.101+00:00"},"s":"I", "c":"NETWORK", "id":4712102, "ctx":"ReplicaSetMonitor-TaskExecutor","msg":"Host failed in replica set","attr":{"replicaSet":"mongors1conf","host":"mongocfg3:27017","error":{"code":6,"codeName":"HostUnreachable","errmsg":"Error connecting to mongocfg3:27017 (172.18.0.2:27017) :: caused by :: Connection refused"},"action":{"dropConnections":true,"requestImmediateCheck":false,"outcome":{"host":"mongocfg3:27017","success":false,"errorMessage":"HostUnreachable: Error connecting to mongocfg3:27017 (172.18.0.2:27017) :: caused by :: Connection refused"}}}}

And other errors like:

mongos1 | {"t":{"$date":"2021-07-25T09:05:39.743+00:00"},"s":"I", "c":"-", "id":4939300, "ctx":"monitoring-keys-for-HMAC","msg":"Failed to refresh key cache","attr":{"error":"FailedToSatisfyReadPreference: Could not find host matching read preference { mode: "nearest" } for set mongors1conf","nextWakeupMillis":1800}}

Shouldn't Docker configure everything without the user having to do it? Or do I need to create something manually, like the database?

EDIT: here are the first errors that show up when I run the script: log

asked Jul 25 '21 by Fabio


1 Answer

So here is an attempt at helping. For the most part, the docker-compose YAML file is pretty close, with the exception of some minor port and binding parameters. The expectation is that initialization takes some additional commands. Example:

  1. docker-compose up the environment
  2. run some scripts to init the environment

... but this was already part of the original post.

So here is a docker-compose file

docker-compose.yml

version: '3'
services:
 # mongo config server
  mongocfg1:
    container_name: mongocfg1
    hostname: mongocfg1
    image: mongo
    command: mongod --configsvr --replSet mongors1conf --dbpath /data/db --port 27019 --bind_ip_all
    volumes:
      - ~/mongo_cluster/config1:/data/db

  mongocfg2:
    container_name: mongocfg2
    hostname: mongocfg2
    image: mongo
    command: mongod --configsvr --replSet mongors1conf --dbpath /data/db --port 27019 --bind_ip_all
    volumes:
      - ~/mongo_cluster/config2:/data/db

  mongocfg3:
    container_name: mongocfg3
    hostname: mongocfg3
    image: mongo
    command: mongod --configsvr --replSet mongors1conf --dbpath /data/db --port 27019 --bind_ip_all
    volumes:
      - ~/mongo_cluster/config3:/data/db

# replica set 1
  mongors1n1:
    container_name: mongors1n1
    hostname: mongors1n1
    image: mongo
    command: mongod --shardsvr --replSet mongors1 --dbpath /data/db --port 27018 --bind_ip_all
    volumes:
      - ~/mongo_cluster/data1:/data/db

  mongors1n2:
    container_name: mongors1n2
    hostname: mongors1n2
    image: mongo
    command: mongod --shardsvr --replSet mongors1 --dbpath /data/db --port 27018 --bind_ip_all
    volumes:
      - ~/mongo_cluster/data2:/data/db

  mongors1n3:
    container_name: mongors1n3
    hostname: mongors1n3
    image: mongo
    command: mongod --shardsvr --replSet mongors1 --dbpath /data/db --port 27018 --bind_ip_all
    volumes:
      - ~/mongo_cluster/data3:/data/db

# replica set 2
  mongors2n1:
    container_name: mongors2n1
    hostname: mongors2n1
    image: mongo
    command: mongod --shardsvr --replSet mongors2 --dbpath /data/db --port 27018 --bind_ip_all
    volumes:
      - ~/mongo_cluster/data4:/data/db

  mongors2n2:
    container_name: mongors2n2
    hostname: mongors2n2
    image: mongo
    command: mongod --shardsvr --replSet mongors2 --dbpath /data/db --port 27018 --bind_ip_all
    volumes:
      - ~/mongo_cluster/data5:/data/db

  mongors2n3:
    container_name: mongors2n3
    hostname: mongors2n3
    image: mongo
    command: mongod --shardsvr --replSet mongors2 --dbpath /data/db --port 27018 --bind_ip_all
    volumes:
      - ~/mongo_cluster/data6:/data/db

# mongos router
  mongos1:
    container_name: mongos1
    hostname: mongos1
    image: mongo
    depends_on:
      - mongocfg1
      - mongocfg2
    command: mongos --configdb mongors1conf/mongocfg1:27019,mongocfg2:27019,mongocfg3:27019 --port 27017 --bind_ip_all
    ports:
      - 27017:27017

  mongos2:
    container_name: mongos2
    hostname: mongos2
    image: mongo
    depends_on:
      - mongocfg1
      - mongocfg2
    command: mongos --configdb mongors1conf/mongocfg1:27019,mongocfg2:27019,mongocfg3:27019 --port 27017 --bind_ip_all
    ports:
      - 27016:27017

... and some scripts to finalize the initialization...

docker-compose up -d

... Give it a few seconds to wind up, then issue...

# Init the replica sets (use the MONGOS host)
docker exec -it mongos1 bash -c "echo 'rs.initiate({_id: \"mongors1conf\",configsvr: true, members: [{ _id : 0, host : \"mongocfg1:27019\", priority: 2 },{ _id : 1, host : \"mongocfg2:27019\" }, { _id : 2, host : \"mongocfg3:27019\" }]})' | mongo --host mongocfg1:27019"
docker exec -it mongos1 bash -c "echo 'rs.initiate({_id : \"mongors1\", members: [{ _id : 0, host : \"mongors1n1:27018\", priority: 2 },{ _id : 1, host : \"mongors1n2:27018\" },{ _id : 2, host : \"mongors1n3:27018\" }]})' | mongo --host mongors1n1:27018"
docker exec -it mongos1 bash -c "echo 'rs.initiate({_id : \"mongors2\", members: [{ _id : 0, host : \"mongors2n1:27018\", priority: 2 },{ _id : 1, host : \"mongors2n2:27018\" },{ _id : 2, host : \"mongors2n3:27018\" }]})' | mongo --host mongors2n1:27018"

... again, give 10-15 seconds to allow the system to adjust to the recent commands (or poll for readiness, as sketched after the addShard commands below) ...

# ADD TWO SHARDS (mongors1, and mongors2)
docker exec -it mongos1 bash -c "echo 'sh.addShard(\"mongors1/mongors1n1:27018,mongors1n2:27018,mongors1n3:27018\")' | mongo"
docker exec -it mongos1 bash -c "echo 'sh.addShard(\"mongors2/mongors2n1:27018,mongors2n2:27018,mongors2n3:27018\")' | mongo"
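
The fixed sleeps above are a bit fragile. As a minimal alternative sketch (assuming the legacy mongo shell inside the containers, and the hostnames/ports from the compose file above), you can poll until the config server replica set reports a primary before moving on:

# Sketch: wait until the config server replica set elects a PRIMARY (myState == 1).
# The 30-attempt bound and 2-second interval are arbitrary choices.
for i in $(seq 1 30); do
  state=$(docker exec mongocfg1 mongo --quiet --port 27019 --eval 'rs.status().myState' 2>/dev/null)
  if [ "$state" = "1" ]; then
    echo "config server replica set has a primary"
    break
  fi
  sleep 2
done

The same loop works for the shard replica sets by swapping in mongors1n1 or mongors2n1 and port 27018.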

Now, try to connect to a mongos from the host running Docker (this assumes you have the mongo shell installed on that host). Use the two mongos hosts as the seed list.

mongo --host "localhost:27017,localhost:27016"
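
Equivalently, the same seed list can be given as a standard MongoDB connection URI (the trailing database name is just an example):

mongo "mongodb://localhost:27017,localhost:27016/test"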

Comments

Notice how the priority for node 0 is set to 2 in the rs.initiate() calls?

Notice how the config servers are all on port 27019 - this follows MongoDB's recommendations.

Notice how the shard servers are all on port 27018 - again, following MongoDB's recommendations.

The mongos routers expose two ports: 27017 (the natural port for MongoDB) and 27016 (a secondary mongos for high availability).

The config servers and the shard servers do not publish their respective ports to the host, for security reasons. You should be using the mongos to get to the data. If you need these ports open for administrative purposes, simply add them to the docker-compose file.
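
If you do want administrative access from the host, a hypothetical fragment for one config server could look like this (the host-side port 27119 is an arbitrary choice, not part of the original setup):

  mongocfg1:
    # ... existing settings from the compose file above ...
    ports:
      - 27119:27019 # publish the config server port to the host, admin use only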

The replica-set intercommunication is not using authentication. This is a security no-no. You need to decide which auth mechanism is best for your scenario - you can use a keyfile (just a text file that is identical among the members of the replica set) or x509 certificates. If going with x509, you need to include the CA cert in each Docker container for reference, along with an individual cert per server with proper host name alignment. You would also need to add the startup configuration item telling the mongod processes to use whichever auth method was selected.
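
As a rough sketch of the keyfile route (the paths here are assumptions, not part of the original setup): generate one keyfile on the host, mount it into every container, and point each mongod and mongos at it.

# Generate a shared keyfile; its contents must be identical for all members.
openssl rand -base64 756 > ~/mongo_cluster/mongodb-keyfile
chmod 400 ~/mongo_cluster/mongodb-keyfile

# Then, for each service in docker-compose.yml:
#   volumes:
#     - ~/mongo_cluster/mongodb-keyfile:/data/keyfile:ro
#   command: mongod ... --keyFile /data/keyfile

Keep in mind that mongod insists on restrictive permissions and ownership for the keyfile, so inside the container the file must be readable by the user the mongod process runs as.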

Logging is not specified. It probably makes sense to send the log output of the mongod and mongos processes to the default locations of /var/log/mongodb/mongod.log and /var/log/mongodb/mongos.log. Without specifying a logging strategy, I believe mongo logs to standard out, which is suppressed if running docker-compose up -d.
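
A sketch of what that could look like on one of the shard-server command lines (--logpath and --logappend are standard mongod options; note that the target directory may need to be created or mounted first, and that logging to a file inside the container means the output no longer appears in docker logs):

command: mongod --shardsvr --replSet mongors1 --dbpath /data/db --port 27018 --bind_ip_all --logpath /var/log/mongodb/mongod.log --logappend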

Superuser: No users are created on the system yet. Usually, for every replica set I stand up, I like to add a superuser account - one having root access - before adding it to a sharded cluster, so I can make administrative changes at the replica-set level if I need to. With the docker-compose approach you can create a superuser from the mongos perspective and perform almost all operations needed on a sharded cluster, but still - I like having the replica-set user available.
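
A sketch of creating such a superuser through a mongos, in the same docker exec style as the init scripts above (the user name and password are placeholders, and this only becomes meaningful once authentication is enabled):

docker exec -it mongos1 bash -c "echo 'db.getSiblingDB(\"admin\").createUser({user: \"admin\", pwd: \"changeme\", roles: [\"root\"]})' | mongo"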

OS tunables - Mongo likes to take up all the system resources. For a shared ecosystem where one physical host runs a bunch of mongo processes, you might want to consider capping the WiredTiger cache size, etc. By default, WiredTiger wants (system memory - 1 GB) / 2. You would also benefit from setting ulimits to proper values - e.g., 64000 file handles per user is a good start, as mongo potentially likes to use a lot of files. Also, the filesystem hosting the data should be XFS. This strategy uses the host user's home directory for the database data directories; a more thoughtful approach would be welcome here.
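
A hypothetical compose fragment capping the cache and raising the file-handle limit (values are illustrative, not tuned; --wiredTigerCacheSizeGB is a standard mongod option and ulimits is standard compose syntax):

  mongors1n1:
    # ... existing settings from the compose file above ...
    command: mongod --shardsvr --replSet mongors1 --dbpath /data/db --port 27018 --bind_ip_all --wiredTigerCacheSizeGB 1
    ulimits:
      nofile:
        soft: 64000
        hard: 64000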

Anything else?

I am sure I am missing something. If you have any questions, please leave a comment and I will reply.

Update 1

The above docker-compose.yml file was missing the hostname attribute for some of the hosts, and this was causing balancer issues, so I have edited the docker-compose.yml to include hostname on all hosts.

Also, the addShard() calls only referred to one host of each replica set. For completeness I added the other hosts to the addShard() calls described above.

Following these steps will result in a brand new sharded cluster, but there are no user databases yet. As such, no user databases are sharded. So let's take a moment to add a database and shard it, then view the shard distributions (A.K.A., balancer results).

We must connect to the database via the mongos (as described above). This example assumes the use of the mongo shell.

mongo --host "localhost:27017,localhost:27016"

Databases in Mongo can be created in a variety of ways. While there is no explicit database-create command, there is an explicit collection-create command (db.createCollection()). We must first set the database context using a 'use <database>' command...

use mydatabase
db.createCollection("mycollection")

... but rather than use this command, we can create a database and collection by creating an index on a non-existent collection. (If you already created the collection, no worries - the next command will still succeed.)

use mydatabase
db.mycollection.createIndex({lastName: 1, creationDate: 1})

In this example, I created a compound index on two fields...

  • lastName
  • creationDate

... on a collection that does not yet exist, on a database that does not yet exist. Once I issue this command, both the database and the collection will be created. Furthermore, I now have the basis for a shard key - the key to which sharding distribution will be based. This shard key will be based on this new index having these two fields.

Shard the database

Assuming I have issued the createIndex command, I can now turn on sharding for the database and issue the shardCollection command...

sh.enableSharding("mydatabase")
sh.shardCollection("mydatabase.mycollection", { "lastName": 1, "creationDate": 1})

Notice how the shardCollection() command refers to the indexed fields we created earlier? Assuming sharding has been successfully applied, we can now view the distribution of data by issuing the sh.status() command.

sh.status()

Example of output (a new collection with no data yet, thus no real distribution of data - we need to insert more than 64 MB of data so that there is more than one chunk to distribute):

mongos> sh.status()
--- Sharding Status --- 
  sharding version: {
    "_id" : 1,
    "minCompatibleVersion" : 5,
    "currentVersion" : 6,
    "clusterId" : ObjectId("6101c030a98b2cc106034695")
  }
  shards:
        {  "_id" : "mongors1",  "host" : "mongors1/mongors1n1:27018,mongors1n2:27018,mongors1n3:27018",  "state" : 1,  "topologyTime" : Timestamp(1627504744, 1) }
        {  "_id" : "mongors2",  "host" : "mongors2/mongors2n1:27018,mongors2n2:27018,mongors2n3:27018",  "state" : 1,  "topologyTime" : Timestamp(1627504753, 1) }
  active mongoses:
        "5.0.1" : 2
  autosplit:
        Currently enabled: yes
  balancer:
        Currently enabled: yes
        Currently running: no
        Failed balancer rounds in last 5 attempts: 0
        Migration results for the last 24 hours: 
                No recent migrations
  databases:
        {  "_id" : "config",  "primary" : "config",  "partitioned" : true }
        {  "_id" : "mydatabase",  "primary" : "mongors2",  "partitioned" : true,  "version" : {  "uuid" : UUID("bc890722-00c6-4cbe-a3e1-eab9692faf93"),  "timestamp" : Timestamp(1627504768, 2),  "lastMod" : 1 } }
                mydatabase.mycollection
                        shard key: { "lastName" : 1, "creationDate" : 1 }
                        unique: false
                        balancing: true
                        chunks:
                                mongors2    1
                        { "lastName" : { "$minKey" : 1 }, "creationDate" : { "$minKey" : 1 } } -->> { "lastName" : { "$maxKey" : 1 }, "creationDate" : { "$maxKey" : 1 } } on : mongors2 Timestamp(1, 0) 

Insert some data

To test out the sharding we can add some test data. Again, we want to distribute by lastName, and creationDate.

In the mongo shell we can run JavaScript. Here is a script that will create test records such that the data will be split and balanced. It creates 500,000 fake records; we need more than 64 MB of data before another chunk is created to balance. 500,000 records will make approximately 5 chunks. This takes a couple of minutes to run and complete.

use mydatabase

function randomInteger(min, max) {
    return Math.floor(Math.random() * (max - min) + min);
} 

function randomAlphaNumeric(length) {
  var result = [];
  var characters = 'abcdef0123456789';
  var charactersLength = characters.length;

  for ( var i = 0; i < length; i++ ) {
    result.push(characters.charAt(Math.floor(Math.random() * charactersLength)));
  }

  return result.join('');
}

function generateDocument() {
  return {
    lastName: randomAlphaNumeric(8),
    creationDate: new Date(),
    stringFixedLength: randomAlphaNumeric(8),
    stringVariableLength: randomAlphaNumeric(randomInteger(5, 50)),
    integer1: NumberInt(randomInteger(0, 2000000)),
    long1: NumberLong(randomInteger(0, 100000000)),
    date1: new Date(),
    guid1: new UUID()
  };
}

for (var j = 0; j < 500; j++) {
  var batch=[];

  for (var i = 0; i < 1000; i++) {
    batch.push(
      {insertOne: {
          document: generateDocument() 
        } 
      }
    );
  }
  
  db.mycollection.bulkWrite(batch, {ordered: false});
}
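
Once the loop finishes, a quick sanity check that all 500,000 documents landed (countDocuments() is available in the shell version used here):

db.mycollection.countDocuments()
// expected: 500000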

Give it a few minutes, then review in the mongo shell; if we now look at the shard status, we should see chunks distributed across both shards...

sh.status()

... we should see something similar to ...

mongos> sh.status()
--- Sharding Status --- 
  sharding version: {
    "_id" : 1,
    "minCompatibleVersion" : 5,
    "currentVersion" : 6,
    "clusterId" : ObjectId("6101c030a98b2cc106034695")
  }
  shards:
        {  "_id" : "mongors1",  "host" : "mongors1/mongors1n1:27018,mongors1n2:27018,mongors1n3:27018",  "state" : 1,  "topologyTime" : Timestamp(1627504744, 1) }
        {  "_id" : "mongors2",  "host" : "mongors2/mongors2n1:27018,mongors2n2:27018,mongors2n3:27018",  "state" : 1,  "topologyTime" : Timestamp(1627504753, 1) }
  active mongoses:
        "5.0.1" : 2
  autosplit:
        Currently enabled: yes
  balancer:
        Currently enabled: yes
        Currently running: yes
        Collections with active migrations: 
                config.system.sessions started at Wed Jul 28 2021 20:44:25 GMT+0000 (UTC)
        Failed balancer rounds in last 5 attempts: 0
        Migration results for the last 24 hours: 
                60 : Success
  databases:
        {  "_id" : "config",  "primary" : "config",  "partitioned" : true }
                config.system.sessions
                        shard key: { "_id" : 1 }
                        unique: false
                        balancing: true
                        chunks:
                                mongors1    965
                                mongors2    59
                        too many chunks to print, use verbose if you want to force print
        {  "_id" : "mydatabase",  "primary" : "mongors2",  "partitioned" : true,  "version" : {  "uuid" : UUID("bc890722-00c6-4cbe-a3e1-eab9692faf93"),  "timestamp" : Timestamp(1627504768, 2),  "lastMod" : 1 } }
                mydatabase.mycollection
                        shard key: { "lastName" : 1, "creationDate" : 1 }
                        unique: false
                        balancing: true
                        chunks:
                                mongors1    2
                                mongors2    3
                        { "lastName" : { "$minKey" : 1 }, "creationDate" : { "$minKey" : 1 } } -->> {
                            "lastName" : "00001276",
                            "creationDate" : ISODate("2021-07-28T20:42:00.867Z")
                        } on : mongors1 Timestamp(2, 0) 
                        {
                            "lastName" : "00001276",
                            "creationDate" : ISODate("2021-07-28T20:42:00.867Z")
                        } -->> {
                            "lastName" : "623292c2",
                            "creationDate" : ISODate("2021-07-28T20:42:01.046Z")
                        } on : mongors1 Timestamp(3, 0) 
                        {
                            "lastName" : "623292c2",
                            "creationDate" : ISODate("2021-07-28T20:42:01.046Z")
                        } -->> {
                            "lastName" : "c3f2a99a",
                            "creationDate" : ISODate("2021-07-28T20:42:06.474Z")
                        } on : mongors2 Timestamp(3, 1) 
                        {
                            "lastName" : "c3f2a99a",
                            "creationDate" : ISODate("2021-07-28T20:42:06.474Z")
                        } -->> {
                            "lastName" : "ed75c36c",
                            "creationDate" : ISODate("2021-07-28T20:42:03.984Z")
                        } on : mongors2 Timestamp(1, 6) 
                        {
                            "lastName" : "ed75c36c",
                            "creationDate" : ISODate("2021-07-28T20:42:03.984Z")
                        } -->> { "lastName" : { "$maxKey" : 1 }, "creationDate" : { "$maxKey" : 1 } } on : mongors2 Timestamp(2, 1) 

... Here we can see evidence of balancing activities. See the "chunks" label for mongors1 and mongors2. While it is balancing our test collection, it is also pre-splitting and balancing a different collection for session data. I believe this is a one-time system automation.
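
For a per-shard breakdown of just our test collection, the shell also offers the getShardDistribution() helper, which prints the document count, data size, and estimated percentage of data held by each shard:

use mydatabase
db.mycollection.getShardDistribution()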

I hope these details help. Please let me know if you have any other questions.

answered Nov 15 '22 by barrypicker