I am wondering if it is possible to create a virtual cluster with Docker so that I can run scripts that have been designed for HPC clusters using SGE cluster management. These are pretty large/complicated workflows, so it's not something I can just rewrite for, say, TORQUE/PBS. Theoretically I should be able to use Docker to trick the scripts into thinking there are multiple nodes, just like on my internal HPC cluster. If someone can save me the pain by telling me it can't be done, I would be greatly appreciative.
Warning: I am not a cluster admin; I'm more like the end user. I am running on Mac OS X 10.9.5 with:
bash-3.2$ docker version
Client version: 1.7.0
Client API version: 1.19
Go version (client): go1.4.2
Git commit (client): 0baf609
OS/Arch (client): darwin/amd64
Server version: 1.7.0
Server API version: 1.19
Go version (server): go1.4.2
Git commit (server): 0baf609
OS/Arch (server): linux/amd64

bash-3.2$ boot2docker version
Boot2Docker-cli version: v1.7.0
Git commit: 7d89508
I've been using a derivative of an image (the Dockerfile is here). My steps are pretty straightforward and follow the instructions on the website:
docker-machine create -d virtualbox local
eval "$(docker-machine env local)"
docker run --rm swarm create
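(The commands below reference $TOKEN; that is, I actually capture the discovery token that swarm create prints to stdout, along these lines:)

TOKEN=$(docker run --rm swarm create)
echo "$TOKEN"   # hex discovery token consumed by --swarm-discovery below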
docker-machine create \
    -d virtualbox \
    --swarm \
    --swarm-master \
    --swarm-discovery token://$TOKEN \
    swarm-master

docker-machine create \
    -d virtualbox \
    --swarm \
    --swarm-discovery token://$TOKEN \
    swarm-agent-00

docker-machine create \
    -d virtualbox \
    --swarm \
    --swarm-discovery token://$TOKEN \
    swarm-agent-01
Now here is the crazy part. When I try to point my shell at the swarm with

eval "$(docker-machine env --swarm swarm-master)"

I get this error:

Cannot connect to the Docker daemon. Is 'docker -d' running on this host?

I then tried

eval $(docker-machine env swarm-master)

and it works, but I'm not 100% sure it's the right thing to do. docker-machine ls shows:
NAME             ACTIVE   DRIVER       STATE     URL                         SWARM
local                     virtualbox   Running   tcp://192.168.99.105:2376
swarm-agent-00            virtualbox   Running   tcp://192.168.99.107:2376   swarm-master
swarm-agent-01            virtualbox   Running   tcp://192.168.99.108:2376   swarm-master
swarm-master     *        virtualbox   Running   tcp://192.168.99.106:2376   swarm-master (master)
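For what it's worth, my understanding is that the --swarm flag points the client at the Swarm manager (which docker-machine exposes on port 3376) rather than at the single engine on port 2376, so a sanity check would look something like this:

eval "$(docker-machine env --swarm swarm-master)"

# Against a working swarm endpoint, this should report all three
# machines (swarm-master, swarm-agent-00, swarm-agent-01) as nodes.
docker info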
I then set up SGE with this docker-compose.yml:

bior:
  image: stevenhart/bior_annotate
  command: login -f sgeadmin
  volumes:
    - .:/Data
  links:
    - sge
sge:
  build: .
  ports:
    - "6444"
    - "6445"
    - "6446"

I bring it up using docker-compose up and then log in with
docker run -it --rm dockersge_sge login -f sgeadmin
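(In case the image name looks odd: dockersge_sge is what Compose built for the sge service above. Compose tags built images as <project>_<service>, with the project name taken from the directory holding the docker-compose.yml, evidently dockersge here. A quick check:)

docker images   # should list dockersge_sge among the locally built images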
When I run qhost, I get the following:
HOSTNAME        ARCH       NCPU NSOC NCOR NTHR  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
----------------------------------------------------------------------------------------------
global          -             -    -    -    -     -       -       -       -       -
6bf6f6fda409    lx-amd64      1    1    1    1  0.01  996.2M   96.2M    1.1G     0.0
Shouldn't it think there are multiple CPUs, i.e. one for each of my swarm nodes?
I assume you are running qhost inside your Docker container.
The thing with Swarm is that it doesn't combine all the hosts into one big machine (I used to think it did). If you have, say, five one-core machines, Swarm will instead pick the host that is currently running the fewest containers and start the new container there. So Swarm is a scheduler that spreads containers across a cluster, rather than something that merges the hosts into one.
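You can see this directly. A minimal sketch against your setup (assuming the swarm endpoint from above is active; busybox is just a throwaway stand-in image):

eval "$(docker-machine env --swarm swarm-master)"

# Start three identical containers; Swarm schedules each onto some node.
docker run -d --name worker-1 busybox sleep 3600
docker run -d --name worker-2 busybox sleep 3600
docker run -d --name worker-3 busybox sleep 3600

# The NAMES column prefixes each container with the node it landed on,
# e.g. swarm-agent-00/worker-1 -- each container runs on exactly one host.
docker ps

Since each container is confined to the CPUs and memory of the single host it runs on, the qhost inside your container can only ever see that one host's resources.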
Hope it helps! If you have additional questions, please ask :)
UPDATE
I'm not sure if it suits you, but if you can't get there with Swarm, I would recommend Kubernetes. I use it on my Raspberry Pis. It is very cool and more mature than Swarm, with features like self-healing and so on.

I don't know for certain, but there is surely a way of integrating Docker with Hadoop too...