I have configured an ECS service running on a g4dn.xlarge instance, which has a single GPU. Inside the task definition I specify the container definition's resource requirement to use one GPU, as follows:
"resourceRequirements": [
{
"type":"GPU",
"value": "1"
}
]
Running one task and one container on this instance works fine. When I set the service's desired task count to 2, I receive an event on the service that states:
service was unable to place a task because no container instance met all of its requirements. The closest matching container-instance has insufficient GPU resource available.
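For reference, you can see what the scheduler is working with by inspecting the resources the container instance registers and what remains after placement; a quick check along these lines (the cluster name is a placeholder for your own) shows a single GPU registered and already consumed by the first task:
# Cluster name below is a placeholder; substitute your own.
CLUSTER=my-cluster
ARN=$(aws ecs list-container-instances --cluster "$CLUSTER" --query 'containerInstanceArns[0]' --output text)
# Compare the registered resources with what is left for new tasks
aws ecs describe-container-instances --cluster "$CLUSTER" --container-instances "$ARN" \
    --query 'containerInstances[].{registered:registeredResources,remaining:remainingResources}'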
According to the AWS docs:
Amazon ECS will schedule to available GPU-enabled container instances and pin physical GPUs to proper containers for optimal performance.
Is there any way to override this default behavior and force ECS to allow multiple containers to share a single GPU?
I don't believe we will run into performance issues with sharing, as we plan to use each container for H.264 encoding (NVENC), which does not use CUDA. If anyone can direct me to documentation concerning the performance of CUDA workloads in containers sharing a GPU, that would also be appreciated.
Deploy containers on the node: you can deploy up to one container per multi-instance GPU device on the node. In this example, with a partition size of 1g.5gb, there are seven multi-instance GPU partitions available on the node. As a result, you can deploy up to seven containers that request GPUs on this node.
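As a minimal sketch of what such a deployment looks like, each pod simply requests one GPU resource and the scheduler places it on one of the seven partitions. The pod name, image tag and the GKE partition-size node selector below are illustrative, not taken from the original answer:
# Each pod requests one nvidia.com/gpu, which maps to one 1g.5gb MIG partition.
# Pod name, image tag and node selector label are illustrative.
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: mig-example
spec:
  nodeSelector:
    cloud.google.com/gke-gpu-partition-size: 1g.5gb
  restartPolicy: Never
  containers:
  - name: cuda
    image: nvidia/cuda:12.2.0-base-ubuntu22.04
    command: ["nvidia-smi", "-L"]
    resources:
      limits:
        nvidia.com/gpu: 1
EOF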
You can run both containers using different host ports, and use haproxy/nginx/varnish (native or inside another container) listening on the host port and redirecting to the right container based on the URL. This is as much a question about the way TCP ports work as the way Docker works.
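A rough sketch of that setup, assuming both services listen on port 80 inside their containers (the image name is a placeholder):
# Two containers publishing the same internal port on different host ports
# ("app-image" is a placeholder). A proxy on the host, or in a third container,
# then routes requests to 8080 or 8081 based on the URL.
docker run -d --name app1 -p 8080:80 app-image
docker run -d --name app2 -p 8081:80 app-image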
You can connect multiple containers using user-defined networks and shared volumes. The container's main process is responsible for managing all processes that it starts.
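As a sketch (all names are placeholders), a user-defined bridge network lets the containers reach each other by container name, and a named volume gives them shared storage:
# Containers on the same user-defined network can resolve each other by name;
# the named volume "shared-data" is mounted into both. Names are placeholders.
docker network create appnet
docker volume create shared-data
docker run -d --name producer --network appnet -v shared-data:/data app-image
docker run -d --name consumer --network appnet -v shared-data:/data app-image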
Using an NVIDIA GPU inside a Docker container requires you to add the NVIDIA Container Toolkit to the host. This integrates the NVIDIA drivers with your container runtime. Calling docker run with the --gpus flag makes your hardware visible to the container.
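For example, once the toolkit is installed on the host (the CUDA image tag here is only an illustration):
# Expose all host GPUs to the container and confirm they are visible
docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi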
GPU sharing across multiple containers is not supported at the moment, and it is unlikely to be supported anytime soon. You would need to have each virtual machine be a separate Kubernetes node, each with a separate GPU.
We give multiple GPUs to a pod and the pod runs Triton, which does the sharing. You have to define which GPU each model runs on, otherwise Triton runs the model on every GPU it can see (annoying behaviour). Triton is supposed to be incorporated into KFServing, which sits on Knative, but I haven't tried that.
To use multi-instance GPUs, you perform the following tasks (a verification example follows the list):
1. Create a cluster with multi-instance GPUs enabled.
2. Install drivers and configure GPU partitions.
3. Verify how many GPU resources are on the node.
4. Deploy containers on the node.
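For step 3, you can check how many nvidia.com/gpu resources each node advertises to the scheduler; with 1g.5gb partitions on a single A100 you should see seven:
# Shows nvidia.com/gpu under the node's Capacity, Allocatable and Allocated resources
kubectl describe nodes | grep -i "nvidia.com/gpu"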
The trick is to enable the NVIDIA Docker runtime by default for all containers, if that is suitable for your use case.
Based on the Amazon AMI amazon/amzn2-ami-ecs-gpu-hvm-2.0.20200218-x86_64-ebs, connect to the instance and add the configuration below:
sudo cat <<"EOF" > /etc/docker/daemon.json
{
"default-runtime": "nvidia",
"runtimes": {
"nvidia": {
"path": "/etc/docker-runtimes.d/nvidia"
}
}
}
EOF
# Ask dockerd to reload its configuration, then check the log for reload errors
sudo pkill -SIGHUP dockerd
tail -10 /var/log/messages
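You can verify the reload took effect before baking the AMI (a quick check, not part of the original steps; the CUDA image tag is only an example):
# The default runtime should now report as "nvidia", and a plain container
# started without the --gpus flag should still see the GPU
docker info | grep -i "default runtime"
docker run --rm nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi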
Create a new AMI from this instance, and don't specify any GPU resourceRequirements in the container definition.
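Baking the AMI can be done from the console or the CLI, for example (the instance ID and image name are placeholders):
# Create a new AMI from the modified instance
aws ec2 create-image --instance-id i-0123456789abcdef0 --name "ecs-gpu-shared-runtime"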