I'm using Google's AI Platform to train machine learning models using a custom Docker image. To run existing code without modifications, I would like to mount a GCS bucket inside the container.
I think one way to achieve this is to install gcloud for authentication and gcsfuse for mounting the bucket in the container. My Dockerfile looks like this:
FROM nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04
WORKDIR /root
# Install system packages.
RUN apt-get update
RUN apt-get install -y curl
# ...
# Install gcsfuse.
RUN echo "deb http://packages.cloud.google.com/apt gcsfuse-bionic main" | tee /etc/apt/sources.list.d/gcsfuse.list
RUN curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -
RUN apt-get update
RUN apt-get install -y gcsfuse
# Install gcloud.
RUN apt-get install -y apt-transport-https
RUN apt-get install -y ca-certificates
RUN echo "deb [signed-by=/usr/share/keyrings/cloud.google.gpg] https://packages.cloud.google.com/apt cloud-sdk main" | tee -a /etc/apt/sources.list.d/google-cloud-sdk.list
RUN curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key --keyring /usr/share/keyrings/cloud.google.gpg add -
RUN apt-get update
RUN apt-get install -y google-cloud-sdk
# ...
ENTRYPOINT ["entrypoint.sh"]
Inside the entrypoint script, I then try to authenticate with Google Cloud and mount the bucket. My entrypoint.sh looks like this:
#!/bin/sh
set -e
gcloud auth login
gcsfuse my-bucket-name /root/output
python3 script.py --logdir /root/output/experiment
I then build the container and run it either locally for testing or remotely on the AI Platform for the full training run:
# Run locally for testing.
nvidia-docker build -t my-image-name .
nvidia-docker run -it --rm my-image-name
# Run on AI Platform for full training run.
nvidia-docker build -t my-image-name .
gcloud auth configure-docker
nvidia-docker push my-image-name
gcloud beta ai-platform jobs submit training --region us-west1 --scale-tier custom --master-machine-type standard_p100 --master-image-uri my-image-name
Both locally and on the AI Platform, the entrypoint.sh script hangs at the line gcloud auth login, probably because it waits for user input. Is there a better way of authenticating with Google Cloud from within the container? If not, how can I automate the line that currently hangs?
Instead of using gcloud auth login, which is primarily meant for human/user authentication, consider using gcloud auth activate-service-account and supplying a key file. See the reference for details:
https://cloud.google.com/sdk/gcloud/reference/auth/activate-service-account
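As a rough sketch of what that could look like in entrypoint.sh, the function below replaces the interactive login with a service account activation. The function name activate_service_account and the default path /secrets/key.json are made up for illustration; the key file is assumed to be mounted into the container at run time rather than copied into the image.

```shell
#!/bin/sh
set -e

# Activate a service account from a key file supplied at run time.
# The default path is hypothetical; mount the real key from outside the image.
activate_service_account() {
    key_file="${1:-/secrets/key.json}"
    if [ ! -r "$key_file" ]; then
        echo "service account key not readable: $key_file" >&2
        return 1
    fi
    # Non-interactive replacement for `gcloud auth login`.
    gcloud auth activate-service-account --key-file="$key_file"
    # gcsfuse and the Google client libraries look up Application Default
    # Credentials through this variable.
    export GOOGLE_APPLICATION_CREDENTIALS="$key_file"
}
```

In entrypoint.sh, calling activate_service_account in place of gcloud auth login should let the gcsfuse and python3 lines run unchanged. For a local test, the key could be supplied with something like -v /path/to/key.json:/secrets/key.json:ro on the nvidia-docker run command line.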
I would recommend not placing the key file inside the image but providing it externally instead. Another alternative is to recognize that authentication can be implicit via environment variables. Following cloud-native practices, have the environment provide the credentials needed and don't try to authenticate inside the container at all. If you plan to run your container on GCP Compute Engine or GKE, you can implicitly provide the service account to the container from outside the container.
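One way to let the environment decide is sketched below: the script activates a key only when GOOGLE_APPLICATION_CREDENTIALS is set from outside, and otherwise does nothing and relies on the service account attached to the VM or pod. The function name setup_credentials is hypothetical, and this assumes the runtime actually exposes an attached service account with access to the bucket, which is not confirmed here for AI Platform training jobs.

```shell
#!/bin/sh
set -e

# Choose the credential source based on what the environment provides.
# Prints "key-file" or "ambient" so the caller can log which path was taken.
setup_credentials() {
    if [ -n "${GOOGLE_APPLICATION_CREDENTIALS:-}" ]; then
        # Explicit key supplied from outside the image (mounted file, secret).
        gcloud auth activate-service-account \
            --key-file="$GOOGLE_APPLICATION_CREDENTIALS" >&2
        echo "key-file"
    else
        # No key given: rely on the service account attached to the
        # Compute Engine VM or GKE pod; nothing to do inside the container.
        echo "ambient"
    fi
}
```

With this in place, the same image can run locally with a mounted key and on Compute Engine or GKE without any key file at all.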