
How can I mount a GCS bucket in a custom Docker image on AI Platform?

I'm using Google's AI Platform to train machine learning models using a custom Docker image. To run existing code without modifications, I would like to mount a GCS bucket inside the container.

I think one way to achieve this is to install gcloud for authentication and gcsfuse for mounting inside the container. My Dockerfile looks like this:

FROM nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04

WORKDIR /root

# Install system packages.
RUN apt-get update
RUN apt-get install -y curl
# ...

# Install gcsfuse.
RUN echo "deb http://packages.cloud.google.com/apt gcsfuse-bionic main" | tee /etc/apt/sources.list.d/gcsfuse.list
RUN curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -
RUN apt-get update
RUN apt-get install -y gcsfuse

# Install gcloud.
RUN apt-get install -y apt-transport-https
RUN apt-get install -y ca-certificates
RUN echo "deb [signed-by=/usr/share/keyrings/cloud.google.gpg] https://packages.cloud.google.com/apt cloud-sdk main" | tee -a /etc/apt/sources.list.d/google-cloud-sdk.list
RUN curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key --keyring /usr/share/keyrings/cloud.google.gpg add -
RUN apt-get update
RUN apt-get install -y google-cloud-sdk

# ...

ENTRYPOINT ["entrypoint.sh"]

Inside the entrypoint script, I then try to authenticate with Google Cloud and mount the bucket. My entrypoint.sh looks like this:

#!/bin/sh
set -e

gcloud auth login
gcsfuse my-bucket-name /root/output
python3 script.py --logdir /root/output/experiment

I then build the image and run it either locally for testing or remotely on AI Platform for the full training run:

# Run locally for testing.
nvidia-docker build -t my-image-name .
nvidia-docker run -it --rm my-image-name

# Run on AI Platform for full training run.
nvidia-docker build -t my-image-name .
gcloud auth configure-docker
nvidia-docker push my-image-name
gcloud beta ai-platform jobs submit training --region us-west1 --scale-tier custom --master-machine-type standard_p100 --master-image-uri my-image-name

Both locally and on AI Platform, the entrypoint.sh script hangs at the line gcloud auth login, probably because it waits for user input. Is there a better way to authenticate with Google Cloud from within the container? If not, how can I automate the line that currently hangs?

danijar asked Oct 15 '22 10:10


1 Answer

Instead of using gcloud auth login, which is primarily meant for human/user authentication, consider using gcloud auth activate-service-account and supplying a key file. See here for details:

https://cloud.google.com/sdk/gcloud/reference/auth/activate-service-account
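
As a rough sketch of how that could look in entrypoint.sh, the helper below authenticates with a service-account key and then mounts the bucket. The function names, the DRY_RUN switch, and the /secrets/key.json path in the usage line are placeholders of mine, not anything AI Platform prescribes. I'm also assuming gcsfuse should be handed the same key through its --key-file flag, since as far as I know it does not reuse credentials activated through gcloud.

```shell
#!/bin/sh
set -e

# Print the command when DRY_RUN=1, otherwise execute it.
# (DRY_RUN is a hypothetical switch, handy for checking the commands locally.)
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Authenticate without any prompt and mount the bucket.
#   $1: path to a service-account key file (supplied from outside the image)
#   $2: bucket name
#   $3: mount point inside the container
auth_and_mount() {
    key_file="$1"
    bucket="$2"
    mount_point="$3"

    if [ ! -f "$key_file" ]; then
        echo "key file not found: $key_file" >&2
        return 1
    fi

    # Non-interactive replacement for `gcloud auth login`.
    run gcloud auth activate-service-account --key-file="$key_file"

    # gcsfuse needs an existing directory to mount onto.
    run mkdir -p "$mount_point"
    run gcsfuse --key-file="$key_file" "$bucket" "$mount_point"
}
```

In the entrypoint.sh from the question, this would stand in for the gcloud auth login and gcsfuse lines, for example auth_and_mount /secrets/key.json my-bucket-name /root/output, followed by the existing python3 call. The key file would be supplied at run time and not baked into the image.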

I would recommend not placing the key file inside the image, but providing it externally instead. Another alternative is to realize that authentication can be implicit, via environment variables. Following cloud-native practice, have the environment provide the credentials needed and don't try to authenticate inside the container at all. If you plan to run your container on GCP Compute Engine or GKE, you can implicitly provide the service account to the container from outside the container.
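
To illustrate the implicit route, here is a minimal sketch of an entrypoint helper that never prompts and only reports where credentials would come from. It assumes the tools involved honor Application Default Credentials, that is, the GOOGLE_APPLICATION_CREDENTIALS environment variable when it points at a key file, and otherwise the service account attached to the VM or node. The function name credential_source is hypothetical.

```shell
#!/bin/sh
set -e

# Report which credential source the container would rely on:
#   "key-file" if GOOGLE_APPLICATION_CREDENTIALS points at an existing file,
#   "ambient"  otherwise (e.g. the service account attached on GCE or GKE).
credential_source() {
    if [ -n "${GOOGLE_APPLICATION_CREDENTIALS:-}" ] && [ -f "$GOOGLE_APPLICATION_CREDENTIALS" ]; then
        echo "key-file"
    else
        echo "ambient"
    fi
}
```

Locally, the variable could be set with docker run -e together with a mounted key file. On Compute Engine or GKE it can be left unset, so the same image runs in both places without any login step in the entrypoint.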

Kolban answered Oct 21 '22 06:10