My use case is a little different than others with this problem, so a little up-front description:
I am working on Google Cloud and have a "dockerized" Django app. Part of the app depends on using gsutil for moving files to/from a Google Storage bucket. For various reasons, we do not want to use Google Container Engine to manage our containers. Rather, we would like to scale horizontally by starting additional Google Compute VMs which will, in turn, run this Docker container. Similar to https://cloud.google.com/python/tutorials/bookshelf-on-compute-engine except we will use a container rather than pulling a git repository.
The VMs will be built from a basic Debian image, and the startup and installation of dependencies (e.g. Docker itself) will be orchestrated with a startup script (e.g. gcloud compute instances create some-instance --metadata-from-file startup-script=/path/to/startup.sh).
If I manually create a VM, elevate with sudo -s, run gsutil config -f (which creates a credential file at /root/.boto), and then run my docker container (see Dockerfile below) with
docker run -v /root/.boto:/root/.boto username/gs gsutil ls gs://my-test-bucket
then it works. However, that requires my interaction to create the boto file.
My question is: How can I pass the default service credentials to the Docker container that will be starting in that new VM?
gsutil works out of the box on even a "fresh" Debian VM since it is using the default compute engine credentials that all VMs are loaded with. Is there a way to use those credentials and pass them to the docker container? After the first call to gsutil on a fresh VM, I've noticed that it creates ~/.gsutil and ~/.config folders. Unfortunately, mounting both of those in Docker with
docker run -v ~/.config/:/root/.config -v ~/.gsutil:/root/.gsutil username/gs gsutil ls gs://my-test-bucket
does not fix my problem. It tells me:
ServiceException: 401 Anonymous users does not have storage.objects.list access to bucket my-test-bucket.
A minimal gsutil Dockerfile (not mine):
FROM alpine
# install deps and install gsutil
RUN apk add --update \
        python \
        py-pip \
        py-cffi \
        py-cryptography \
    && pip install --upgrade pip \
    && apk add --virtual build-deps \
        gcc \
        libffi-dev \
        python-dev \
        linux-headers \
        musl-dev \
        openssl-dev \
    && pip install gsutil \
    && apk del build-deps \
    && rm -rf /var/cache/apk/*
CMD ["gsutil"]
Addition: a workaround:
I have since solved my issue, but it is quite roundabout so I'm still interested in a simpler way, if possible. All the details are below:
First, a description:
I first created a service account in the web console and saved its JSON keyfile (call it credentials.json) into a storage bucket. In the startup script for the GCE VM, I copy that keyfile to the local filesystem (gsutil cp gs://<bucket>/credentials.json /gs_credentials/). I then start my Docker container, mounting that local directory. As the container starts, it runs a script that activates the service account from credentials.json (which creates a .boto file inside the container), exports BOTO_PATH to point at that file, and finally I can perform gsutil operations in the Docker container.
Here are the files for a small working example:
Dockerfile:
FROM alpine
# install deps and install gsutil
RUN apk add --update \
        python \
        py-pip \
        py-cffi \
        py-cryptography \
        bash \
        curl \
    && pip install --upgrade pip \
    && apk add --virtual build-deps \
        gcc \
        libffi-dev \
        python-dev \
        linux-headers \
        musl-dev \
        openssl-dev \
    && pip install gsutil \
    && apk del build-deps \
    && rm -rf /var/cache/apk/*
# install the gcloud SDK -
# this allows us to use gcloud auth inside the container
RUN curl -sSL https://sdk.cloud.google.com > /tmp/gcl \
    && bash /tmp/gcl --install-dir=~/gcloud --disable-prompts
RUN mkdir /startup
ADD gsutil_docker_startup.sh /startup/gsutil_docker_startup.sh
ADD get_account_name.py /startup/get_account_name.py
ENTRYPOINT ["/startup/gsutil_docker_startup.sh"]
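The image is built and pushed in the usual way so that the VM startup script below can pull it (userName/containerName is again just a placeholder):
docker build -t userName/containerName .
docker push userName/containerName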
gsutil_docker_startup.sh: Takes a single argument, which is the path to a JSON-format service account credentials file. The file exists because the directory on the host machine was mounted in the container.
#!/bin/bash
CRED_FILE_PATH=$1
mkdir /results
# List the bucket, see that it gives a "ServiceException:401"
gsutil ls gs://<input bucket> > /results/before.txt
# authenticate the credentials- this creates a .boto file:
/root/gcloud/google-cloud-sdk/bin/gcloud auth activate-service-account --key-file=$CRED_FILE_PATH
# need to extract the service account which is like:
# <service acct ID>@<google project>.iam.gserviceaccount.com
SERVICE_ACCOUNT=$(python /startup/get_account_name.py $CRED_FILE_PATH)
# with that service account, we can locate the .boto file:
export BOTO_PATH=/root/.config/gcloud/legacy_credentials/$SERVICE_ACCOUNT/.boto
# List the bucket and copy the file to an output bucket for good measure
gsutil ls gs://<input bucket> > /results/after.txt
gsutil cp /results/*.txt gs://<output bucket>/
get_account_name.py:
import json
import sys

# Read the service account JSON keyfile given as the first argument
# and print its client_email (the service account's email address).
j = json.load(open(sys.argv[1]))
sys.stdout.write(j['client_email'])
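For example, invoked against the downloaded keyfile it prints only the account ID (the values here are of course placeholders):
python /startup/get_account_name.py /cloud_credentials/creds.json
# -> some-service-acct@my-project.iam.gserviceaccount.com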
Then, the GCE startup script (executed automatically as the VM is started) is:
#!/bin/bash
# <SNIP>
# Install docker, other dependencies
# </SNIP>
# pull docker image
docker pull userName/containerName
# get credential file:
mkdir /cloud_credentials
gsutil cp gs://<bucket>/credentials.json /cloud_credentials/creds.json
# run container
# mount the host machine directory where the credentials were saved.
# Note that the container expects a single arg,
# which is the path to the credential file IN THE CONTAINER
docker run -v /cloud_credentials:/cloud_credentials \
userName/containerName /cloud_credentials/creds.json
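Finally, the VM is created with that startup script attached, roughly as follows (the image family, zone and scopes here are assumptions; the VM's default service account needs at least read access to the bucket holding credentials.json so that the gsutil cp in the startup script succeeds):
gcloud compute instances create some-instance \
    --image-family=debian-9 --image-project=debian-cloud \
    --zone=us-central1-a \
    --scopes=storage-rw \
    --metadata-from-file startup-script=/path/to/startup.sh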
docker-credential-gcr is Google Container Registry's standalone, gcloud SDK-independent Docker credential helper. It allows Docker clients v18.03+ to easily make authenticated requests to GCR's repositories (gcr.io, eu.gcr.io, etc.).
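Installing the helper and wiring it into the Docker config is roughly (a sketch; it can also be downloaded from its GitHub releases instead of installed as a gcloud component):
gcloud components install docker-credential-gcr
docker-credential-gcr configure-docker
docker pull gcr.io/<your-project>/<your-image>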
You can assign a specific service account to your instance and then use Application Default Credentials in your code. Please verify both of these points before testing.
Authentication tokens are retrieved automatically by Application Default Credentials via the Google metadata server, which is available from your instance and from your Docker containers as well. There is no need to manage any credentials.
def implicit():
    from google.cloud import storage

    # If you don't specify credentials when constructing the client, the
    # client library will look for credentials in the environment.
    storage_client = storage.Client()

    # Make an authenticated API request
    buckets = list(storage_client.list_buckets())
    print(buckets)
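The token retrieval mentioned above can be verified directly from inside a container, since the metadata server is reachable there as well:
curl -s -H "Metadata-Flavor: Google" \
    "http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/token"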
I also quickly tested with Docker and it worked perfectly:
yann@test:~$ gsutil cat gs://my-test-bucket/hw.txt
Hello World
yann@test:~$ docker run --rm google/cloud-sdk gsutil cat gs://my-test-bucket/hw.txt
Hello World