I cannot pull Artifact Registry images into a newly created GKE cluster that was stood up with Terraform and a user-defined service account.
The Terraform used to stand up the cluster is below.
locals {
  service         = "example"
  resource_prefix = format("%s-%s", local.service, var.env)
  location        = format("%s-b", var.gcp_region)
}

resource "google_service_account" "main" {
  account_id   = format("%s-sa", local.resource_prefix)
  display_name = format("%s-sa", local.resource_prefix)
  project      = var.gcp_project
}

resource "google_container_cluster" "main" {
  name                     = local.resource_prefix
  description              = format("Cluster primarily servicing the service %s", local.service)
  location                 = local.location
  remove_default_node_pool = true
  initial_node_count       = 1
}
resource "google_container_node_pool" "main" {
name = format("%s-node-pool", local.resource_prefix)
location = local.location
cluster = google_container_cluster.main.name
node_count = var.gke_cluster_node_count
node_config {
preemptible = true
machine_type = var.gke_node_machine_type
# Google recommends custom service accounts that have cloud-platform scope and permissions granted via IAM Roles.
service_account = google_service_account.main.email
oauth_scopes = [
"https://www.googleapis.com/auth/logging.write",
"https://www.googleapis.com/auth/monitoring",
"https://www.googleapis.com/auth/cloud-platform",
"https://www.googleapis.com/auth/devstorage.read_only",
"https://www.googleapis.com/auth/servicecontrol",
"https://www.googleapis.com/auth/service.management.readonly",
"https://www.googleapis.com/auth/trace.append"
]
}
autoscaling {
min_node_count = var.gke_cluster_autoscaling_min_node_count
max_node_count = var.gke_cluster_autoscaling_max_node_count
}
}
I run a Helm deployment to deploy an application and get the following issue:
default php-5996c7fbfd-d6xf5 0/1 ImagePullBackOff 0 37m
Normal Pulling 36m (x4 over 37m) kubelet Pulling image "europe-docker.pkg.dev/example-999999/eu.gcr.io/example-php-fpm:latest"
Warning Failed 36m (x4 over 37m) kubelet Failed to pull image "europe-docker.pkg.dev/example-999999/eu.gcr.io/example-php-fpm:latest": rpc error: code = Unknown desc = failed to pull and unpack image "europe-docker.pkg.dev/example-999999/eu.gcr.io/example-php-fpm:latest": failed to resolve reference "europe-docker.pkg.dev/example-999999/eu.gcr.io/example-php-fpm:latest": failed to authorize: failed to fetch oauth token: unexpected status: 403 Forbidden
Warning Failed 36m (x4 over 37m) kubelet Error: ErrImagePull
Warning Failed 35m (x6 over 37m) kubelet Error: ImagePullBackOff
It seems to me that I've missed something to do with the service account. Using Cloud SSH on a node I am able to generate an OAuth token, but pulling with that token via crictl also does not work.
UPDATE: issue resolved
I have been able to resolve my problem with the following additional Terraform code.
resource "google_project_iam_member" "artifact_role" {
role = "roles/artifactregistry.reader"
member = "serviceAccount:${google_service_account.main.email}"
project = var.gcp_project
}
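For tighter scoping, the same role can also be granted on just the repository instead of the whole project. A minimal sketch; the location and repository name here are assumptions taken from the image path in the error (europe-docker.pkg.dev/example-999999/eu.gcr.io/...):
# Optional alternative: grant read access on a single repository only.
# Location and repository name are assumed from the image path above; adjust as needed.
resource "google_artifact_registry_repository_iam_member" "artifact_repo_reader" {
  project    = var.gcp_project
  location   = "europe"
  repository = "eu.gcr.io"
  role       = "roles/artifactregistry.reader"
  member     = "serviceAccount:${google_service_account.main.email}"
}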
As the error says: unexpected status: 403 Forbidden
You might be having an issue with the deployment's image pull secret.
For GKE you can use the service account JSON key.
Ref doc: https://cloud.google.com/container-registry/docs/advanced-authentication#json-key
Terraform can create the secret in GKE, which you can then reference in your deployment:
resource "kubernetes_secret" "gcr" {
type = "kubernetes.io/dockerconfigjson"
metadata {
name = "gcr-image-pull"
namespace = "default"
}
data = {
".dockerconfigjson" = jsonencode({
auths = {
"gcr.io" = {
username = "_json_key"
password = base64decode(google_service_account_key.myaccount.private_key)
email = google_service_account.main.email
auth = base64encode("_json_key:${ base64decode(google_service_account_key.myaccount.private_key) }")
}
}
})
}}
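The secret above references a google_service_account_key resource that isn't shown; a minimal sketch of it, assuming the resource name myaccount and the node service account from the question:
# Hypothetical key resource referenced as google_service_account_key.myaccount above.
# Its private_key attribute is the base64-encoded JSON key, hence the base64decode() calls.
resource "google_service_account_key" "myaccount" {
  service_account_id = google_service_account.main.name
}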
Or use kubectl to create the secret:
kubectl create secret docker-registry gcr \
--docker-server=gcr.io \
--docker-username=_json_key \
--docker-password="$(cat google-service-account-key.json)" \
--docker-email=<Email address>
Now, if you have a Pod or Deployment, you can create a YAML config like:
apiVersion: v1
kind: Pod
metadata:
  name: uses-private-registry
spec:
  containers:
  - name: hello-app
    image: <image URI>
  imagePullSecrets:
  - name: secret-that-you-created
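If the Pod itself is managed by Terraform, the same image pull secret can be wired in via the kubernetes provider; a sketch assuming the kubernetes_secret resource defined above and a placeholder image URI:
# Sketch: Terraform-managed Pod that pulls through the secret created earlier.
resource "kubernetes_pod" "uses_private_registry" {
  metadata {
    name      = "uses-private-registry"
    namespace = "default"
  }

  spec {
    container {
      name  = "hello-app"
      image = "<image URI>" # placeholder
    }

    image_pull_secrets {
      name = kubernetes_secret.gcr.metadata[0].name
    }
  }
}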
Update:
As per Guillaume's suggestion, for GKE/GCP you can follow the *Workload Identity* option as the best practice; with other external repositories it might not work.
Create the IAM service account in GCP:
gcloud iam service-accounts create gke-workload-identity \
  --project=<project-id>
Create a service account in the K8s cluster:
apiVersion: v1
kind: ServiceAccount
metadata:
  annotations:
    iam.gke.io/gcp-service-account: gke-workload-identity@PROJECT_ID.iam.gserviceaccount.com
  name: gke-sa-workload
  namespace: default
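Since the question is Terraform-based, the same Kubernetes ServiceAccount could also be created with the kubernetes provider; a sketch assuming the GSA email above and var.gcp_project from the question:
# Sketch: Terraform equivalent of the ServiceAccount manifest above.
resource "kubernetes_service_account" "gke_sa_workload" {
  metadata {
    name      = "gke-sa-workload"
    namespace = "default"

    annotations = {
      "iam.gke.io/gcp-service-account" = "gke-workload-identity@${var.gcp_project}.iam.gserviceaccount.com"
    }
  }
}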
Run the gcloud command below to create the policy binding:
gcloud iam service-accounts add-iam-policy-binding gke-workload-identity@PROJECT_ID.iam.gserviceaccount.com \
  --role roles/iam.workloadIdentityUser \
  --member "serviceAccount:PROJECT_ID.svc.id.goog[default/gke-sa-workload]"
Now you can create the Deployment/Pod with the image in the GCR/Artifact repository; just update the ServiceAccount:
spec:
  serviceAccountName: gke-sa-workload
  containers:
  - name: container
    image: IMAGE
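Note that Workload Identity also has to be enabled on the cluster itself (and the node pool has to use the GKE metadata server) before the annotation takes effect. A sketch of the question's cluster with that addition, assuming a reasonably recent google provider:
# Sketch: the question's cluster with Workload Identity enabled.
resource "google_container_cluster" "main" {
  name                     = local.resource_prefix
  description              = format("Cluster primarily servicing the service %s", local.service)
  location                 = local.location
  remove_default_node_pool = true
  initial_node_count       = 1

  workload_identity_config {
    workload_pool = "${var.gcp_project}.svc.id.goog"
  }
}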
Read more at: https://kubernetes.io/docs/tasks/configure-pod-container/pull-image-private-registry/
Turning my comment into an answer as it resolved @David's issue.
Because the user-defined service account is being used for the node_pool, the appropriate roles need to be bound to this service account.
In this case: roles/artifactregistry.reader
Configuring artifact registry permissions
Best practice is to grant the minimum required roles.