
Issue setting up Git-Sync with Airflow in a Pod

I am currently trying to set up Airflow to work in a Kubernetes-like environment. For Airflow to be useful, I need to be able to use the Git-Sync features so that the DAGs can be stored separately from the Pod, and are therefore not reset when the Pod downscales or restarts. I am trying to set it up with SSH.

I have been searching for good documentation on the Airflow config or tutorials on how to set this up properly, but to no avail. I would very much appreciate some help here, as I have been struggling with this for a while.

Here is how I set the relevant config. Please note that I have used stand-ins for links and some other information for security reasons:

git_repo = https://<git-host>/scm/<project-name>/airflow
git_branch = develop
git_subpath = dags
git_sync_root = /usr/local/airflow
git_sync_dest = dags
git_sync_depth = 1
git_sync_ssh = true
git_dags_folder_mount_point = /usr/local/airflow/dags
git_ssh_key_secret_name = airflow-secrets
git_ssh_known_hosts_configmap_name = airflow-configmap
dags_folder = /usr/local/airflow/
executor = KubernetesExecutor
dags_in_image = False

Here is how I have set up my origin/config repo:

-root
 |-configmaps/airflow
   |-airflow.cfg
   |-airflow-configmap.yaml
 |-environment
   |-<environment specific stuff>
 |-secrets
   |-airflow-secrets.yaml
 |-ssh
   |-id_rsa
   |-id_rsa.pub
 |-README.md

The airflow-configmap and airflow-secrets manifests look like this:

apiVersion: v1
kind: Secret
metadata:
  name: airflow-secrets
data:
  # key needs to be gitSshKey
  gitSshKey: <base64 encoded private sshKey>

and

apiVersion: v1
kind: ConfigMap
metadata:
  name: airflow-configmap
data:
  known_hosts: |
      https://<git-host>/ ssh-rsa <base64 encoded public sshKey>

The repo that I am trying to sync has the public key set as an access key and contains just a folder named dags with one DAG inside.

My problem is that I cannot tell what is actually wrong at this point. I have no way of knowing which parts of my config are set correctly and which are not, and the documentation on the subject is very sparse.

If more information is required, I will be happy to provide it.

Thank you for your time.

asked Sep 11 '25 23:09 by NobiliChili


1 Answer

What's the error you're seeing when you do this?

A couple of things you need to consider:

  • Create an SSH key locally, then register its public key as a deploy key on the repository:

    1. Go to Repository Name > Settings > Deploy Keys and paste in the contents of ssh_key.pub

    2. Ensure "write access" is checked

  • The Dockerfile I'm using looks like:

    FROM apache/airflow:2.1.2
    
    COPY requirements.txt .
    
    RUN python -m pip install --upgrade pip
    RUN pip install -r requirements.txt
    
  • In the values.yaml from the official Airflow Helm repository (helm repo add apache-airflow https://airflow.apache.org), update the following values under dags.gitSync:

    • enabled: true

    • repo: ssh://<user>@<git-host>/username/repository-name.git (the user and host were obscured in the source page)

    • branch: master

    • subPath: "" (if DAGs are in repository root)

    • sshKeySecret: airflow-ssh-git-secret

    • credentialsSecret: git-credentials
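
As a rough sketch of how those values could sit together, the snippet below writes them to a separate override file. The file name values-gitsync-sketch.yaml is my own invention, <user> and <git-host> are placeholders you must fill in, and the key layout (dags.gitSync with subPath) is how I understand the official chart, so check it against the values.yaml of your chart version.

```shell
# Sketch only: write a gitSync override file for the official Airflow chart.
# The file name is hypothetical; <user> and <git-host> are placeholders.
cat > values-gitsync-sketch.yaml <<'EOF'
dags:
  gitSync:
    enabled: true
    repo: "ssh://<user>@<git-host>/username/repository-name.git"
    branch: master
    subPath: ""                          # DAGs live in the repository root
    sshKeySecret: airflow-ssh-git-secret
    credentialsSecret: git-credentials
EOF
```

Helm accepts several -f flags, so a file like this can be passed alongside your main values.yaml instead of editing it in place.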

  • Export the SSH key and known_hosts to a Kubernetes secret for accessing the private repository:

    kubectl create secret generic airflow-ssh-git-secret \
      --from-file=gitSshKey=/path/to/.ssh/id_ed25519 \
      --from-file=known_hosts=/path/to/.ssh/known_hosts \
      --from-file=id_ed25519.pub=/path/to/.ssh/id_ed25519.pub \
      -n airflow
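
Before loading a known_hosts file, it can help to sanity-check its lines, since each entry should look like "<host> <key-type> <base64-host-key>" with no URL scheme in the host field. The helper below is a minimal sketch; looks_like_known_hosts_line is a name I made up, and it deliberately ignores rarer forms such as lines starting with an @cert-authority marker.

```shell
# Sketch: rough shape check for one known_hosts line.
# looks_like_known_hosts_line is a hypothetical helper, not part of any tool.
looks_like_known_hosts_line() {
  # A URL scheme such as https:// or ssh:// does not belong in a known_hosts entry.
  case $1 in
    *://*) return 1 ;;
  esac
  # Require at least three fields, with a key type such as ssh-rsa,
  # ssh-ed25519 or ecdsa-sha2-nistp256 in the second field.
  printf '%s\n' "$1" |
    awk 'NF >= 3 && $2 ~ /^(ssh-|ecdsa-)/ { ok = 1 } END { exit (ok ? 0 : 1) }'
}
```

A line copied from the output of ssh-keyscan for your git host should pass this check, whereas an entry such as "https://<git-host>/ ssh-rsa ..." would be rejected.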
    
  • Create and apply manifests:

    apiVersion: v1
    kind: Secret
    metadata:
      namespace: airflow
      name: airflow-ssh-git-secret
    data:
      gitSshKey: <base64_encoded_private_key_id_ed25519_in_one_line>
    ---
    apiVersion: v1
    kind: Secret
    metadata:
      namespace: airflow
      name: git-credentials
    data:
      GIT_SYNC_USERNAME: base64_encoded_git_username
      GIT_SYNC_PASSWORD: base64_encoded_git_password
    ---
    apiVersion: v1
    kind: ConfigMap
    metadata:
      namespace: airflow
      name: known-hosts
    data:
      known_hosts: |
        line 1 of known_host file
        line 2 of known_host file
        line 3 of known_host file
        ...
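
The "in one line" part of the gitSshKey value matters because some base64 implementations wrap their output. A small sketch of producing an unwrapped value follows; encode_one_line is a hypothetical helper name, and the key path in the usage note is just the one used earlier in this answer.

```shell
# Sketch: base64-encode a file as a single unwrapped line,
# suitable for pasting into a Secret's data field.
# encode_one_line is a hypothetical helper name.
encode_one_line() {
  # Some base64 implementations wrap output at 76 columns; strip the newlines.
  base64 < "$1" | tr -d '\n'
}
```

For example, encode_one_line /path/to/.ssh/id_ed25519 prints the value to paste after gitSshKey:.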
    
  • Update Airflow release

    helm upgrade --install airflow apache-airflow/airflow -n airflow -f values.yaml --debug

  • Get pods in the airflow namespace

    kubectl get pods -n airflow

  • The airflow-scheduler-SOME-STRING pod should have 3 containers running. If you don't see the pods in the Running state, view the logs of the git-sync-init container.
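
To pull those logs, something along these lines should work. It is only a sketch: git_sync_init_logs is a helper name I made up, the pod name has to come from the kubectl get pods output above, and the namespace defaults to airflow as used throughout this answer.

```shell
# Sketch: print the git-sync-init container logs for a given scheduler pod.
# git_sync_init_logs is a hypothetical helper; the namespace defaults to "airflow".
git_sync_init_logs() {
  pod=$1
  namespace=${2:-airflow}
  kubectl logs "$pod" -c git-sync-init -n "$namespace"
}

# Usage (replace SOME-STRING with the suffix shown by `kubectl get pods -n airflow`):
#   git_sync_init_logs airflow-scheduler-SOME-STRING
```

Authentication and host key problems usually show up in this container's output, which makes it a good first place to look.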

answered Sep 13 '25 15:09 by Saurabh