
Programmatically create an SSH tunnel inside a dockerized Apache Airflow PythonOperator

My program is unable to create an SSH tunnel while running inside my Docker container that hosts Apache Airflow; the same function works fine on my local machine. I have a list of servers that I use to create a tunnel, query the database, and close the connection. Normally, I'd do it the following way:

import sshtunnel

for server in servers:
    server_conn = sshtunnel.SSHTunnelForwarder(
        server,
        ssh_username=ssh_user,
        ssh_password=ssh_password,
        remote_bind_address=('localhost', db_port),
        local_bind_address=('localhost', localport)
    )
    server_conn.start()
    # ... query the database through the forwarded local port ...
    server_conn.stop()

This works as expected and I can do whatever I need from there. However, within Docker, it does not work. I realize that the container runs in its own network namespace and is not actually part of the host system, so I used network_mode="host" to help mitigate this issue. However, that does not work either, because my containers lose the ability to communicate with one another. Here is my docker-compose file:

    postgres:
        image: postgres:9.6
        environment:
            - POSTGRES_USER=airflow
            - POSTGRES_PASSWORD=airflow
            - POSTGRES_DB=airflow
            - PGDATA=/var/lib/postgresql/data/pgdata
        volumes:
            - ~/.whale/pgdata:/var/lib/postgresql/data/pgdata
            - ./dags/docker/sql/create.sql:/docker-entrypoint-initdb.d/init.sql
        ports:
          - "5432:5432"

    webserver:
        image: hawk
        build:
            context: .
            dockerfile: ./dags/docker/Dockerfile-airflow
        restart: always
        depends_on:
            - postgres
            # - redis
        environment:
            - LOAD_EX=n
            - FERNET_KEY=46BKJoQYlPPOexq0OhDZnIlNepKFf87WFwLbfzqDDho=
            - EXECUTOR=Local
        volumes:
            - ./dags:/usr/local/airflow/dags
            # Uncomment to include custom plugins
            # - ./plugins:/usr/local/airflow/plugins
        ports:
            - "8080:8080"
            - "52023:22"
        command: webserver
        healthcheck:
            test: ["CMD-SHELL", "[ -f /usr/local/airflow/airflow-webserver.pid ]"]
            interval: 30s
            timeout: 30s
            retries: 3

I also followed the instructions here and got to the point where I can docker exec into my container and manually run the above Python snippet and get a working connection. Additionally, I have read the Airflow documentation here, which covers SSH connection operators, but those only support bash commands; I need my Python function to run. I am truly confused why the Python code works when exec-ed into the container but not when I run it via my Airflow DAG. At this time, I am unable to manually enter all of the connections because there will be > 100 once this system deploys. Any help would be greatly appreciated. If more depth is needed, please let me know.
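For context, the callable is wired into the DAG roughly like the sketch below; the DAG and function names here are illustrative placeholders rather than my actual code, but the failure happens when this task executes:

from datetime import datetime

from airflow import DAG
from airflow.operators.python_operator import PythonOperator  # airflow.operators.python in Airflow 2+


def query_servers():
    # Loops over `servers` and opens an sshtunnel.SSHTunnelForwarder
    # exactly as in the snippet above; this is where the tunnel fails.
    ...


dag = DAG(
    'query_remote_dbs',                  # placeholder dag_id
    start_date=datetime(2019, 1, 1),
    schedule_interval='@daily',
)

query_task = PythonOperator(
    task_id='query_servers',
    python_callable=query_servers,
    dag=dag,
)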

Asked Mar 01 '26 by Jarred Parr
1 Answer

I was having this same issue when opening the tunnel and trying to connect to the database in separate tasks, but got it working by doing both in the same task, since Airflow doesn't share in-memory state (such as an open tunnel) between separate tasks:

# Import paths shown for Airflow 1.10; in Airflow 2+ they live under
# airflow.providers.ssh.hooks.ssh, airflow.providers.postgres.hooks.postgres,
# and airflow.operators.python.
from airflow.contrib.hooks.ssh_hook import SSHHook
from airflow.hooks.postgres_hook import PostgresHook
from airflow.operators.python_operator import PythonOperator


def select_from_tunnel_db():
    # Open SSH tunnel through the bastion host
    ssh_hook = SSHHook(ssh_conn_id='bastion-ssh-conn', keepalive_interval=60)
    tunnel = ssh_hook.get_tunnel(5432, remote_host='<db_host>', local_port=5432)
    tunnel.start()

    # Connect to the DB through the tunnel and run the query
    pg_hook = PostgresHook(
        postgres_conn_id='remote-db-conn',  # NOTE: host='localhost'
        schema='db_schema'
    )
    pg_cursor = pg_hook.get_conn().cursor()
    pg_cursor.execute('SELECT * FROM table;')
    select_val = pg_cursor.fetchall()

    return select_val


python_operator = PythonOperator(
    task_id='test_tunnel_conn',
    python_callable=select_from_tunnel_db,
    dag=dag
)

This forwards traffic on port 5432 from the local machine to the same port on the remote DB host. The SSHHook requires a working SSH connection (the Airflow connection 'bastion-ssh-conn') to the endpoint you will be tunneling through, and the PostgresHook requires a Postgres connection ('remote-db-conn') with host 'localhost' and port 5432.
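Since the question mentions there will be well over 100 servers, the two Airflow connections used above ('bastion-ssh-conn' and 'remote-db-conn') can also be created programmatically instead of by hand in the UI. A minimal sketch, assuming direct access to the Airflow metadata database and using placeholder hosts and credentials:

from airflow import settings
from airflow.models import Connection


def ensure_conn(conn):
    # Insert the connection into Airflow's metadata DB if the conn_id is not already present.
    session = settings.Session()
    if not session.query(Connection).filter(Connection.conn_id == conn.conn_id).first():
        session.add(conn)
        session.commit()
    session.close()


ensure_conn(Connection(
    conn_id='bastion-ssh-conn',
    conn_type='ssh',
    host='bastion.example.com',  # placeholder bastion host
    login='ssh_user',            # placeholder credentials
    password='ssh_password',
    port=22,
))

ensure_conn(Connection(
    conn_id='remote-db-conn',
    conn_type='postgres',
    host='localhost',            # traffic goes through the forwarded local port
    schema='db_schema',
    login='db_user',             # placeholder credentials
    password='db_password',
    port=5432,
))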

Answered Mar 04 '26 by Sbrom

