Airflow - Failed to fetch log file from worker. 404 Client Error: NOT FOUND for url

I am running Airflow v1.9 with the Celery Executor. I have 5 Airflow workers running on 5 different machines, and the Airflow scheduler also runs on one of these machines. I have copied the same airflow.cfg file to all 5 machines. I have daily workflows set up in different queues such as DEV, QA etc. (each worker runs with its own queue name), and these are running fine.

While scheduling a DAG on one of the workers (no other DAGs have been set up on this worker/machine previously), I see the following error in the first task, and as a result the downstream tasks fail:

*** Log file isn't local.
*** Fetching here: http://<worker hostname>:8793/log/PDI_Incr_20190407_v2/checkBCWatermarkDt/2019-04-07T17:00:00/1.log
*** Failed to fetch log file from worker. 404 Client Error: NOT FOUND for url: http://<worker hostname>:8793/log/PDI_Incr_20190407_v2/checkBCWatermarkDt/2019-04-07T17:00:00/1.log

I have configured MySQL for storing the DAG metadata. When I checked the task_instance table, I saw that the correct hostnames are populated for the task.

I also checked the log location and found that the log is getting created.

airflow.cfg snippet:

base_log_folder = /var/log/airflow
base_url = http://<webserver ip>:8082
worker_log_server_port = 8793
api_client = airflow.api.client.local_client
endpoint_url = http://localhost:8080
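
For context, the URL in the error message can be reproduced from these settings. The following is a minimal sketch, not Airflow's own code: the function names and the hostname worker-1.example.com are hypothetical, the URL layout is copied from the error output above, and the local path layout is an assumption (that the worker's log server maps everything after /log/ onto base_log_folder).

```python
import posixpath


def build_worker_log_url(hostname, port, dag_id, task_id, execution_date, try_number):
    """Rebuild the log URL in the shape shown in the error message."""
    return "http://{}:{}/log/{}/{}/{}/{}.log".format(
        hostname, port, dag_id, task_id, execution_date, try_number)


def expected_local_log_path(base_log_folder, dag_id, task_id, execution_date, try_number):
    """Assumed on-disk location of the same log on the worker (not confirmed by the source)."""
    return posixpath.join(
        base_log_folder, dag_id, task_id, execution_date,
        "{}.log".format(try_number))


if __name__ == "__main__":
    args = ("PDI_Incr_20190407_v2", "checkBCWatermarkDt", "2019-04-07T17:00:00", 1)
    # "worker-1.example.com" is a placeholder for the hostname stored in task_instance.
    print(build_worker_log_url("worker-1.example.com", 8793, *args))
    print(expected_local_log_path("/var/log/airflow", *args))
```

If the file exists at the printed path on the machine that ran the task but the URL still returns 404, the hostname in the URL may be pointing at a different machine than the one holding the log.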

What am I missing here? What configurations do I need to check additionally for resolving this issue?

riyaB asked Apr 09 '19 08:04



1 Answer

Looks like the worker's hostname is not being correctly resolved. Add a file hostname_resolver.py:

import os
import socket

import requests


def resolve():
    """
    Resolves the Airflow external hostname used for accessing logs on a worker.
    """
    if 'AWS_REGION' in os.environ:
        # On EC2, return the instance's private IPv4 address from the
        # metadata service (time out rather than hang if it is unreachable):
        return requests.get(
            'http://169.254.169.254/latest/meta-data/local-ipv4',
            timeout=2).text
    # Otherwise, connect a UDP socket towards a public address (no packet is
    # sent) and read back the local IP the OS picked for the outgoing route:
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        s.connect(('1.1.1.1', 53))
        return s.getsockname()[0]
    finally:
        s.close()

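Then export the following so that Airflow calls this function to determine the hostname (the file must be importable under the module path given here):

AIRFLOW__CORE__HOSTNAME_CALLABLE=airflow.hostname_resolver:resolve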
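
Before relying on the resolver, it may be worth checking that the value it returns is something the webserver could actually reach. The following is a rough sketch using only the standard library; the helper name is_usable_worker_address is hypothetical and not part of Airflow, and the check only rules out obviously unreachable addresses (loopback, unspecified, link-local), not firewall or routing problems.

```python
import ipaddress


def is_usable_worker_address(value):
    """Return True if value parses as an IP address that is not
    loopback, unspecified (0.0.0.0) or link-local."""
    try:
        addr = ipaddress.ip_address(value.strip())
    except ValueError:
        # Not an IP address at all (for example an error page or a bare name).
        return False
    return not (addr.is_loopback or addr.is_unspecified or addr.is_link_local)


if __name__ == "__main__":
    # Sample inputs for illustration; in practice pass the output of resolve().
    for candidate in ("10.0.0.5", "127.0.0.1", "0.0.0.0", "not-an-ip"):
        print(candidate, is_usable_worker_address(candidate))
```

On a worker, calling is_usable_worker_address(resolve()) should give True; a loopback result would explain why the webserver cannot fetch the log from that address.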

dvainrub answered Nov 16 '22 01:11
