 

Remove Airflow Scheduler logs

I am using Docker Apache airflow VERSION 1.9.0-2 (https://github.com/puckel/docker-airflow).

The scheduler produces a significant amount of logs, and the filesystem quickly runs out of space, so I am trying to programmatically delete the scheduler logs created by Airflow, found in the scheduler container at /usr/local/airflow/logs/scheduler.

I have all of these maintenance tasks set up: https://github.com/teamclairvoyant/airflow-maintenance-dags

However, these tasks only delete logs on the worker, and the scheduler logs are in the scheduler container.

I have also set up remote logging, sending logs to S3, but as mentioned in the SO post Removing Airflow task logs, this setup does not stop Airflow from writing to the local machine.

Additionally, I have tried creating a shared named volume between the worker and the scheduler, as outlined in Docker Compose - Share named volume between multiple containers. However, I get the following error in the worker:

ValueError: Unable to configure handler 'file.processor': [Errno 13] Permission denied: '/usr/local/airflow/logs/scheduler'

and the following error in the scheduler:

ValueError: Unable to configure handler 'file.processor': [Errno 13] Permission denied: '/usr/local/airflow/logs/scheduler/2018-04-11'

So, how do people delete scheduler logs?

asked Apr 11 '18 by Ryan


People also ask

How do I delete airflow logs?

To clean up scheduler log files, I delete them manually twice a week, to avoid the risk of removing logs that are still needed for some reason. I clean the log files with the sudo rm -rd airflow/logs/ command.
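A blanket rm removes everything, including recent logs you may still want. A safer sketch (assuming logs live under $AIRFLOW_HOME/logs; the 14-day threshold is an example, adjust to your environment) deletes only files older than a cutoff and then prunes empty date directories:

```shell
#!/bin/sh
# Assumed log location; adjust to your installation.
LOG_DIR="${AIRFLOW_HOME:-$HOME/airflow}/logs"

# Delete log files older than 14 days...
find "$LOG_DIR" -type f -name '*.log' -mtime +14 -delete
# ...then remove the now-empty per-date directories the scheduler creates.
find "$LOG_DIR" -type d -empty -delete
```

Using find's -mtime filter keeps recent logs intact, unlike removing the whole logs/ tree.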

Where are airflow scheduler logs stored?

If you run Airflow locally, logging information will be accessible in the following locations: scheduler logs are printed to the console and accessible in $AIRFLOW_HOME/logs/scheduler; webserver and triggerer logs are printed to the console; task logs can be viewed either in the Airflow UI or at $AIRFLOW_HOME/logs/.

How do I view airflow scheduler logs?

You can also view the logs in the Airflow web interface. Streaming logs: these logs are a superset of the logs in Airflow. To access streaming logs, you can go to the logs tab of the Environment details page in the Google Cloud console, use Cloud Logging, or use Cloud Monitoring. Logging and Monitoring quotas apply.

What is airflow scheduler?

The Airflow scheduler is designed to run as a persistent service in an Airflow production environment. To kick it off, all you need to do is execute the airflow scheduler command. It uses the configuration specified in airflow.cfg. The scheduler uses the configured Executor to run tasks that are ready.


2 Answers

Inspired by this reply, I have added the airflow-log-cleanup.py DAG (with some changes to its parameters) from here to remove all old airflow logs, including scheduler logs.

My changes are minor: given my EC2 instance's disk size (7.7G for /dev/xvda1) and the fact that I had 4 DAGs, the 30-day default value for DEFAULT_MAX_LOG_AGE_IN_DAYS seemed too large, so I changed it to 14 days. Feel free to adjust it to your environment:

Before: DEFAULT_MAX_LOG_AGE_IN_DAYS = Variable.get("max_log_age_in_days", 30)
After:  DEFAULT_MAX_LOG_AGE_IN_DAYS = Variable.get("max_log_age_in_days", 14)

answered Sep 17 '22 by HaMi


The following could be one option to resolve this issue.

Log in to the Docker container:

docker exec -it <name-or-id-of-container> sh

Make sure the container is running before executing the command above.

Then use cron jobs to schedule an rm command on those log files.
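As a sketch of such a cron job (assuming the log path /usr/local/airflow/logs/scheduler used by the puckel image; schedule and 14-day age are examples), the crontab entry inside the container might look like:

```shell
# Inside the scheduler container, e.g. via `crontab -e` as the airflow user:
# every day at 03:00, delete scheduler log files older than 14 days.
0 3 * * * find /usr/local/airflow/logs/scheduler -type f -mtime +14 -delete
```

Note that the official puckel image does not run a cron daemon by default, so you may need to install and start one (or run the find command from the host via docker exec).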

answered Sep 18 '22 by fly2matrix