 

Airflow Audit Logs

Tags:

airflow

I'm wondering what Airflow offers in the way of audit logs. My Airflow environment runs Airflow version 1.10 and uses the [ldap] section of the airflow.cfg file to authenticate against my company's Active Directory (AD). I see that when someone logs into Airflow through the web UI, the user's name is written to the webserver's log (shown below). I'm wondering, though, whether Airflow can be modified to also log when a user turns a DAG on or off, creates a new Airflow Variable or Pool, clears a task, marks a task as success, or performs any other operation available to users.

I need some sort of traceability for users' activities: to use Airflow at work, it has to pass a security review by an architect, and he requires the ability to trace what users do.

Is this offered out of the box by Airflow? I see that Google Cloud's managed Airflow service, Cloud Composer, provides audit logs, but unfortunately I'm tied to the Amazon Web Services (AWS) ecosystem and am maintaining Airflow myself rather than using a managed service.

In the Airflow webserver logs I can see that navigating the web UI sends HTTP requests:

161.179.215.170 - - [17/Sep/2018:16:39:26 -0400] "GET /admin/ HTTP/1.1" 200 71942 "http://1.2.3.4:8080/admin/airflow/graph?dag_id=ARL_OnDemand" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36"

and when I log in, the log shows my username (it is logged by the login function here: https://github.com/apache/incubator-airflow/blob/master/airflow/contrib/auth/backends/ldap_auth.py):

[2018-09-17 16:27:15,493] {ldap_auth.py:287} INFO - User foobaruser successfully authenticated
161.179.215.170 - - [17/Sep/2018:16:27:16 -0400] "POST /admin/airflow/login HTTP/1.1" 302 221 "http://1.2.3.4:8080/admin/airflow/login?next=%2Fadmin%2F" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36"
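The request lines above appear to follow the Combined Log Format, whose third field (authuser) is where a username would normally go. It shows `-` here, most likely because Airflow's LDAP backend authenticates users inside the application through a login session rather than at the HTTP layer. A minimal parsing sketch makes this visible; the regex and `parse_access_line` are hypothetical helpers, not part of Airflow:

```python
import re

# Combined Log Format, assumed to match the webserver's access log:
# host ident authuser [time] "METHOD path protocol" status size "referer" "agent"
COMBINED_LOG = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\S+) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_access_line(line):
    """Return the fields of one access-log line as a dict, or None if it doesn't match."""
    match = COMBINED_LOG.match(line)
    return match.groupdict() if match else None
```

Running it over the `GET /admin/` line above yields `user == "-"`, which is why these lines alone cannot tell you who made a request; the `ldap_auth.py` INFO line does not match the pattern and returns `None`.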

So I'm wondering whether there's a way to update the webserver logs so that every GET or POST request is also logged with the user who sent it. That would satisfy my audit-log needs, because I would always know which user did what in the Airflow UI.
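One generic way to get request-level logging with a username is WSGI middleware wrapped around the web application. The sketch below is hypothetical and not an Airflow feature: `AuditLogMiddleware`, the `airflow.audit` logger name, and the `get_user` callable are all assumptions, and how to obtain the logged-in user from Airflow 1.10's session is deliberately left to `get_user`.

```python
import logging

logger = logging.getLogger("airflow.audit")  # hypothetical logger name


class AuditLogMiddleware:
    """Hypothetical WSGI middleware: logs method, path, status and user per request.

    `get_user` is an assumed callable mapping the WSGI environ to a username
    (or None). It is called when the wrapped app starts its response, so an app
    that stores the logged-in user in the environ before that point can expose it.
    """

    def __init__(self, app, get_user):
        self.app = app
        self.get_user = get_user

    def __call__(self, environ, start_response):
        def audited_start_response(status, headers, exc_info=None):
            user = self.get_user(environ) or "anonymous"
            logger.info(
                "%s %s %s user=%s",
                environ.get("REQUEST_METHOD", "-"),
                environ.get("PATH_INFO", "-"),
                status.split(" ", 1)[0],  # "200 OK" -> "200"
                user,
            )
            # Hand the response start through to the real server unchanged.
            return start_response(status, headers, exc_info)

        return self.app(environ, audited_start_response)
```

In a Flask application, middleware like this is typically attached with `app.wsgi_app = AuditLogMiddleware(app.wsgi_app, get_user)`; whether and where Airflow 1.10's webserver allows that hook is not covered here.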

Update:

In this article

https://wecode.wepay.com/posts/improving-airflow-ui-security

Apparently Airflow 1.10 introduced a whole new web UI security architecture, and the original Flask UI will be deprecated in the future.

The part I found most relevant to this post is where she describes action logging as passive rather than preemptive. I wonder whether that's related to audit logging:

During this time, several improvements were made on security, including adding an action logging feature and creating a hard-coded naive RBAC implementation. However, the action logging was passive rather than preemptive, and the native RBAC implementation still allowed read and write access to DAGs for all roles, so they didn’t address our security concerns.

WORKING SOLUTION:

Despite saying I was on Airflow 1.10, I was actually on Airflow 1.9 :) On Airflow 1.9, the Owner column in the Logs view was always blank for me unless it said Airflow. But after upgrading to Airflow 1.10 and connecting it to my LDAP, I now see my LDAP username (kbridenstine) logged under Owner every time I run a modifying command!

[Screenshot: Airflow Logs view]

And as the icing on the cake, Airflow also logs when someone on the server runs an Airflow command (which matters because Airflow can be modified through its CLI too). You can see this with the root and ec2-user accounts I used for Airflow on the EC2 instance running it.
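The CLI entries carry OS account names such as root or ec2-user because, as far as I can tell, Airflow 1.10 wraps its CLI sub-commands in an action-logging decorator that records the current OS user. The following is a simplified sketch of that idea, not Airflow's code: `AUDIT_EVENTS` stands in for the metadata database, and the `cli_` event prefix and the `clear` sub-command body are illustrative assumptions.

```python
import functools
import getpass
from datetime import datetime, timezone

AUDIT_EVENTS = []  # stand-in for a database table; a real setup would persist rows


def action_logging(func):
    """Record who ran a CLI sub-command before running it (simplified sketch)."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        AUDIT_EVENTS.append({
            "dttm": datetime.now(timezone.utc),
            "event": "cli_" + func.__name__,  # naming convention assumed for this sketch
            "owner": getpass.getuser(),       # OS account, e.g. root or ec2-user
        })
        return func(*args, **kwargs)

    return wrapper


@action_logging
def clear(dag_id):
    # Hypothetical sub-command body; a real CLI would clear task instances here.
    return f"cleared {dag_id}"
```

Because the owner comes from the operating system rather than LDAP, CLI rows identify server accounts, not directory users, which is worth mentioning to a security reviewer.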

asked Sep 17 '18 19:09 by Kyle Bridenstine




1 Answer

I think the logs under AIRFLOW_WEB_SERVER_URL:PORT/admin/log/ should give you enough information, e.g. whether someone cleared a DAG through the UI or the CLI, as shown in the screenshot below.

Some of this metadata is retrieved from the metadata database (MetaDB).

[Screenshot: Airflow Logs view at /admin/log/]
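For an auditor, the same records can be pulled straight from the metadata database. This sketch assumes the Logs view is backed by a table named `log` with columns such as `dttm`, `dag_id`, `task_id`, `event`, `execution_date`, `owner` and `extra`, as in Airflow 1.10; check your own schema before relying on it. It uses `sqlite3` only so it runs standalone, and `actions_by_owner` is a hypothetical helper.

```python
import sqlite3

# Assumed shape of Airflow 1.10's `log` table; verify against your metadata DB.
SCHEMA = """
CREATE TABLE IF NOT EXISTS log (
    id INTEGER PRIMARY KEY,
    dttm TEXT,
    dag_id TEXT,
    task_id TEXT,
    event TEXT,
    execution_date TEXT,
    owner TEXT,
    extra TEXT
)
"""


def actions_by_owner(conn, owner, since=None):
    """Return (dttm, event, dag_id, task_id) rows for one owner, oldest first."""
    query = "SELECT dttm, event, dag_id, task_id FROM log WHERE owner = ?"
    params = [owner]
    if since is not None:
        query += " AND dttm >= ?"
        params.append(since)
    query += " ORDER BY dttm"
    return conn.execute(query, params).fetchall()
```

Against a MySQL or PostgreSQL metadata database the query is the same, but the driver's placeholder style (often `%s`) replaces `?`.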

answered Oct 15 '22 04:10 by kaxil