
Limit Airflow DAG Visibility By AD/LDAP Groups

Tags: airflow

Is it possible to limit visibility and accessibility of DAGs by user groups in Airflow?

For example, I want to have one large Airflow environment for my entire company, which different teams will use for their own workflows. Say team A and team B belong to their respective AD/LDAP groups, group A and group B. Is it possible for group A to see only the DAGs that belong to its team, and likewise for group B?

Based on my research, I don't think this is possible in a single Airflow environment. I think I will need to create a separate Airflow environment for each team, so that each team has its own DAGs folder containing only its DAGs.

Kyle Bridenstine asked Jun 27 '18 15:06



1 Answer

I think there are two different problems posed here:

First, LDAP authentication. Airflow provides support for LDAP authentication built on ldap3. The example in the linked doc shows how to associate Airflow roles with LDAP groups (e.g., the data_profiler_filter part).

Second, restricting DAG access by group. As of this writing, the current version of Airflow (1.9) doesn't support limiting visibility of DAGs by group. The recent work on role-based access control (RBAC) changes this. I've listed three options for addressing this problem below.


Option 1 - RBAC (most control, available in Airflow ≥ 1.10)

The new RBAC features add support for permissions like this and are the best option for fine-grained control. They use a permission system built on Flask-AppBuilder. The work was contributed by a company with a use case very similar to yours, which is discussed in more detail in the Jira issue.

More info can be found in:

  • RBAC proposal
  • AIRFLOW-85 - Create DAGs UI
  • PR #3015

The RBAC webserver UI is available on master now in airflow/www_rbac. Other features around RBAC are also being actively developed to further improve security in a multi-tenancy setup.

Note: There's also ongoing work on a new DAG-level access control (DLAC) feature in AIRFLOW-2267 which builds upon the RBAC work to introduce even more fine-grained control. More info can be found in the design doc and PR #3197.
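To make the idea of group-scoped DAG visibility concrete, here is a minimal sketch in plain Python. It does not call Airflow or Flask-AppBuilder, and it is not how either project implements permissions. The group names, role names, DAG ids, and both functions are hypothetical and exist only to illustrate the model: LDAP groups map to roles, and each DAG lists the roles allowed to read it.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical mapping from AD/LDAP group names to role names.
GROUP_TO_ROLE: Dict[str, str] = {
    "group_a": "team_a",
    "group_b": "team_b",
    "airflow_admins": "admin",
}

# Hypothetical per-DAG grants: DAG id -> roles allowed to read that DAG.
DAG_READ_ROLES: Dict[str, Set[str]] = {
    "team_a_daily_load": {"team_a"},
    "team_b_reporting": {"team_b"},
    "shared_cleanup": {"team_a", "team_b"},
}


def roles_for_groups(ldap_groups: Iterable[str]) -> Set[str]:
    """Translate a user's LDAP groups into roles, ignoring unknown groups."""
    return {GROUP_TO_ROLE[g] for g in ldap_groups if g in GROUP_TO_ROLE}


def visible_dags(ldap_groups: Iterable[str]) -> List[str]:
    """Return the sorted DAG ids a user with these LDAP groups may see."""
    roles = roles_for_groups(ldap_groups)
    if "admin" in roles:
        # In this sketch, admins see every DAG.
        return sorted(DAG_READ_ROLES)
    return sorted(
        dag_id
        for dag_id, allowed in DAG_READ_ROLES.items()
        if roles & allowed  # at least one of the user's roles is granted
    )
```

With this model, a member of group A would see only team A's DAGs plus any DAG explicitly shared with both teams, which is the behavior the question asks for.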


Option 2 - Multi-tenancy with owners (simplest, available in Airflow < 1.10)

A second option you can consider for medium-grained control is a multi-tenancy setup using webserver.filter_by_owner and setting one explicit owner (a user, not a group) for each DAG. "With this, a user will see only the dags which it is owner of, unless it is a superuser."
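The rule quoted above can be sketched in a few lines of plain Python. This is only a model of the described behavior, not Airflow's code: the DAG ids, user names, and the `dags_visible_to` function are hypothetical. In a real DAG file the owner is typically set through the `owner` key of `default_args`.

```python
from typing import Dict, List

# Hypothetical DAG id -> owner mapping. Each DAG has exactly one owner,
# and that owner is a user name, not a group.
DAG_OWNERS: Dict[str, str] = {
    "team_a_daily_load": "alice",
    "team_a_backfill": "alice",
    "team_b_reporting": "bob",
}


def dags_visible_to(username: str, is_superuser: bool = False) -> List[str]:
    """Model of owner-based filtering: superusers see everything,
    everyone else sees only the DAGs they own."""
    if is_superuser:
        return sorted(DAG_OWNERS)
    return sorted(
        dag_id for dag_id, owner in DAG_OWNERS.items() if owner == username
    )
```

Because ownership is tied to a single user, a team would need a shared account, or one owner per DAG, for this to approximate group-level visibility.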

Aside: A related feature you might be interested in is running tasks as a specific user through impersonation, using run_as_user or core.default_impersonation.


Option 3 - Run multiple separate Airflow instances (highest isolation)

A third option for coarse-grained control that some companies choose is to run multiple separate Airflow instances, one per team. This is probably the most practical for those looking to run multiple teams' DAGs in isolation today. If you happen to use Astronomer Enterprise, we support spinning up multiple Airflow instances.

Taylor D. Edmiston answered Dec 25 '22 23:12