Cannot access airflow web server via AWS load balancer HTTPS because airflow redirects me to HTTP

I have an Airflow web server running on EC2, listening on port 8080.

An AWS ALB (Application Load Balancer) sits in front of the EC2 instance. Its internet-facing listener accepts HTTPS on port 443, and the instance target port is HTTP 8080.

I cannot browse to https://<airflow link> because the Airflow web server redirects me to http://<airflow link>/admin, which the ALB does not listen on.

If I browse to https://<airflow link>/admin/airflow/login?next=%2Fadmin%2F, I do see the login page, because that URL does not trigger a redirect.

My question is how to configure Airflow so that a request to https://<airflow link> is redirected to https://... rather than http://..., so that the ALB can process the request.

I tried changing base_url in airflow.cfg from http://localhost:8080 to https://localhost:8080, as suggested in another answer, but the change made no difference.

So, how can I access https://<airflow link> through the ALB?

user389955 asked Jan 24 '18 00:01


4 Answers

Since Airflow's web server runs on Gunicorn, you can configure the forwarded_allow_ips value through an environment variable instead of adding an intermediary proxy such as Nginx.

In my case I just set FORWARDED_ALLOW_IPS=* and it works perfectly fine.

In ECS you can set this in the webserver task configuration if you're using one Docker image for all the Airflow tasks (webserver, scheduler, worker, etc.).
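Outside ECS, the same variable can be set by whatever starts the web server. The following is only a sketch, assuming the server is started with the airflow webserver command; webserver_env and launch_webserver are hypothetical helper names, not part of Airflow or Gunicorn.

```python
import os


def webserver_env(allowed_ips="*", base_env=None):
    """Return a copy of the environment with FORWARDED_ALLOW_IPS set."""
    env = dict(os.environ if base_env is None else base_env)
    # Gunicorn uses this variable as the default for forwarded_allow_ips.
    env["FORWARDED_ALLOW_IPS"] = allowed_ips
    return env


def launch_webserver(allowed_ips="*"):
    """Hypothetical wrapper: replace this process with the Airflow web server."""
    # Assumes the `airflow` executable is on PATH.
    os.execvpe("airflow", ["airflow", "webserver"], webserver_env(allowed_ips))
```

Calling launch_webserver() would start the server with the variable in place. Passing a comma-separated list of load balancer IPs instead of * is the stricter choice when those addresses are known.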

Nathan Clayton answered Nov 09 '22 16:11


user389955's own solution is probably the best approach, but for anyone looking for a quick fix (or wanting a better idea of what is going on), this seems to be the culprit.

In the following file (the path may differ depending on your Python distribution):

/usr/local/lib/python3.5/dist-packages/gunicorn/config.py

the following section prevents forwarded headers from being trusted when they come from anything other than localhost:

class ForwardedAllowIPS(Setting):
    name = "forwarded_allow_ips"
    section = "Server Mechanics"
    cli = ["--forwarded-allow-ips"]
    meta = "STRING"
    validator = validate_string_to_list
    default = os.environ.get("FORWARDED_ALLOW_IPS", "127.0.0.1")
    desc = """\
        Front-end's IPs from which allowed to handle set secure headers.
        (comma separate).

        Set to ``*`` to disable checking of Front-end IPs (useful for setups
        where you don't know in advance the IP address of Front-end, but
        you still trust the environment).

        By default, the value of the ``FORWARDED_ALLOW_IPS`` environment
        variable. If it is not defined, the default is ``"127.0.0.1"``.
        """

Changing the default from 127.0.0.1 to specific IPs, or to * if the IPs are unknown, will do the trick.
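To make the effect of that default concrete: the ALB adds an X-Forwarded-Proto header, but the request reaches Gunicorn from the ALB's address rather than from 127.0.0.1, so the header is not trusted and the request is treated as plain HTTP. The sketch below is a deliberately simplified model of that decision, not Gunicorn's actual code; effective_scheme and allowed_ips_from_env are illustrative names.

```python
import os


def allowed_ips_from_env(environ=None):
    """Parse FORWARDED_ALLOW_IPS the way the default above describes it."""
    environ = os.environ if environ is None else environ
    raw = environ.get("FORWARDED_ALLOW_IPS", "127.0.0.1")
    return [ip.strip() for ip in raw.split(",") if ip.strip()]


def effective_scheme(peer_ip, headers, allowed_ips):
    """Simplified model: honor X-Forwarded-Proto only from trusted peers."""
    trusted = "*" in allowed_ips or peer_ip in allowed_ips
    proto = headers.get("X-Forwarded-Proto", "").lower()
    if trusted and proto == "https":
        return "https"
    # Untrusted peer or no header: the request looks like plain HTTP,
    # so any redirect the application builds will use http://.
    return "http"
```

With the default list, a request arriving from the load balancer's address resolves to http even though the header says https. With * in the list, the same request resolves to https.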

So far I haven't found a way to set this parameter from within the Airflow config itself. If I find one, I will update my answer.

Doug answered Nov 09 '22 17:11


Finally I found a solution myself.

I introduced an nginx reverse proxy between the ALB and the Airflow web server, i.e. HTTPS request -> ALB:443 -> nginx proxy:80 -> web server:8080. The nginx proxy tells the Airflow web server that the original request was HTTPS, not HTTP, by adding the HTTP header "X-Forwarded-Proto https".

The nginx server is co-located with the web server. I put its config in /etc/nginx/sites-enabled/vhost1.conf (see below) and deleted the default config file, /etc/nginx/sites-enabled/default.

server {
    listen 80;
    server_name <domain>;
    index index.html index.htm;
    location / {
      proxy_pass_header Authorization;
      proxy_pass http://localhost:8080;
      proxy_set_header Host $host;
      proxy_set_header X-Real-IP $remote_addr;
      proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
      proxy_set_header X-Forwarded-Proto https;
      proxy_http_version 1.1;
      proxy_redirect off;
      proxy_set_header Connection "";
      proxy_buffering off;
      client_max_body_size 0;
      proxy_read_timeout 36000s;
    }
}
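
One way to confirm that a setup like this works is to request the root URL over HTTPS without following redirects and look at the Location header. This is a rough sketch using only the Python standard library; check_host and redirect_stays_https are hypothetical helper names, and the host you pass in is your own Airflow domain.

```python
import http.client
from urllib.parse import urlsplit


def redirect_stays_https(status, location):
    """Return False only when a redirect points at an explicit non-HTTPS URL."""
    if status not in (301, 302, 303, 307, 308):
        return True  # not a redirect, nothing to check
    scheme = urlsplit(location or "").scheme
    # An empty scheme means a relative redirect, which stays on HTTPS.
    return scheme in ("", "https")


def check_host(host, path="/"):
    """Fetch https://<host><path> once; http.client does not follow redirects."""
    conn = http.client.HTTPSConnection(host, timeout=10)
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        return redirect_stays_https(resp.status, resp.getheader("Location"))
    finally:
        conn.close()
```

If check_host returns False for the Airflow domain, the web server is still building http:// redirects and the forwarded header is not being honored.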

user389955 answered Nov 09 '22 15:11


We solved this problem in my team by adding an HTTP listener to our ALB that redirects all HTTP traffic to HTTPS (so we have an HTTP listener AND an HTTPS listener). Our Airflow webserver tasks still listen on port 80 for HTTP traffic, but this HTTP traffic is only in our VPC so we don't care. The connection from browser to the load balancer is always HTTPS or HTTP that gets rerouted to HTTPS and that's what matters.

Here is the Terraform code for the new listener:

resource "aws_lb_listener" "alb_http" {
  load_balancer_arn = aws_lb.lb.arn
  port              = 80
  protocol          = "HTTP"
  default_action {
    type = "redirect"
    redirect {
      port        = "443"
      protocol    = "HTTPS"
      status_code = "HTTP_301"
    }
  }
}
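
For teams that script AWS with Python rather than Terraform, roughly the same listener could be created through boto3's elbv2 client. This is a sketch under the assumption that boto3 is installed and credentials are configured; the two function names are illustrative, and the load balancer ARN is supplied by the caller.

```python
def http_redirect_listener_kwargs(load_balancer_arn):
    """Build create_listener arguments mirroring the Terraform resource above."""
    return {
        "LoadBalancerArn": load_balancer_arn,
        "Protocol": "HTTP",
        "Port": 80,
        "DefaultActions": [
            {
                "Type": "redirect",
                "RedirectConfig": {
                    "Protocol": "HTTPS",
                    "Port": "443",
                    "StatusCode": "HTTP_301",
                },
            }
        ],
    }


def create_http_redirect_listener(load_balancer_arn):
    """Create the listener; needs boto3 and AWS credentials at call time."""
    import boto3  # third-party dependency, imported lazily

    client = boto3.client("elbv2")
    return client.create_listener(**http_redirect_listener_kwargs(load_balancer_arn))
```

Keeping the argument dictionary in its own function makes it easy to inspect before anything is sent to AWS.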

Or, if you prefer the AWS console, add an HTTP listener on port 80 and set its default action to redirect to HTTPS on port 443 with a 301 status code.

GDubz answered Nov 09 '22 16:11