
Strategies for dealing with URIs when building an application that sits behind a reverse proxy

Tags: http, proxy, apache

I'm building an application with a self-contained HTTP server which can be either accessed directly, or put behind a reverse proxy (like Apache mod_proxy).

So, let's say my application is running on port 8080 and you set up your Apache like this:

ProxyPass /myapp http://localhost:8080
ProxyPassReverse /myapp http://localhost:8080

This will cause HTTP requests coming into the main Apache server that go to /myapp/* to be proxied to my application. If a request comes in like GET /myapp/bar, my application will see GET /bar. This is as it should be.

The problem that arises is in generating URIs that have to be translated from my application's URI-space in order to work correctly via the proxy (i.e. prepending /myapp/).

The ProxyPassReverse directive takes care of handling this for URIs in HTTP headers (redirects and so forth.) But that doesn't handle URIs in the HTML generated by my application, or in static files and templates.

I'm aware of filters like mod_proxy_html, but this is a non-standard Apache module, and in any case, such filters may not be available for other front-end web servers which are capable of acting as a reverse proxy.

So I've come up with a few possible strategies:

  1. Require an environment variable be set somewhere that contains the proxy path, and prepend this to all generated URIs. This seems inelegant; it breaks the encapsulation provided by the reverse proxy.

  2. Put the proxy path in a configuration file for my application. Same objection as above.

  3. Use only relative URIs in my application. This can get somewhat tricky; I would have to calculate the path difference between the current resource and where the link is going and add the appropriate number of ../'es. Seems messy. Another problem is that some things must generate absolute URIs, like RSS feeds and generated emails.

  4. Use some hacky JavaScript on the front-end to munge URIs in the document text. This seems like a really horrible idea from an interoperability standpoint.

  5. Use a single URI-generating function throughout my code, and require "static" files like JavaScript, CSS, etc. to be run through my templating system. This is the idea I'm leaning towards now.
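To make strategies 1, 2 and 5 concrete, here is a minimal Python sketch of a single URI-generating function that prepends a configured proxy path. The function name `make_uri` and the environment variable `APP_PROXY_PREFIX` are hypothetical names chosen for illustration, not part of any framework.

```python
import os


def make_uri(path, prefix=None):
    """Build an application URI, prepending the proxy path if one is set.

    `prefix` defaults to the (hypothetical) APP_PROXY_PREFIX environment
    variable; an empty prefix means the app is being accessed directly.
    """
    if prefix is None:
        prefix = os.environ.get("APP_PROXY_PREFIX", "")
    # Normalise slashes so "/myapp", "myapp/" and "myapp" all behave the same.
    prefix = prefix.strip("/")
    path = path.lstrip("/")
    return ("/" + prefix if prefix else "") + "/" + path


print(make_uri("/bar", "/myapp"))  # /myapp/bar
print(make_uri("/bar", ""))        # /bar
```

Every template and static file would then have to call this one function, which is why the "static" files need to pass through the templating system.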

This must be a fairly common problem. How have you approached it in the past? What has worked and what has made things more difficult?

Asked by friedo on Dec 17 '09 at 17:12



1 Answer

Yep, common problem. How to solve this depends on the kind of app you have and the server platform and web framework you're working with. But there's a general way I've approached these problems which has worked pretty well so far.

My preference is to handle problems like this in application code, rather than relying on web server modules like mod_proxy_html, because there are often too many special cases (e.g. client-side JavaScript assembling URLs on the fly) which the server module doesn't catch. That said, I've resorted to the server-module approach in a few cases, though I ended up revising the module code myself to handle the corner cases. Also keep performance in mind; fixing up URLs in your code at the time they're generated is usually faster than shoving the entire HTML through another server module.

Here's my recommendation of how to handle this in your code:

First, you'll need to figure out what kind of URLs to generate. My preference is for relative URLs. You are correct above that "add the appropriate number of ../'es" is messy, but at least it's your (the programmer's) mess. If you go with the config-file/environment-variable approach, then you'll be dependent on whoever deploys your app (e.g. an underpaid and grumpy IT operations engineer) to always set things up correctly. It also complicates release of your code, even if you're doing deployment yourself, since you can't simply copy your development files into production but need to add a per-deployment-environment custom step. I've found in the past that eliminating potential deployment problems is worth a lot of pre-emptive coding.
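As a rough illustration of what "adding the appropriate number of ../'es" involves, here is a small Python sketch. The function name `relative_url` is made up for this example, and it assumes both arguments are absolute paths inside the application's own URI space (no query strings, no host names).

```python
def relative_url(current_path, target_path):
    """Return target_path expressed relative to the page at current_path."""
    # Directory segments of the current page; the last segment is the page
    # itself (or empty if the path ends in "/"), so it is dropped.
    base = current_path.split("/")[1:-1]
    target = target_path.split("/")[1:]

    # Count the leading directory segments that both paths share.
    common = 0
    while (common < len(base) and common < len(target) - 1
           and base[common] == target[common]):
        common += 1

    # One "../" for every directory of the current page that is not shared.
    ups = len(base) - common
    # An empty result would mean "this document", so fall back to "./".
    return "../" * ups + "/".join(target[common:]) or "./"


print(relative_url("/blog/2009/post", "/css/site.css"))  # ../../css/site.css
print(relative_url("/blog/2009/post", "/blog/index"))    # ../index
```

Because the result never mentions the root, the same link resolves correctly whether the page is served at /blog/2009/post or at /myapp/blog/2009/post.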

Next, you'll need to get those URLs into your code. How you do this varies based on type of content/code:

For server-side code (e.g. PHP, RoR, etc.) you'll want to make sure that server-side URL generation happens in as few places as possible in your code (ideally, one method!). If you're using any of the mainstream MVC web frameworks (e.g. RoR, Django, etc.), this should be trivial since URL generation using an MVC framework already generally goes through a single codepath that you can override. If you're not using one of those frameworks, you likely have URL generation littered throughout your code. But the approach you'll want to take is to generate all URLs via code, and then override that method to support transforming non-relative URLs into relative URLs. You can usually search for patterns in your code (like "/, '/, "http://, 'http://) and do a manual search and replace (or if you're really nerdy and have more patience than I do, craft a regex to replace each common case in your source code).
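The search for hard-coded URL patterns mentioned above could be automated with something like the following Python sketch. The regular expression and the function name `find_hardcoded_urls` are illustrative assumptions; the pattern is deliberately crude and will produce false positives (any quoted string that starts with a slash), so treat the output as a list of candidates to review by hand.

```python
import re

# A quoted string literal that starts with "/" or "http(s)://".
URL_LITERAL = re.compile(r"""["'](?:https?://|/)[^"']*["']""")


def find_hardcoded_urls(source):
    """Return (line_number, literal) pairs for likely hard-coded URLs."""
    hits = []
    for number, line in enumerate(source.splitlines(), start=1):
        for match in URL_LITERAL.finditer(line):
            hits.append((number, match.group(0)))
    return hits


sample = 'link = "/css/site.css"\ncount = 1\nhome = \'http://example.com/x\'\n'
for number, literal in find_hardcoded_urls(sample):
    print(number, literal)
```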

The key to making this work reliably is that, instead of manually replacing all absolute URLs with relative ones in your server-side code (which, even if you get each of them right, is fragile if files are moved), you can leave the absolute URLs in place and simply wrap them with a call to your "relativizer" method. This is much more reliable and far less brittle.
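A "relativizer" wrapper of this kind might look like the Python sketch below. The name `relativize` and its signature are assumptions made for illustration; it takes the absolute URL as written in the code plus the path of the page being rendered, and it passes external, protocol-relative and already-relative URLs through unchanged.

```python
import posixpath
from urllib.parse import urlsplit


def relativize(url, current_path):
    """Rewrite a root-relative URL so it is relative to current_path."""
    parts = urlsplit(url)
    # Leave anything that is not a root-relative path alone:
    # "http://...", "//cdn...", "mailto:...", "#top", "img/x.png", etc.
    if parts.scheme or parts.netloc or not parts.path.startswith("/"):
        return url

    # Browsers resolve relative URLs against the page's directory.
    base_dir = current_path[: current_path.rfind("/") + 1] or "/"
    rel = posixpath.relpath(parts.path, base_dir)

    # relpath drops trailing slashes; restore one for directory-style URLs.
    if parts.path.endswith("/"):
        rel += "/"
    if parts.query:
        rel += "?" + parts.query
    if parts.fragment:
        rel += "#" + parts.fragment
    return rel


# The absolute URL stays in the source; only the call wraps it.
print(relativize("/css/site.css?v=2", "/blog/2009/post"))  # ../../css/site.css?v=2
print(relativize("http://example.com/x", "/blog/post"))    # http://example.com/x
```

If files move later, only `current_path` changes, so the wrapped absolute URLs in the source stay correct.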

For Javascript, I generally like to do the same thing as server code-- move all URL generation into a single method and ensure any URL generation calls this method. This can be hard on an app with lots of pre-existing javascript, but the search-and-replace method above seems to work well in JS too.

For CSS, URLs are resolved relative to the location of the CSS file (not the HTML page that references it), so using relative URLs is generally easy. Simply put your CSS into a folder and either put images into deeper folders beneath it, or put images into a parallel folder to your CSS and use a single ../ to get to the images relatively. This is a good practice in general-- if you're not doing relative URLs in CSS already, you should consider doing it, regardless of reverse proxy.

Finally, you'll need to figure out what to do for other oddball static files (like legacy static HTML files sometimes creep in). In general, I recommend the same practice as CSS and images-- ideally, you'd put static files into predictable directories and rely on relative URLs. Or (depending on your server platform) it may be easier to remap the file extensions of those static files so that they're processed by your web framework-- and then run your server-side URL generator for all URLs. Or, barring that, you can leave the files in place and manually fix up URLs to be relative-- knowing that this is brittle.

Coming full circle, sometimes there are just too many places where URLs are generated, and it's more effective to use a server module like mod_proxy_html. But I consider this a last resort-- especially if you won't be comfortable editing the source code if needed.

BTW, I realize I didn't mention anything about your idea #4 above (JavaScript link fixup). I wouldn't do that-- if the user has JavaScript turned off or (more common) some network problem prevents that JavaScript from loading for some time after the rest of the page loads, then your links won't work. Too risky.

Answered by Justin Grant on Oct 12 '22 at 09:10