I'm building an application with a self-contained HTTP server which can be either accessed directly, or put behind a reverse proxy (like Apache <code>mod_proxy</code>). So, let's say my application is running on port 8080 and you set up your Apache like this: <pre class="prettyprint"><code>ProxyPass /myapp http://localhost:8080 ProxyPassReverse /myapp http://localhost:8080 </code></pre> This will cause HTTP requests coming into the main Apache server that go to <code>/myapp/*</code> to be proxied to my application. If a request comes in like <code>GET /myapp/bar</code>, my application will see <code>GET /bar</code>. This is as it should be. The problem that arises is in generating URIs that have to be translated from my application's URI-space in order to work correctly via the proxy (i.e. prepending <code>/myapp/</code>). The <code>ProxyPassReverse</code> directive takes care of handling this for URIs in HTTP headers (redirects and so forth.) But that doesn't handle URIs in the HTML generated by my application, or in static files and templates. I'm aware of filters like <code>mod_proxy_html</code>, but this is a non-standard Apache module, and in any case, such filters may not be available for other front-end web servers which are capable of acting as a reverse proxy. So I've come up with a few possible strategies: <ol> <li>Require an environment variable be set somewhere that contains the proxy path, and prepend this to all generated URIs. This seems inelegant; it breaks the encapsulation provided by the reverse proxy.</li> <li>Put the proxy path in a configuration file for my application. Same objection as above.</li> <li>Use only relative URIs in my application. This can get somewhat tricky; I would have to calculate the path difference between the current resource and where the link is going and add the appropriate number of <code>../</code>'es. Seems messy. Another problem is that some things must generate absolute URIs, like RSS feeds and generated emails.</li> <li>Use some hacky Javascript on the front-end to mungle URIs in the document text. This seems like a really horrible idea from an interoperability standpoint.</li> <li>Use a singe URI-generating function throughout my code, and require "static" files like Javascript, CSS, etc. to be run through my templating system. This is the idea I'm leaning towards now.</li> </ol> This must be a fairly common problem. How have you approached it in the past? What has worked and what has made things more difficult?

Yep, common problem. How to solve this depends on the kind of app you have and the server platform and web framework you're working with. But there's a general way I've approached these problems which has worked pretty well so far. My preference is to handle problems like this in application code, rather than relying on web server modules like mod_proxy_html to do it, because there are often too many special cases (e.g. client-side-javascript assembling URLs on the fly) which the server module doesn't catch. That said, I've resorted to the server-module approach in a few cases, but I decided to revise the module code myself to handle the corner cases. Also keep perormance in mind; fixing up URLs in your code at the time they're generated is usually faster than shoving the entire HTML through another server module. Here's my recommendation of how to handle this in your code: First, you'll need to figure out what kind of URLs to generate. My preference is for relative URLs. You are correct above that "add the appropriate number of ../'es" is messy, but at least it's your (the programmer's) mess. If you go with the config-file/environment-variable approach, then you'll be dependent on whoever deploys your app (e.g. an underpaid and grumpy IT operations engineer) to always set things up correctly. It also complicates release of your code, even if you're doing deployment yourself, since you can't simply copy your development files into production but need to add a per-deployment-environment custom step. I've found in the past that eliminating potential deployment problems is worth a lot of pre-emptive coding. Next, you'll need to get those URLs into your code. How you do this varies based on type of content/code: For server-side code (e.g. PHP, RoR, etc.) you'll want to make sure that server-side URL generation happens in as few places as possible in your code (ideally, one method!). If you're using any of the mainstream MVC web frameworks (e.g. RoR, Django, etc.), this should be trivial since URL generation using an MVC framework already generally goes through a single codepath that you can override. If you're not using one of those frameworks, you likely have URL generation littered throughout your code. But the approach you'll want to take is to generate all URLs via code, and then override that method to support transforming non-relative URLs into relative URLs. You can usually search for patterns in your code (like <code>"/</code>, <code>'/</code>, <code>"http://</code>, <code>'http://</code>) and do a manual search and replace (or if you're really nerdy and have more patience than I do, craft a regex to replace each common case in your source code). The key to making this work reliably is that, instead of manually replacing all absolute URLs with relative ones in your server-side code (which, even if you get each of them right, is fragile if files are moved), you can leave the absolute URLs in place and simply wrap them with a call to your "relativizer" method. This is much more reliable and unbrittle. For Javascript, I generally like to do the same thing as server code-- move all URL generation into a single method and ensure any URL generation calls this method. This can be hard on an app with lots of pre-existing javascript, but the search-and-replace method above seems to work well in JS too. For CSS, URLs in CSS are relative to the location of the CSS file (not the calling HTML page) so using relative URLs is generally easy. Simply put your CSS into a folder and either put images into deeper folders beneath it, or put images into a parallel folder to your CSS and use a single ../ to get to the images relatively. This is a good best practice in general-- if you're not doing relative URLs in CSS already, you should consider doing it, regardless of reverse proxy. Finally, you'll need to figure out what to do for other oddball static files (like legacy static HTML files sometimes creep in). In general, I recommend the same practice as CSS and images-- ideally, you'd put static files into predictable directories and rely on relative URLs. Or (depending on your server platform) it may be easier to remap the file extensions of those static files so that they're processed by your web framework-- and then run your server-side URL generator for all URLs. Or, barring that, you can leave the files in place and manually fix up URLs to be relative-- knowing that this is brittle. Coming full circle, sometimes there are just too many places where URLs are generated, and it's more effective to use a server module like mod_proxy_html. But I consider this a last resort-- especially if you won't be comfortable editing the source code if needed. BTW, I realize I didn't mention anyting about your idea #4 above (javascript-link-fixup). I wouldn't do that-- if the user has javascript turned off or (more common) some network problem prevents that javascript for some time after the rest of the page loads, then your links won't work. Too risky.

Strategies for dealing with URIs when building an application that sits behind a reverse proxy

Tags:

http

proxy

apache

I'm building an application with a self-contained HTTP server which can be either accessed directly, or put behind a reverse proxy (like Apache mod_proxy).

So, let's say my application is running on port 8080 and you set up your Apache like this:

ProxyPass /myapp http://localhost:8080
ProxyPassReverse /myapp http://localhost:8080

This will cause HTTP requests coming into the main Apache server that go to /myapp/* to be proxied to my application. If a request comes in like GET /myapp/bar, my application will see GET /bar. This is as it should be.

The problem that arises is in generating URIs that have to be translated from my application's URI-space in order to work correctly via the proxy (i.e. prepending /myapp/).

The ProxyPassReverse directive takes care of handling this for URIs in HTTP headers (redirects and so forth.) But that doesn't handle URIs in the HTML generated by my application, or in static files and templates.

I'm aware of filters like mod_proxy_html, but this is a non-standard Apache module, and in any case, such filters may not be available for other front-end web servers which are capable of acting as a reverse proxy.

So I've come up with a few possible strategies:

Require an environment variable be set somewhere that contains the proxy path, and prepend this to all generated URIs. This seems inelegant; it breaks the encapsulation provided by the reverse proxy.
Put the proxy path in a configuration file for my application. Same objection as above.
Use only relative URIs in my application. This can get somewhat tricky; I would have to calculate the path difference between the current resource and where the link is going and add the appropriate number of ../'es. Seems messy. Another problem is that some things must generate absolute URIs, like RSS feeds and generated emails.
Use some hacky Javascript on the front-end to mungle URIs in the document text. This seems like a really horrible idea from an interoperability standpoint.
Use a singe URI-generating function throughout my code, and require "static" files like Javascript, CSS, etc. to be run through my templating system. This is the idea I'm leaning towards now.

This must be a fairly common problem. How have you approached it in the past? What has worked and what has made things more difficult?

677

asked Dec 17 '09 17:12

friedo

1 Answers

Yep, common problem. How to solve this depends on the kind of app you have and the server platform and web framework you're working with. But there's a general way I've approached these problems which has worked pretty well so far.

My preference is to handle problems like this in application code, rather than relying on web server modules like mod_proxy_html to do it, because there are often too many special cases (e.g. client-side-javascript assembling URLs on the fly) which the server module doesn't catch. That said, I've resorted to the server-module approach in a few cases, but I decided to revise the module code myself to handle the corner cases. Also keep perormance in mind; fixing up URLs in your code at the time they're generated is usually faster than shoving the entire HTML through another server module.

Here's my recommendation of how to handle this in your code:

First, you'll need to figure out what kind of URLs to generate. My preference is for relative URLs. You are correct above that "add the appropriate number of ../'es" is messy, but at least it's your (the programmer's) mess. If you go with the config-file/environment-variable approach, then you'll be dependent on whoever deploys your app (e.g. an underpaid and grumpy IT operations engineer) to always set things up correctly. It also complicates release of your code, even if you're doing deployment yourself, since you can't simply copy your development files into production but need to add a per-deployment-environment custom step. I've found in the past that eliminating potential deployment problems is worth a lot of pre-emptive coding.

Next, you'll need to get those URLs into your code. How you do this varies based on type of content/code:

For server-side code (e.g. PHP, RoR, etc.) you'll want to make sure that server-side URL generation happens in as few places as possible in your code (ideally, one method!). If you're using any of the mainstream MVC web frameworks (e.g. RoR, Django, etc.), this should be trivial since URL generation using an MVC framework already generally goes through a single codepath that you can override. If you're not using one of those frameworks, you likely have URL generation littered throughout your code. But the approach you'll want to take is to generate all URLs via code, and then override that method to support transforming non-relative URLs into relative URLs. You can usually search for patterns in your code (like "/, '/, "http://, 'http://) and do a manual search and replace (or if you're really nerdy and have more patience than I do, craft a regex to replace each common case in your source code).

The key to making this work reliably is that, instead of manually replacing all absolute URLs with relative ones in your server-side code (which, even if you get each of them right, is fragile if files are moved), you can leave the absolute URLs in place and simply wrap them with a call to your "relativizer" method. This is much more reliable and unbrittle.

For Javascript, I generally like to do the same thing as server code-- move all URL generation into a single method and ensure any URL generation calls this method. This can be hard on an app with lots of pre-existing javascript, but the search-and-replace method above seems to work well in JS too.

For CSS, URLs in CSS are relative to the location of the CSS file (not the calling HTML page) so using relative URLs is generally easy. Simply put your CSS into a folder and either put images into deeper folders beneath it, or put images into a parallel folder to your CSS and use a single ../ to get to the images relatively. This is a good best practice in general-- if you're not doing relative URLs in CSS already, you should consider doing it, regardless of reverse proxy.

Finally, you'll need to figure out what to do for other oddball static files (like legacy static HTML files sometimes creep in). In general, I recommend the same practice as CSS and images-- ideally, you'd put static files into predictable directories and rely on relative URLs. Or (depending on your server platform) it may be easier to remap the file extensions of those static files so that they're processed by your web framework-- and then run your server-side URL generator for all URLs. Or, barring that, you can leave the files in place and manually fix up URLs to be relative-- knowing that this is brittle.

Coming full circle, sometimes there are just too many places where URLs are generated, and it's more effective to use a server module like mod_proxy_html. But I consider this a last resort-- especially if you won't be comfortable editing the source code if needed.

BTW, I realize I didn't mention anyting about your idea #4 above (javascript-link-fixup). I wouldn't do that-- if the user has javascript turned off or (more common) some network problem prevents that javascript for some time after the rest of the page loads, then your links won't work. Too risky.

126

answered Oct 12 '22 09:10

Justin Grant

Related questions
                            
                                Mono 2.11 with nginx or apache
                            
                                How to stream video in a django app
                            
                                Forwarding apache request to a c++ program
                            
                                Cant get localhost to show WampServer index and dynamic urls at same time
                            
                                How do I create the project outside the /var/www/ directory?
                            
                                What's $0 in nginx? (mod_rewrite)
                            
                                htaccess redirect url with parameters
                            
                                How to solve svn: E195019: Redirect cycle detected, in my case?
                            
                                How to server static files + proxy context
                            
                                apache2 with mod_mono cannot be started. Error: undefined symbol: unixd_config
                            
                                PHP Fatal Error: Class 'MongoClient' not found
                            
                                Web Frameworks versus Web Servers? [closed]
                            
                                What should I note when I want to change Apache License 2.0 code?
                            
                                Apache 2.4.23 undefined reference to CRYPTO_malloc_init?
                            
                                PHP/Ubuntu - QxcbConnection: Could not connect to display aborted
                            
                                How to deploy a Laravel Web Application on Alpine Linux using Docker?
                            
                                Dynamic IP-based blacklisting
                            
                                How to configure Apache to work as proxy (load balancer) for j2ee server?
                            
                                Long-lived connections (asynchronous server push) with Apache/PHP/Javascript?
                            
                                Performance effect of enabling apache response time log directive

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With