 

What other characters beside ampersand (&) should be encoded in HTML href/src attributes?

Is the ampersand the only character that should be encoded in an HTML attribute?

It's well known that this won't pass validation:

<a href="http://domain.com/search?q=whatever&lang=en"></a>

Because the ampersand should be &amp;. Here's a direct link to the validation fail.

This guy lists a bunch of characters that should be encoded, but he's wrong. If you encode the first "/" in http:// the href won't work.

In ASP.NET, is there a helper method already built to handle this? Stuff like Server.UrlEncode and HtmlEncode obviously don't work - those are for different purposes.

I can build my own simple extension method (like .ToAttributeView()) which does a simple string replace.
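
Here is a minimal sketch of that string-replace idea, written in Python rather than C#. The function name `to_attribute_value` is hypothetical and stands in for the `.ToAttributeView()` extension method. It rewrites only `&` and leaves the `:` and `/` of `http://` untouched:

```python
def to_attribute_value(url: str) -> str:
    """Hypothetical helper: make a URL safe to place inside an href/src attribute.

    Only '&' is replaced with '&amp;'. Reserved URL characters such as ':' and '/'
    are left alone, because the browser decodes '&amp;' back to '&' before it
    follows the link.
    """
    return url.replace("&", "&amp;")


href = to_attribute_value("http://domain.com/search?q=whatever&lang=en")
print(f'<a href="{href}"></a>')
# prints: <a href="http://domain.com/search?q=whatever&amp;lang=en"></a>
```

This sketch has one design caveat: it does not touch a literal `"`, which would also end a double-quoted attribute early. Python's `html.escape(url, quote=True)` escapes `&`, `<`, `>`, `"` and `'` in one call, if that broader behavior is acceptable.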

asked Sep 17 '11 at 16:09 by sohtimsso1970


2 Answers

Other than the standard URI encoding of the values, & is the only character related to HTML entities that you have to worry about, simply because it is the character that begins every HTML entity. Take, for example, the following URL:

http://query.com/?q=foo&lt=bar&gt=baz

Even though there are no trailing semicolons, &lt; is the entity for < and &gt; is the entity for >, so some old browsers would decode this URL to:

http://query.com/?q=foo<=bar>=baz

So you need to write & as &amp; to prevent this from happening in links within an HTML document.
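
The effect can be reproduced in Python with `html.unescape`, used here only as a stand-in for a browser's entity decoding. It recognizes legacy names such as `lt` and `gt` even without a trailing semicolon. Unlike a current HTML parser reading an attribute value, it does not skip them when they are followed by `=`, so it behaves much like the old browsers described above:

```python
import html

raw = "http://query.com/?q=foo&lt=bar&gt=baz"

# Decoding the raw URL turns "&lt" and "&gt" into "<" and ">",
# even though neither one is followed by a semicolon.
print(html.unescape(raw))  # http://query.com/?q=foo<=bar>=baz

# Writing each & as &amp; first means decoding gives back the original URL.
escaped = raw.replace("&", "&amp;")
print(escaped)                 # http://query.com/?q=foo&amp;lt=bar&amp;gt=baz
print(html.unescape(escaped))  # http://query.com/?q=foo&lt=bar&gt=baz
```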

answered Nov 15 '22 at 11:11 by mVChr


The purpose of escaping characters is to keep them from being interpreted as part of the URL's syntax (for example, as a separator between arguments). So you don't actually want to encode the entire URL, just the values you pass in the query string. For example:

http://example.com/?parameter1=<ENCODED VALUE>&parameter2=<ENCODED VALUE>

The URL you showed is actually a perfectly valid URL that will pass validation. However, each & will be interpreted as a separator between parameters in the query string. So your query string:

?q=whatever&lang=en

will be parsed by the recipient as two parameters:

q = "whatever"
lang = "en"
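
To see the split, Python's `urllib.parse` can stand in for the recipient's query-string parser. This is an illustration only: ASP.NET and other servers use their own parsers, which typically split on `&` the same way.

```python
from urllib.parse import parse_qs, urlsplit

url = "http://domain.com/search?q=whatever&lang=en"

# Take just the query string ("q=whatever&lang=en") and split it into
# name/value pairs; each '&' starts a new parameter.
params = parse_qs(urlsplit(url).query)
print(params)  # {'q': ['whatever'], 'lang': ['en']}
```

`parse_qs` returns a list per name because a parameter may legitimately appear more than once.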

For your URL to work, you just need to make sure your values are encoded:

?q=<ENCODED VALUE>&lang=<ENCODED VALUE>
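
Here is a Python sketch of encoding only the values. The sample value `fish & chips` is made up to show a value that contains an `&` of its own. `urllib.parse.urlencode` percent-encodes each name and value separately, which corresponds to the `<ENCODED VALUE>` placeholders above:

```python
from urllib.parse import parse_qs, urlencode

# A value containing '&' and spaces (made-up example data).
values = {"q": "fish & chips", "lang": "en"}

# Each value is encoded on its own, so the '&' inside the value becomes %26
# and only the '&' between parameters remains a separator.
query = urlencode(values)
print(query)  # q=fish+%26+chips&lang=en

# The recipient gets the original values back.
print(parse_qs(query))  # {'q': ['fish & chips'], 'lang': ['en']}
```

By default `urlencode` uses `quote_plus`, which is why the spaces come out as `+` rather than `%20`.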

Edit: The W3C "common problems" page you linked to describes an edge case: a URL rendered in HTML where an & is followed by text that could be interpreted as an entity reference (&copy, for example). Here is a jsfiddle test showing the URL:

http://jsfiddle.net/YjPHA/1/

In Chrome and Firefox the links work correctly, but IE renders &copy as ©, breaking the link. I have to admit I've never run into this in the wild (it only affects the entity references that don't require a semicolon, which are a fairly small subset).

To be safe from this bug, you can HTML-encode every URL you render to the page. If you're using ASP.NET, the HttpUtility.HtmlEncode method should work just fine.
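
Here are both layers together in Python, with `html.escape` standing in for `HttpUtility.HtmlEncode`. The `copy` parameter is hypothetical, chosen so the URL contains the `&copy` sequence from the jsfiddle test. The query values are URL-encoded first, then the finished URL is HTML-encoded as it is written into the attribute:

```python
import html
from urllib.parse import urlencode

# Layer 1: URL-encode the query values (hypothetical parameters).
url = "http://example.com/?" + urlencode({"q": "whatever", "copy": "1"})
# url is now "http://example.com/?q=whatever&copy=1"

# Layer 2: HTML-encode the whole URL for the attribute, so "&copy" is written
# as "&amp;copy" and cannot be read as the © entity.
anchor = '<a href="{}">link</a>'.format(html.escape(url, quote=True))
print(anchor)  # <a href="http://example.com/?q=whatever&amp;copy=1">link</a>
```

When the page is parsed, the attribute value decodes back to exactly `url`, so the link points where it should.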

answered Nov 15 '22 at 11:11 by Chris Van Opstal