
What explains Firefox and Safari's different treatment of user-supplied URIs containing more than one # symbol? Which is 'right'?

In Firefox 4.0.1 paste the following into the address bar

http://www.w3.org/#one#two

Note that the browser navigates to the w3.org front page and the address bar still reads

http://www.w3.org/#one#two

In Safari 5.0.4 do the same. Note the browser also navigates, but the address bar text is modified to read

http://www.w3.org/#one%23two

Note that the first hash in the string is not altered, but the second is converted to its percent-encoded (aka 'escaped') form, %23.

It seems reasonable to assume that Safari is trying to convert the user-supplied URI to a link that meets its idea of a valid URI. Firefox does not make a conversion in this case.
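To make the observed behavior concrete, here is a minimal Python sketch of a normalization that produces the same result as Safari for this input. It is only an illustration of the visible effect, not Safari's actual implementation; the function name `escape_extra_hashes` is hypothetical.

```python
def escape_extra_hashes(uri: str) -> str:
    """Keep the first '#' as the fragment delimiter; percent-encode later ones.

    Hypothetical helper that mimics the address-bar result seen in Safari.
    """
    head, sep, fragment = uri.partition("#")
    if not sep:
        # No '#' at all, so there is nothing to normalize.
        return uri
    # Any '#' left inside the fragment is replaced by its escaped form.
    return head + sep + fragment.replace("#", "%23")


print(escape_extra_hashes("http://www.w3.org/#one#two"))
# prints http://www.w3.org/#one%23two
```

Firefox's behavior corresponds to skipping this step and leaving the string as typed.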

I would like to account for the difference in behavior.

The ECMA-262 document at http://www.ecma-international.org/publications/standards/Ecma-262.htm is one reference for the form a valid URI takes. Section 15.1.3.1 states the following about the unescaping of URIs:

The character “#” is not decoded from escape sequences even though it is not a reserved URI character.

This arguably applies to every # symbol in the URI string, not just the first occurrence.

In conclusion, my question is:

  • Do both forms of the link meet the latest standard for valid URIs?
  • If they are both valid, which browser behavior is most appropriate?
Jim Blackler asked May 05 '11 11:05


1 Answer

RFC 3986 (the definition of what URIs, and thus URLs, look like and what their parts mean) does not allow two # characters in one URL, at least in my reading. That makes the question boil down to:

  • Is it better to forward the user error to the web application (where the designer might have made the same mistake),
  • or is it better to transform the user input into something closely-related, but valid?

Also note that the RFC clearly lists # as a reserved character (it is one of the gen-delims in section 2.2), so the ECMA standard is wrong in what you quoted above.
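This reading of the RFC can be checked mechanically. Section 3.5 of RFC 3986 defines `fragment = *( pchar / "/" / "?" )`, and `pchar` does not include a literal #. The following is a rough Python sketch of that rule; the regular expression is my own transcription of the grammar and the function name `fragment_is_valid` is hypothetical, so treat it as an approximation rather than a full URI validator.

```python
import re

# RFC 3986, section 3.5:  fragment = *( pchar / "/" / "?" )
# pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
FRAGMENT_RE = re.compile(
    r"(?:[A-Za-z0-9\-._~!$&'()*+,;=:@/?]|%[0-9A-Fa-f]{2})*"
)


def fragment_is_valid(uri: str) -> bool:
    """Check only the part after the first '#' against the fragment rule."""
    _, _, fragment = uri.partition("#")
    return FRAGMENT_RE.fullmatch(fragment) is not None


print(fragment_is_valid("http://www.w3.org/#one#two"))    # prints False
print(fragment_is_valid("http://www.w3.org/#one%23two"))  # prints True
```

Under this sketch the string as typed is rejected, while the form Safari shows in the address bar is accepted.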

Christopher Creutzig answered Oct 02 '22 16:10