
How should the fragment part of a URL be escaped?

If a URL has unusual characters in the fragment part (i.e. after #) how should they be (percent) escaped? I can't find a consistent answer in how browsers handle this, which is probably a good reason not to have them, but I'd like to know what the 'right' answer is.

My testing seems to suggest not to escape at all, but that this is only reliable when following links, not when pasting into a browser's address bar.

I wrote the little web page appended below, then pasted the following link into various browsers. The 'go' link in the page is there to see what happens when you click a link as opposed to pasting it (the two differ in some browsers).

http://www.frankieandshadow.com/test.html/?new=1#{# &}%7B%23%20%26%7D

(I notice Stack Overflow's pattern match for URLs doesn't like that. I intend the whole line as the link; again, there may be a clue for me there!)

Chrome appears to do no unescaping of any kind, and produces consistently:

#{# &}%7B%23%20%26%7D

Firefox replaces some, but not all, of the pasted escaped characters with their unescaped equivalents, and then produces

#{# &}{# &}

and this is the same if you follow the link.

Safari (on PC) does the opposite: it encodes non-encoded unusual characters on paste, and then produces

#%7B%23%20&%7D%7B%23%20%26%7D

but following the link is different, producing

#{# &}%7B%23%20%26%7D

IE9, amazingly, behaves just like Chrome.

IE7 replaces the real space with %20 on paste but otherwise leaves the URL alone, and produces

#{#%20&}%7B%23%20%26%7D

and if you click the link, it gives

#{# &}%7B%23%20%26%7D


<html>
<head>
<title>test</title>
<script type="text/javascript">
function wibble() {
  document.getElementById("wobble").innerHTML = 
    location.hash.replace(/&/g,"&amp;").replace(/</g,"&lt;").replace(/>/g,"&gt;");
}
</script>
</head>
<body onload='wibble()'>
<div id='wobble'></div>
<a href='/test.html?new=1#{# &}%7B%23%20%26%7D'>go</a>
</body>
</html>
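
When comparing what different browsers report, it can help to reduce each location.hash value to its decoded text first. The following is only a sketch in modern JavaScript: decodedFragment is a hypothetical helper name, and the sample strings are the outputs quoted above. Note that decoding discards the distinction between a literal character and its escaped form, so it is useful for comparison but not for recovering what was originally typed.

```javascript
// Hypothetical helper: strip the leading "#" and percent-decode the rest.
function decodedFragment(hash) {
  const body = hash.startsWith("#") ? hash.slice(1) : hash;
  try {
    return decodeURIComponent(body);
  } catch (e) {
    // Malformed escapes such as "%zz" throw URIError; fall back to the raw text.
    return body;
  }
}

// The location.hash values reported in the question.
const variants = [
  "#{# &}%7B%23%20%26%7D",         // Chrome, IE9, and followed links in Safari/IE7
  "#{# &}{# &}",                   // Firefox
  "#%7B%23%20&%7D%7B%23%20%26%7D", // Safari on paste
  "#{#%20&}%7B%23%20%26%7D"        // IE7 on paste
];

for (const v of variants) {
  console.log(decodedFragment(v)); // each line prints: {# &}{# &}
}
```

All four variants decode to the same text, which suggests the browsers disagree about presentation rather than about which characters the fragment contains.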
frankieandshadow asked Dec 13 '12 13:12



1 Answer

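The ABNF in RFC 3986 says that a fragment is made up of pchars plus "/" and "?", where a pchar is an unreserved character, a sub-delimiter, ":", "@" or a percent-encoded octet.

Which is to say that characters in fragment identifiers may be any alphanumeric or one of

-._~!$&'()*+,;=:@/?

All other characters should be percent encoded. In your example that means the braces, the space and the second # should be encoded, while the & may stay as it is.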
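
As a rough illustration of that rule, here is a minimal sketch in JavaScript. The names FRAGMENT_SAFE and encodeFragment are made up for this example, and it assumes the input is raw text that has not been percent-encoded yet (so a literal % is treated as data and becomes %25). It also includes "/" and "?" in the safe set, which the RFC 3986 fragment rule allows in addition to pchar.

```javascript
// Characters RFC 3986 allows unescaped in a fragment:
// unreserved / sub-delims / ":" / "@" / "/" / "?"
const FRAGMENT_SAFE = /^[A-Za-z0-9\-._~!$&'()*+,;=:@\/?]$/;

// Hypothetical helper: percent-encode raw text for use after the "#".
function encodeFragment(raw) {
  let out = "";
  for (const ch of raw) { // iterates by code point, so astral characters stay whole
    // encodeURIComponent emits the UTF-8 bytes as %XX. Every character it
    // leaves untouched is already in FRAGMENT_SAFE, so nothing slips through.
    out += FRAGMENT_SAFE.test(ch) ? ch : encodeURIComponent(ch);
  }
  return out;
}

console.log(encodeFragment("{# &}")); // %7B%23%20&%7D
```

One caveat: encodeURIComponent throws a URIError on a lone surrogate, so input that may contain broken UTF-16 would need extra handling.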

kybernetikos answered Oct 19 '22 05:10