 

Is a slash ("/") equivalent to an encoded slash ("%2F") in the path portion of an HTTP URL

Tags:

http

url

encoding

I have a site that treats "/" and "%2F" in the path portion (not the query string) of a URL differently. Is this a bad thing to do according to either the RFC or the real world?

I ask because I keep running into little surprises with the web framework I'm using (Ruby on Rails) as well as the layers below it (Passenger and Apache; for example, I had to enable "AllowEncodedSlashes" in Apache). I am now leaning toward getting rid of the encoded slashes completely, but I wonder whether I should be filing bug reports where I see weird behavior involving the encoded slashes.

As to why I have the encoded slashes in the first place, basically I have routes such as this:

:controller/:foo/:bar 

where :foo is something like a path that can contain slashes. I thought the most straightforward thing to do would be to just URL escape foo so the slashes are ignored by the routing mechanism. Now I am having doubts, and it's pretty clear that the frameworks don't really support this, but according to the RFC is it wrong to do it this way?
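To make the escaping idea concrete, here is a minimal Python sketch of building such a path. The helper build_path and the sample values are hypothetical. One real gotcha it shows: urllib.parse.quote leaves "/" unencoded by default (safe="/"), so the slash only becomes %2F if you pass safe="".

```python
from urllib.parse import quote


def build_path(controller: str, foo: str, bar: str) -> str:
    """Build a hypothetical /controller/foo/bar path where foo may contain slashes."""
    # safe="" forces "/" inside each value to be encoded as %2F;
    # with the default safe="/", quote() would leave the slashes alone.
    return "/" + "/".join(quote(part, safe="") for part in (controller, foo, bar))


print(quote("docs/2009/notes.txt"))                       # docs/2009/notes.txt
print(build_path("files", "docs/2009/notes.txt", "show"))  # /files/docs%2F2009%2Fnotes.txt/show
```

The resulting path has exactly three literal slashes, so a router splitting on "/" sees three segments, which is the behavior the route above relies on.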

Here is some information I have gathered:

RFC 1738 (URLs):

Usually a URL has the same interpretation when an octet is represented by a character and when it encoded. However, this is not true for reserved characters: encoding a character reserved for a particular scheme may change the semantics of a URL.

RFC 2396 (URIs):

These characters are called "reserved", since their usage within the URI component is limited to their reserved purpose. If the data for a URI component would conflict with the reserved purpose, then the conflicting data must be escaped before forming the URI.

(does escaping here mean something other than encoding the reserved character?)

RFC 2616 (HTTP/1.1):

Characters other than those in the "reserved" and "unsafe" sets (see RFC 2396 [42]) are equivalent to their ""%" HEX HEX" encoding.

There is also this bug report for Rails, where they seem to expect the encoded slash to behave differently:

Right, I'd expect different results because they're pointing at different resources.

It's looking for the literal file 'foo/bar' in the root directory. The non escaped version is looking for the file bar within directory foo.

It's clear from the RFCs that the raw and encoded forms are equivalent for unreserved characters, but what is the story for reserved characters?
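For reference, RFC 3986 (which obsoletes RFC 2396) states this explicitly: section 2.2 says URIs that differ only in replacing a reserved character with its percent-encoded octet are not equivalent, and section 6.2.2.2 allows normalization to decode only octets that correspond to unreserved characters. A minimal Python sketch of that normalization rule follows; the function name normalize_percent_encoding is hypothetical.

```python
import re
import string

# RFC 3986 unreserved set: ALPHA / DIGIT / "-" / "." / "_" / "~"
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")


def normalize_percent_encoding(path: str) -> str:
    """Decode %XX only for unreserved characters; uppercase all other escapes."""
    def fix(match: re.Match) -> str:
        char = chr(int(match.group(1), 16))
        if char in UNRESERVED:
            return char  # e.g. %7E -> "~": same meaning either way
        return "%" + match.group(1).upper()  # e.g. %2f -> %2F: NOT the same as "/"
    return re.sub(r"%([0-9A-Fa-f]{2})", fix, path)


print(normalize_percent_encoding("/%7Euser/a%2fb"))  # /~user/a%2Fb
```

Note that %2F survives normalization: a conforming normalizer must not turn it into "/".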

asked Dec 24 '09 at 06:12 by user85509



2 Answers

From the data you gathered, I would tend to say that an encoded "/" in a URI is meant to be seen as "/" again at the application/CGI level.

That is to say, if you're using Apache with mod_rewrite, for instance, a pattern expecting slashes will not match a URI with encoded slashes in it. However, once the appropriate module/CGI/... is called to handle the request, it's up to that handler to do the decoding and, for instance, retrieve a parameter containing slashes as the first component of the URI.
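The order of operations is what makes the two forms differ. Below is a minimal Python sketch of a handler that splits the raw path on literal "/" before decoding, compared with decoding first; both function names are hypothetical and this is not how Apache or Rails is actually implemented.

```python
from typing import List
from urllib.parse import unquote


def segments_split_then_decode(raw_path: str) -> List[str]:
    # Split on literal "/" first, so %2F stays inside a single segment.
    return [unquote(s) for s in raw_path.strip("/").split("/")]


def segments_decode_then_split(raw_path: str) -> List[str]:
    # Decoding first turns %2F into "/", and the segment boundaries are lost.
    return unquote(raw_path).strip("/").split("/")


raw = "/files/docs%2F2009%2Fnotes.txt/show"
print(segments_split_then_decode(raw))  # ['files', 'docs/2009/notes.txt', 'show']
print(segments_decode_then_split(raw))  # ['files', 'docs', '2009', 'notes.txt', 'show']
```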

If your application then uses this data to retrieve a file (whose name would contain a slash), that's probably a bad thing.
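One way to guard against that, sketched in Python under the assumption that each decoded segment should name a single file directly under a fixed root directory (resolve_file and the root path are hypothetical):

```python
from pathlib import Path


def resolve_file(root: str, name: str) -> Path:
    """Map one decoded path segment to a file directly under root (hypothetical helper)."""
    # After %2F is decoded, a segment may contain "/" or be "..",
    # which would let the request reach files outside root.
    if name in ("", ".", "..") or "/" in name or "\\" in name or "\x00" in name:
        raise ValueError(f"unsafe file name: {name!r}")
    return Path(root) / name


print(resolve_file("/srv/files", "notes.txt"))  # /srv/files/notes.txt
```

Rejecting the separator outright is simpler than trying to decide which nested paths are acceptable.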

To sum up, I find it perfectly normal to see a difference in behaviour between "/" and "%2F", since they are interpreted at different levels.

answered Sep 20 '22 at 16:09 by Zeograd


The story of %2F vs / was that, according to the initial W3C recommendations, slashes «must imply a hierarchical structure»:

The slash ("/", ASCII 2F hex) character is reserved for the delimiting of substrings whose relationship is hierarchical. This enables partial forms of the URI.

Example 2

The URIs

http://www.w3.org/albert/bertram/marie-claude

and

http://www.w3.org/albert/bertram%2Fmarie-claude

are NOT identical, as in the second case the encoded slash does not have hierarchical significance.
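The "partial forms" point can be checked with Python's standard library: relative references resolve against the literal slashes only. The relative reference "x" is an arbitrary example.

```python
from urllib.parse import urljoin, urlsplit

a = "http://www.w3.org/albert/bertram/marie-claude"
b = "http://www.w3.org/albert/bertram%2Fmarie-claude"

# Path segments: the encoded slash does not create a new level.
print(urlsplit(a).path.split("/"))  # ['', 'albert', 'bertram', 'marie-claude']
print(urlsplit(b).path.split("/"))  # ['', 'albert', 'bertram%2Fmarie-claude']

# Relative ("partial") references resolve against the hierarchy.
print(urljoin(a, "x"))  # http://www.w3.org/albert/bertram/x
print(urljoin(b, "x"))  # http://www.w3.org/albert/x
```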

answered Sep 20 '22 at 16:09 by Yury Kirienko