
urlencoded Forward slash is breaking URL

About the system

I have URLs of this format in my project:-

http://project_name/browse_by_exam/type/tutor_search/keyword/class/new_search/1/search_exam/0/search_subject/0 

Where keyword/class pair means search with "class" keyword.

I have a common index.php file which executes for every module in the project. There is only a rewrite rule to remove the index.php from URL:-

RewriteCond $1 !^(index\.php|resources|robots\.txt)
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ index.php [L,QSA]

I am using urlencode() while preparing the search URL and urldecode() while reading the search URL.

Problem

Only the forward slash character breaks the URLs, causing a 404 (page not found) error. For example, if I search for one/two, the URL is

http://project_name/browse_by_exam/type/tutor_search/keyword/one%2Ftwo/new_search/1/search_exam/0/search_subject/0/page_sort/ 

How do I fix this? I need to keep index.php hidden in the URL. If that were not needed, the forward slash would cause no problem and I could use this URL:-

http://project_name/index.php?browse_by_exam/type/tutor_search/keyword/one%2Ftwo/new_search/1/search_exam/0/search_subject/0 
Sandeepan Nath asked Jul 13 '10 08:07




2 Answers

By default, Apache rejects any URL with %2F in the path part, for security reasons: scripts can't normally (i.e. without rewriting) tell the difference between %2F and / because the PATH_INFO environment variable is automatically URL-decoded (which is stupid, but a long-standing part of the CGI specification, so there's nothing that can be done about it).
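To illustrate the ambiguity, here is a minimal PHP sketch. The path string is a made-up example modelled on the question's URL, and the two variables stand in for "decode first" (what a script gets from an already-decoded path) versus "split first" (what it would need in order to keep the keyword intact).

```php
<?php
// Hypothetical raw path as sent by the client: the keyword "one/two" is one segment.
$rawPath = 'keyword/one%2Ftwo/new_search/1';

// Decoding the whole path before splitting, as happens with an already-decoded PATH_INFO.
$decodedFirst = explode('/', rawurldecode($rawPath));

// Splitting on "/" first and decoding each segment afterwards.
$splitFirst = array_map('rawurldecode', explode('/', $rawPath));

echo count($decodedFirst), "\n"; // 5 segments: the keyword was torn into "one" and "two"
echo count($splitFirst), "\n";   // 4 segments: "one/two" stays intact
```

Once the path has been decoded, nothing in the string says which of the slashes was originally %2F.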

You can turn this feature off using the AllowEncodedSlashes directive, but note that other web servers will still disallow it (with no option to turn that off), and that other characters may also be taboo (eg. %5C), and that %00 in particular will always be blocked by both Apache and IIS. So if your application relied on being able to have %2F or other characters in a path part you'd be limiting your compatibility/deployment options.

Quoting the question: "I am using urlencode() while preparing the search URL"

You should use rawurlencode(), not urlencode(), for escaping path parts. urlencode() is misnamed: it is actually for application/x-www-form-urlencoded data, such as the query string or the body of a POST request, and not for other parts of the URL.

The difference is that + doesn't mean space in path parts. rawurlencode() will correctly produce %20 instead, which will work both in form-encoded data and other parts of the URL.
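A short sketch of that difference, using a made-up keyword that contains both a space and a slash (the variable names are illustrative only):

```php
<?php
// Hypothetical search keyword containing a space and a slash.
$keyword = 'one two/three';

// Form encoding: the space becomes "+", which only means "space" in form-encoded data.
$formEncoded = urlencode($keyword);    // one+two%2Fthree

// Path-safe encoding: the space becomes "%20".
$pathEncoded = rawurlencode($keyword); // one%20two%2Fthree

echo $formEncoded, "\n", $pathEncoded, "\n";

// rawurldecode() leaves "+" alone, so a form-encoded value read from a path keeps the plus sign.
echo rawurldecode($formEncoded), "\n"; // one+two/three
echo rawurldecode($pathEncoded), "\n"; // one two/three
```

Note that both functions still encode the slash as %2F, so switching to rawurlencode() fixes the space handling but does not by itself get around the %2F restriction described above.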

bobince answered Oct 02 '22 13:10



Replace %2F with %252F after url encoding

PHP

function custom_http_build_query($query = array()) {
    return str_replace('%2F', '%252F', http_build_query($query));
}
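The same double-encoding idea can be sketched for a single path segment. This is only an illustration under stated assumptions: encode_segment() is a hypothetical helper (not part of the answer), it uses rawurlencode() rather than http_build_query() because the value goes into the path, and it assumes the web server decodes the path exactly once before the script sees it.

```php
<?php
// Hypothetical helper: encode one path segment, then escape the encoded slash once more.
function encode_segment(string $value): string
{
    return str_replace('%2F', '%252F', rawurlencode($value));
}

$segment = encode_segment('one/two');
echo $segment, "\n"; // one%252Ftwo

// Assumption: the server's single decoding pass turns %25 back into "%".
$afterServer = rawurldecode($segment);
echo $afterServer, "\n"; // one%2Ftwo

// A second decode in the application recovers the original keyword.
echo rawurldecode($afterServer), "\n"; // one/two
```

One caveat of this sketch: a keyword that literally contains the text "%2F" encodes to the same string as a keyword containing "/", so the two cannot be told apart after decoding twice.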

Handle the request via htaccess

.htaccess

RewriteCond %{REQUEST_URI} ^(.*?)(%252F)(.*?)$ [NC]
RewriteRule . %1/%3 [R=301,L,NE]

Resources

http://www.leakon.com/archives/865

RafaSashi answered Oct 02 '22 11:10
