About the system
I have URLs of this format in my project:
http://project_name/browse_by_exam/type/tutor_search/keyword/class/new_search/1/search_exam/0/search_subject/0
Where keyword/class pair means search with "class" keyword.
I have a common index.php file which executes for every module in the project. There is only one rewrite rule, which removes index.php from the URL:
RewriteCond $1 !^(index\.php|resources|robots\.txt)
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ index.php [L,QSA]
I am using urlencode() while preparing the search URL and urldecode() while reading the search URL.
Problem
Only the forward slash character breaks the URLs, causing a 404 (page not found) error. For example, if I search for one/two, the URL is
http://project_name/browse_by_exam/type/tutor_search/keyword/one%2Ftwo/new_search/1/search_exam/0/search_subject/0/page_sort/
How do I fix this? I need to keep index.php hidden in the URL. If that were not needed, there would be no problem with the forward slash and I could use this URL:
http://project_name/index.php?browse_by_exam/type/tutor_search/keyword/one%2Ftwo/new_search/1/search_exam/0/search_subject/0
Experiments with mixed slashes found that the tested browsers accept URLs written with forward slashes ( / , the correct form), URLs written with backslashes ( \ , an incorrect form) and URLs written with mixed slashes in their path part ( / & \ , another incorrect form).
A slash at the end of a URL tells the web server that the path refers to a directory. This can save a step when loading the page, because the server does not first have to check whether a file with that name exists.
A space has the character code 32, which is 20 in hexadecimal. When you see “%20” in an encoded URL, it represents a space, for example, http://www.example.com/products%20and%20services.html.
Apache denies all URLs with %2F in the path part, for security reasons: scripts can't normally (i.e. without rewriting) tell the difference between %2F and / because the PATH_INFO environment variable is automatically URL-decoded (which is stupid, but a long-standing part of the CGI specification, so there's nothing that can be done about it).
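To illustrate why an already-decoded path is ambiguous, here is a minimal sketch. The path string and variable names are made up for the example; it only shows the order-of-operations problem and is not how Apache itself processes the request.

```php
<?php
// Hypothetical path as the client sent it: the keyword "one/two" is encoded as one%2Ftwo.
$rawPath = 'keyword/one%2Ftwo/new_search/1';

// Splitting on "/" first and decoding each piece afterwards keeps the keyword intact.
$splitFirst = array_map('rawurldecode', explode('/', $rawPath));

// Decoding first (which is all a script can do with a pre-decoded PATH_INFO)
// turns %2F into a real separator, so the keyword is split in two.
$decodeFirst = explode('/', rawurldecode($rawPath));

print_r($splitFirst);   // 4 segments, the second one is "one/two"
print_r($decodeFirst);  // 5 segments: "one" and "two" are now separate
```

Once the decoding has happened before the script sees the path, the original segment boundaries cannot be recovered.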
You can turn this feature off using the AllowEncodedSlashes directive, but note that other web servers will still disallow it (with no option to turn that off), that other characters may also be taboo (e.g. %5C), and that %00 in particular will always be blocked by both Apache and IIS. So if your application relies on being able to have %2F or other such characters in a path part, you'd be limiting your compatibility/deployment options.
I am using urlencode() while preparing the search URL
You should use rawurlencode(), not urlencode(), for escaping path parts. urlencode() is misnamed: it is actually for application/x-www-form-urlencoded data, such as in the query string or the body of a POST request, and not for other parts of the URL.
The difference is that + doesn't mean space in path parts. rawurlencode() will correctly produce %20 instead, which will work both in form-encoded data and other parts of the URL.
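A short comparison of the two functions may help here. The keyword value is just an illustrative assumption.

```php
<?php
// Hypothetical search keyword containing a space and a slash.
$keyword = 'one two/three';

// urlencode() follows form-encoding rules: the space becomes "+".
$formEncoded = urlencode($keyword);

// rawurlencode() follows generic URL escaping: the space becomes "%20".
$pathEncoded = rawurlencode($keyword);

echo $formEncoded, PHP_EOL; // one+two%2Fthree
echo $pathEncoded, PHP_EOL; // one%20two%2Fthree
```

Note that both functions still encode the slash as %2F, so switching to rawurlencode() fixes the handling of spaces in path parts but does not by itself get around Apache rejecting %2F.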
Replace %2F with %252F after url encoding
PHP
function custom_http_build_query($query = array())
{
    // Double-encode slashes so the server never sees a bare %2F.
    return str_replace('%2F', '%252F', http_build_query($query));
}
Handle the request via htaccess
.htaccess
RewriteCond %{REQUEST_URI} ^(.*?)(%252F)(.*?)$ [NC]
RewriteRule . %1/%3 [R=301,L,NE]
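The same double-encoding idea can be sketched for a single path segment rather than a whole query string. The helper names encode_segment() and decode_segment() are hypothetical, and the sketch assumes the application reads the segment exactly as it appeared in the URL (for example from the raw request URI) instead of relying on the redirect above.

```php
<?php
// Hypothetical helper: encode one path segment, then escape the "%" of each
// encoded slash so that the path never contains a bare %2F.
function encode_segment(string $value): string
{
    return str_replace('%2F', '%252F', rawurlencode($value));
}

// Hypothetical helper: undo the extra escaping, then decode once.
// Assumes $segment is still in the form it had in the URL.
function decode_segment(string $segment): string
{
    return rawurldecode(str_replace('%252F', '%2F', $segment));
}

$encoded = encode_segment('one/two');
echo $encoded, PHP_EOL;                 // one%252Ftwo
echo decode_segment($encoded), PHP_EOL; // one/two
```

One limitation of this scheme: a keyword that literally contains the text "%2F" is also encoded to %252F, so it cannot be told apart from a real slash when decoding.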
Resources
http://www.leakon.com/archives/865