
Wordpress is ignoring Unicode Chars in URL

I am using WordPress with this permalink structure:

/%year%/%monthnum%/%postname%/

If I request a URL like this: example.com/2010/03/तकनीक

WordPress treats it as example.com/2010/03/ (ignoring the Unicode characters) and displays the March 2010 archive list.

If I use an English URL such as example.com/2010/03/technology, it works perfectly.

The problem also occurs on tag pages: for example, example.com/tag/इंटरनेट is treated as example.com/tag/ and displays a 404 page.

Why is WordPress ignoring the Unicode characters?

If I use the default query-string permalink structure, it works perfectly, even with Unicode characters.

Server info: IIS7 on Windows Server 2008 (URL rewriting enabled), WordPress 2.9.2

Ankur Gupta asked Mar 18 '10 22:03


2 Answers

For an overview of the problem, review:

http://ruslany.net/2010/03/important-update-for-iis-7-0-fastcgi-module/

That post refers you to the now-outdated:

http://ruslany.net/2010/02/fastcgi-module-differences-across-iis-versions/

My own situation was fixed by applying Win 7 SP1 but, interestingly, I still had to apply the registry change described in the hotfix:

http://support.microsoft.com/kb/2277918

Chris Peckham answered Sep 18 '22 18:09



I am running a WAMP server on my local machine. I tested $_SERVER['PATH_INFO'] on my IIS7 web server and found that it has a Unicode problem. WordPress uses PATH_INFO to handle URLs. I created a file test.php with the following code:

If I request http://example.com/test.php/कुछशब्द/कुछऔरशब्द/english

then I get this output:

----****----
/???????/?????????/english
----****----

PATH_INFO is converting the Unicode Hindi characters to "?" characters, which means something is wrong with the PATH_INFO variable on my server. Do you know of any IIS7 setting that could cause this kind of problem?

The same code works perfectly fine on my local Apache Windows server.
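
One plausible explanation for the question marks (an assumption on my part, not something confirmed above) is that the path is converted to a legacy single-byte code page somewhere between IIS and PHP, so every character with no mapping is replaced by "?". The Python sketch below only imitates that kind of lossy conversion. It uses cp1252 as a stand-in for whatever code page the server really uses, and the function name lossy_ansi_round_trip is made up for illustration.

```python
from urllib.parse import quote, unquote


def lossy_ansi_round_trip(path: str, codepage: str = "cp1252") -> str:
    """Imitate pushing a Unicode path through a legacy code page and back.

    Characters the code page cannot represent are replaced by "?".
    The default code page is an assumption chosen for illustration.
    """
    return path.encode(codepage, errors="replace").decode(codepage)


# The path portion of the test request from this answer.
path = "/कुछशब्द/कुछऔरशब्द/english"

# What a browser typically puts on the wire: UTF-8 bytes, percent-encoded.
wire = quote(path)
print(wire)

# Decoding the percent-encoded form as UTF-8 recovers the original path.
print(unquote(wire) == path)

# Forcing the decoded path into a single-byte code page loses the Hindi text.
print(lossy_ansi_round_trip(path))
```

The ASCII segment survives and each Devanagari code point comes back as a single "?", which has the same shape as the PATH_INFO output above. That would also be consistent with the query-string structure working, if that value reaches PHP still percent-encoded.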

Ankur Gupta answered Sep 21 '22 18:09
