 

Are Chinese characters allowed in URLs?

Tags: url, php, cjk

Are Chinese characters allowed to be entered in URLs?

As tested, Chinese characters can be entered in URLs: the browser converts the domain to Punycode, sends the request, and reaches the expected page.

But does anybody currently validate website URLs in a way that allows Chinese characters as well?

asked Aug 25 '11 03:08 by deepWebMie



2 Answers

Punycode exists so that non-Latin scripts can be used in software that does not support them. So whilst I like my site http://見.香港/, I can enter http://xn--nw2a.xn--j6w193g/ when I cannot enter the original Unicode form.

Some website developers program overly defensively. For example, Google Apps does not let you use Punycode domains at all, because of aggressive whitelisting that has not been updated to match ICANN standards.
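To illustrate validation that does not reject internationalized names, here is a minimal Python sketch. It assumes the hostname has already been extracted from the URL, and the helper name `is_valid_hostname` is hypothetical. It converts the host to its ASCII (Punycode) form first and only then applies the usual letters-digits-hyphen check. Python's built-in `idna` codec follows the older IDNA 2003 rules, so treat this as an approximation rather than a complete validator.

```python
import re

# One DNS label in ASCII form: letters, digits and hyphens,
# 1 to 63 characters, with no leading or trailing hyphen.
_LABEL = re.compile(r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)")


def is_valid_hostname(host: str) -> bool:
    """Hypothetical helper: accept ASCII and internationalized hostnames."""
    try:
        # Non-ASCII labels become "xn--..." Punycode labels here.
        ascii_host = host.encode("idna").decode("ascii")
    except UnicodeError:
        # Raised for empty or over-long labels and unencodable input.
        return False
    if not ascii_host or len(ascii_host) > 253:
        return False
    return all(_LABEL.fullmatch(label) for label in ascii_host.split("."))


print(is_valid_hostname("見.香港"))       # Unicode form
print(is_valid_hostname("example.com"))  # plain ASCII form
print(is_valid_hostname("bad host.com")) # contains a space
```

The design choice is to validate the encoded form rather than whitelist characters in the Unicode form, so the same rule covers both ways of writing the name.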

UPDATE: Stack Overflow now supports Unicode domain names, so the comments below are outdated. The unusual-looking domain name is the Punycode (i.e. encoded) version of the Unicode name, for systems that do not directly support Unicode.

xn--nw2a = 見
xn--j6w193g = 香港
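
The mapping above can be reproduced with a short Python sketch. The function names `to_punycode` and `to_unicode` are made up for illustration, and the built-in `idna` codec used here implements the older IDNA 2003 rules, so results for some other names may differ from what a current browser produces.

```python
def to_punycode(host: str) -> str:
    """Encode each label of a Unicode hostname to its ASCII form."""
    return host.encode("idna").decode("ascii")


def to_unicode(host: str) -> str:
    """Decode "xn--" labels of an ASCII hostname back to Unicode."""
    return host.encode("ascii").decode("idna")


print(to_punycode("見.香港"))               # xn--nw2a.xn--j6w193g
print(to_unicode("xn--nw2a.xn--j6w193g"))  # 見.香港
```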

As of 2022/1/1, Stack Overflow interprets Punycode domains as their Unicode form in the preview, but not in the saved post. This is not really appropriate for a code platform, which may be discussing Punycode itself, but would be fine for other sites in the Stack Exchange network.

[Screenshot: Stack Overflow edit preview with a Punycode domain]

answered Oct 11 '22 02:10 by Steve-o


All non-ASCII characters present in a domain name will (should) be converted to Punycode. It is the browser's business to display them in their native script.
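
As a rough sketch of that conversion step, the Python snippet below rewrites only the host part of a URL into Punycode and leaves the rest untouched. The function name `url_with_ascii_host` is hypothetical, any user:password part of the URL is dropped for brevity, and the built-in `idna` codec applies the older IDNA 2003 rules, which may not match every browser.

```python
from urllib.parse import urlsplit, urlunsplit


def url_with_ascii_host(url: str) -> str:
    """Hypothetical helper: Punycode-encode the hostname of a URL."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    # Only the hostname is converted; path, query and fragment are kept as-is.
    netloc = host.encode("idna").decode("ascii")
    if parts.port is not None:
        netloc += f":{parts.port}"
    return urlunsplit(
        (parts.scheme, netloc, parts.path, parts.query, parts.fragment)
    )


print(url_with_ascii_host("http://見.香港/"))  # http://xn--nw2a.xn--j6w193g/
```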

answered Oct 11 '22 02:10 by zerkms