
Can a path in a URI contain unicode?

Tags: url, uri, unicode

Is it possible for a valid URL to contain non-escaped Unicode characters?

Matty asked Jul 29 '11 09:07



1 Answer

Yes, but only the subset of ASCII (and therefore of Unicode) that is allowed unescaped in URIs, such as letters and digits. The vast majority of the Unicode character set has to be percent-encoded.
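To make that concrete, here is a minimal Python sketch of percent-encoding a path using the standard library's `urllib.parse`. The helper name `encode_path` and the sample path are made up for illustration, and the sketch assumes the non-ASCII characters should be encoded as UTF-8 bytes, which is the usual choice and the default for `quote`.

```python
from urllib.parse import quote, unquote


def encode_path(path: str) -> str:
    """Percent-encode a URI path (hypothetical helper for illustration).

    Letters, digits and a few ASCII punctuation marks are left alone.
    "/" is kept as the segment separator. Everything else, including all
    non-ASCII characters, becomes %XX escapes of its UTF-8 bytes.
    """
    return quote(path, safe="/", encoding="utf-8")


raw_path = "/wiki/Zürich"          # contains a non-ASCII character
encoded_path = encode_path(raw_path)

print(encoded_path)                # /wiki/Z%C3%BCrich
print(unquote(encoded_path))       # /wiki/Zürich
```

The ASCII letters pass through untouched, while the "ü" turns into the two escaped bytes of its UTF-8 encoding. Decoding with `unquote` gives back the original string.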

Joey answered Sep 28 '22 03:09