I ran into a problem that I've never seen before: my client is adding files to a project we built, and some of the filenames have special characters in them because some of the words are Spanish.
For example, a file I'm testing has an á in its name. I am referencing that image in a CSS file as a background image, but in Safari it doesn't show up. It does in FF and Chrome.
As a test I pasted the link directly into the browser and got the same result: it works in FF and Chrome, but Safari throws an error. So I guess the accented characters are what's breaking it?
Firefox converts the following URL, changing the á to a%CC%81, and loads the image:
http://www.themediacouncil.com/test/nonascii/LA-MAR_Cebiche-Clássico_foto-Henrique-Peron-470x120-1371827671.jpg
You can see that the link above breaks, but FF and Chrome convert it to: http://www.themediacouncil.com/test/nonascii/LA-MAR_Cebiche-Cla%CC%81ssico_foto-Henrique-Peron-470x120-1371827671.jpg
You can also see this in action here: http://jsfiddle.net/Md4gZ/2/
.testbox {
    width: 340px;
    height: 100px;
    background: url('http://www.themediacouncil.com/test/nonascii/LA-MAR_Cebiche-Clássico_foto-Henrique-Peron-470x120-1371827671.jpg') no-repeat top left;
}
So what's the right way to handle this? I'm developing in PHP and WordPress. I'd rather not have to tell the client to go back and replace all files with special characters.
Any help is appreciated. Thanks!
Non-ASCII characters are those that fall outside the ASCII set, such as the accented letters available in Unicode. ASCII is limited to 128 characters and was initially developed for the English language.
URLs can only be sent over the Internet using the ASCII character set. Since URLs often contain characters outside the ASCII set, the URL has to be converted into a valid ASCII format. URL encoding replaces unsafe and non-ASCII characters with "%HH" sequences: a "%" followed by two hexadecimal digits for each byte of the character.
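Only certain characters are allowed in a URL string: alphabetic characters, numerals, and a few characters such as ; , / ? : @ & = + $ - _ . ! ~ * ' ( ) # that can have special meanings.
The characters traditionally listed as unsafe are { , } , | , \ , ^ , ~ , [ , ] , and ` . All unsafe characters must always be encoded within a URL. (The ~ appears in both lists because RFC 1738 called it unsafe, while the newer RFC 3986 treats it as an unreserved character that needs no encoding.)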
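As a rough illustration of the rule above, here is a small Python sketch (Python is used only for illustration; the question itself is about PHP, where rawurlencode() does comparable percent-encoding). The helper name percent_encode is made up for this example. It relies on the standard library's urllib.parse.quote, which leaves letters, digits and "_.-~" alone and encodes the rest.

```python
from urllib.parse import quote


def percent_encode(text: str) -> str:
    """Percent-encode text, keeping unreserved characters and "/" as they are.

    Hypothetical helper name; it only wraps the standard library's quote().
    """
    return quote(text, safe="/")


# Letters, digits and "~" pass through; the "unsafe" characters become %HH.
for ch in ["a", "7", "~", "{", "}", "|", "\\", "^", "[", "]", "`", " "]:
    print(repr(ch), "->", percent_encode(ch))
```

Note that quote() follows the newer convention, so "~" is left unencoded.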
I believe what is becoming the standard is to convert non-ASCII characters to UTF-8 byte sequences, and include those sequences as %HH hex codes in the URL. The á character is U+00E1 in Unicode, which in UTF-8 makes the two bytes 0xC3 0xA1. Hence, Clássico would become Cl%C3%A1ssico.
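To make the UTF-8 step concrete, here is a minimal Python sketch that encodes only the path of a URL and leaves the scheme and host untouched. The function name encode_url_path is an assumption for this example, and the á is written as the escape \u00e1 so that the precomposed character is used regardless of how this text was copied.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def encode_url_path(url: str) -> str:
    """Percent-encode the path of a URL as UTF-8 bytes (hypothetical helper).

    Assumes the path is not already percent-encoded; an existing "%" would
    be encoded again as "%25".
    """
    parts = urlsplit(url)
    encoded_path = quote(parts.path, safe="/")  # quote() defaults to UTF-8
    return urlunsplit(
        (parts.scheme, parts.netloc, encoded_path, parts.query, parts.fragment)
    )


# U+00E1 is the precomposed "a with acute"; UTF-8 encodes it as C3 A1.
print("\u00e1".encode("utf-8").hex())  # c3a1
print(quote("Cl\u00e1ssico"))          # Cl%C3%A1ssico

url = (
    "http://www.themediacouncil.com/test/nonascii/"
    "LA-MAR_Cebiche-Cl\u00e1ssico_foto-Henrique-Peron-470x120-1371827671.jpg"
)
print(encode_url_path(url))
```

Whether the server actually finds the file under this encoded name depends on how the filename is stored on disk.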
The conversion you report from Firefox, Cla%CC%81ssico, did this slightly differently: it changed the á into a followed by U+0301, the COMBINING ACUTE ACCENT character. In UTF-8, U+0301 makes 0xCC 0x81.
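The two spellings correspond to Unicode's composed (NFC) and decomposed (NFD) normalization forms. The following Python sketch, with a made-up helper name encode_both_forms, shows how one visible word yields two different percent-encoded strings.

```python
import unicodedata
from urllib.parse import quote


def encode_both_forms(name: str) -> tuple[str, str]:
    """Return the percent-encoded NFC and NFD spellings of name.

    Hypothetical helper for illustration only.
    """
    nfc = unicodedata.normalize("NFC", name)  # single code point U+00E1
    nfd = unicodedata.normalize("NFD", name)  # "a" followed by U+0301
    return quote(nfc), quote(nfd)


composed, decomposed = encode_both_forms("Cl\u00e1ssico")
print(composed)    # Cl%C3%A1ssico
print(decomposed)  # Cla%CC%81ssico
```

Both strings look identical when decoded and displayed, but they are different byte sequences, so a server doing a byte-for-byte filename match may accept only one of them.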
Which representation you should choose, the precomposed “á” (U+00E1) or “a followed by combining accent”, depends on what the web server needs in order to match the right file. In your case, maybe the filename actually contains the combining-character accent, and that's why it worked (hard to tell).
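One way to find out which spelling a filename actually uses is to compare it with its own normalized forms. This is only a diagnostic sketch in Python; the function name normalization_form and its return labels are invented here, and in practice the names would come from a directory listing on the server.

```python
import unicodedata


def normalization_form(name: str) -> str:
    """Report whether name is stored composed (NFC), decomposed (NFD), or neither.

    Hypothetical helper; the labels are arbitrary.
    """
    is_nfc = unicodedata.normalize("NFC", name) == name
    is_nfd = unicodedata.normalize("NFD", name) == name
    if is_nfc and is_nfd:
        return "unaffected"  # e.g. plain ASCII, identical in both forms
    if is_nfc:
        return "NFC"
    if is_nfd:
        return "NFD"
    return "mixed"


print(normalization_form("Cl\u00e1ssico.jpg"))   # NFC
print(normalization_form("Cla\u0301ssico.jpg"))  # NFD
print(normalization_form("plain.jpg"))           # unaffected
```

If the stored name turns out to be decomposed, that would be consistent with the a%CC%81 URL being the one that loads.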
Another, older, way to handle non-ASCII Latin characters is to use an 8-bit Latin charset representation (ISO-8859-1 or something similar, such as Windows-1252) and encode that as one byte. That would make Clássico into Cl%E1ssico. But since this only works for Latin charsets, and is ambiguous for some of their characters, it is hopefully and probably disappearing.
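For comparison, the legacy single-byte encoding can be sketched in Python as below. The helper name encode_legacy is an assumption for this example. The last two lines hint at the ambiguity: the receiver has to guess the charset, and guessing UTF-8 for %E1 yields a replacement character.

```python
from urllib.parse import quote, unquote


def encode_legacy(name: str, charset: str = "latin-1") -> str:
    """Percent-encode name using a single-byte legacy charset (hypothetical helper)."""
    return quote(name, encoding=charset)


print(encode_legacy("Cl\u00e1ssico"))  # Cl%E1ssico

# The same %E1 decodes correctly only if the receiver assumes Latin-1.
print(unquote("Cl%E1ssico", encoding="latin-1") == "Cl\u00e1ssico")  # True
print(unquote("Cl%E1ssico") == "Cl\ufffdssico")                      # True
```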