When I view this webpage on my PC in Firefox, it does not display characters that need more than one byte in UTF-8 properly, such as the ö in Björk. This happens even though the file is physically encoded as UTF-8 and declares UTF-8 as its charset. Please click on the link and open the B section to see what I mean:
http://www.jthink.net/songkong/reports/FixSongsReport00084/FixSongsReport00084_index.html
(The page is hosted on a Linux server running jakarta-tomcat.)
However, the original file displays perfectly in Firefox when I open it from my hard drive. I even copied the file back from the remote site to my local PC to make sure I had the same file, and it still displays correctly.
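To double-check that the file on disk really is well-formed UTF-8 (and not, say, ISO-8859-1 that happens to look right in one viewer), a strict decode is enough. This is a minimal sketch, assuming Java 7+. The class name `Utf8Check` is illustrative, and the default file name is simply the one from the URL above.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class Utf8Check {
    // Returns true only if every byte sequence is well-formed UTF-8.
    // REPORT makes the decoder throw instead of substituting U+FFFD.
    static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // Pass the path of the downloaded report as the first argument.
        Path file = Paths.get(args.length > 0 ? args[0] : "FixSongsReport00084_index.html");
        byte[] bytes = Files.readAllBytes(file);
        System.out.println(file + (isValidUtf8(bytes) ? " is valid UTF-8" : " is NOT valid UTF-8"));
    }
}
```

If this reports valid UTF-8 for both the local copy and the one fetched back from the server, the file itself is not the problem.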
So why doesn't it display correctly on the website? Could it be a Tomcat problem?
EDIT: The comment on the first answer says I need to set the response encoding correctly. How do I do this? The HTML page is not generated by code; Tomcat is just serving the page as provided.
Note that I don't want to parse URI parameters as UTF-8, and I don't want the JSP pages I created to be encoded as UTF-8. They work fine as ISO-8859-1 and may break if I change them. I just want .html pages to be displayed as UTF-8, and only for this application. I have multiple applications in the webapps folder, and I am using Tomcat 7.
EDIT
So, as suggested in the answer below, I've added the following to my web.xml file:
<filter>
<filter-name>CharacterEncoding</filter-name>
<filter-class>org.apache.catalina.filters.SetCharacterEncodingFilter</filter-class>
<init-param>
<param-name>encoding</param-name>
<param-value>UTF-8</param-value>
</init-param>
</filter>
<mime-mapping>
<extension>html</extension>
<mime-type>text/html;charset=UTF-8</mime-type>
</mime-mapping>
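To confirm that the mime-mapping actually changed what Tomcat sends, you can look at the Content-Type response header directly. This is a minimal sketch, assuming Java 8+ and network access. `ContentTypeCheck` and `charsetOf` are hypothetical names, and the charset parsing is deliberately simple rather than a full header parser.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.Locale;

public class ContentTypeCheck {
    // Returns the charset parameter of a Content-Type value, or null if absent.
    static String charsetOf(String contentType) {
        if (contentType == null) {
            return null;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String value = p.substring("charset=".length()).trim();
                // Strip optional surrounding quotes, e.g. charset="UTF-8".
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value;
            }
        }
        return null;
    }

    public static void main(String[] args) throws IOException {
        String target = args.length > 0 ? args[0]
                : "http://www.jthink.net/songkong/reports/FixSongsReport00084/FixSongsReport00084_index.html";
        HttpURLConnection conn = (HttpURLConnection) new URL(target).openConnection();
        conn.setRequestMethod("HEAD"); // headers only, no body
        String contentType = conn.getContentType();
        System.out.println("Content-Type: " + contentType);
        System.out.println("charset: " + charsetOf(contentType));
        conn.disconnect();
    }
}
```

Before the change this should print the ISO-8859-1 charset quoted in the answer below; after it, UTF-8 for .html files, while the JSP pages keep whatever they declare themselves.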
That partly works: the page now displays correctly, but the link doesn't work.
When I look at the source, it seems to use the correct link, but the error message shows the link in an expanded notation rather than as UTF-8.
Here's the whole report, so you can click a link on the left-hand side and see the result on the right-hand side:
http://www.jthink.net/songkong/reports/FixSongsReport00084/FixSongsReport00084.html
Even if I copy the link and paste it, it doesn't work, because the link that gets pasted seems to be wrong, although it then corrects itself.
Your page is returning this header:
Content-Type:text/html; charset=ISO-8859-1
but your page is encoded in UTF-8.
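The browser trusts the charset in the HTTP header over the page's own declaration, so it reads the two UTF-8 bytes of ö (0xC3 0xB6) as two separate ISO-8859-1 characters. That also explains why the same file looks fine when opened from disk: there is no HTTP header, so Firefox uses the charset the page declares. This is a minimal Java sketch of the effect; the class and method names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

public class Mojibake {
    // Encode text as UTF-8 (how the file is stored), then decode those bytes
    // as ISO-8859-1 (what the Content-Type header tells the browser to do).
    static String decodeUtf8BytesAsLatin1(String text) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, StandardCharsets.ISO_8859_1);
    }

    // Reverse the mistake: recover the original bytes, then decode them as UTF-8.
    static String repair(String garbled) {
        byte[] originalBytes = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(originalBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String name = "Bj\u00f6rk"; // "Björk", escaped so the source file encoding doesn't matter
        String garbled = decodeUtf8BytesAsLatin1(name);
        System.out.println(garbled.equals("Bj\u00c3\u00b6rk")); // true: ö became Ã¶
        System.out.println(repair(garbled).equals(name));        // true: no data was lost
    }
}
```

Because the round trip is lossless, the bytes on the server are fine; only the declared charset is wrong.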
You can follow this thread to see how to change the response header:
Tomcat 7.0.35 set HTTP response header Content-Type charset for static HTML files
[EDIT]
The second problem relates to the encoding your server expects URLs to be encoded with.
Since the browser will percent-encode the links as UTF-8, you can just update the Connector in your Tomcat config (server.xml) with this:
<Connector port="<whatever>" URIEncoding="UTF-8"/>
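A minimal Java sketch of what goes wrong with the link, assuming Java 10+ (for the `Charset` overloads of `URLEncoder`/`URLDecoder`). It only imitates the browser and the connector; it is not Tomcat's actual code. The file name "Björk" is a hypothetical example, and `URLEncoder` does form encoding (a space becomes `+`), which matches path encoding only for names without spaces or reserved characters. As far as I know, Tomcat 7 decodes URIs as ISO-8859-1 when `URIEncoding` is not set.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class UriEncodingDemo {
    // What the browser puts on the wire: the UTF-8 bytes, percent-encoded.
    static String browserEncode(String name) {
        return URLEncoder.encode(name, StandardCharsets.UTF_8);
    }

    // What the server gets back if it decodes those bytes with the given URIEncoding.
    static String serverDecode(String encoded, Charset uriEncoding) {
        return URLDecoder.decode(encoded, uriEncoding);
    }

    public static void main(String[] args) {
        String encoded = browserEncode("Bj\u00f6rk");
        System.out.println(encoded); // Bj%C3%B6rk
        // ISO-8859-1: each byte becomes its own character, so no such file is found.
        System.out.println(serverDecode(encoded, StandardCharsets.ISO_8859_1).equals("Bj\u00c3\u00b6rk")); // true
        // UTF-8: the original name comes back and the file lookup succeeds.
        System.out.println(serverDecode(encoded, StandardCharsets.UTF_8).equals("Bj\u00f6rk")); // true
    }
}
```

Note that `URIEncoding` is set on the Connector, so it affects every application served through that connector, not just this one.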
But what I'd strongly recommend is not using these kinds of characters in either your URLs or your HTML file names. More things are involved here, such as the default encoding of the user account that starts the server, and there are many more tweaks you would need to take care of. Simply avoiding these characters will keep you away from these problems.
[/EDIT]
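If you follow that advice, one lossy way to derive ASCII-only file names is Unicode decomposition followed by stripping the combining marks. This is a minimal Java sketch under that assumption; `SafeFileName` is a hypothetical name. Letters with no decomposition (such as ø) simply become underscores, so two different names can collide and you would need to handle that separately.

```java
import java.text.Normalizer;

public class SafeFileName {
    // NFD splits "ö" into "o" + U+0308 (combining diaeresis); \p{M} then removes
    // the combining marks, and anything outside a conservative ASCII set becomes '_'.
    static String toAsciiFileName(String name) {
        String decomposed = Normalizer.normalize(name, Normalizer.Form.NFD);
        String noMarks = decomposed.replaceAll("\\p{M}", "");
        return noMarks.replaceAll("[^A-Za-z0-9._-]", "_");
    }

    public static void main(String[] args) {
        System.out.println(toAsciiFileName("Bj\u00f6rk.html")); // Bjork.html
        System.out.println(toAsciiFileName("Sigur R\u00f3s"));  // Sigur_Ros
    }
}
```
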
Hope it helps.