Classic ASP text substitution and UTF-8 encoding

We have a website that uses Classic ASP.

Part of our release process substitutes values in a file, and we found a bug in it: it writes the file out as UTF-8.

This then causes our application to start spitting out garbage. Apostrophes get returned as some encoded characters.

If we then go and remove the BOM that marks this file as UTF-8, the text that was previously rendered as garbage is displayed correctly.

Is there something that IIS does differently when it encounters a UTF-8 file?
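
For readers unfamiliar with the symptom, here is a minimal Python sketch of one common way an apostrophe turns into garbage: text encoded as UTF-8 but decoded with a single-byte code page. The sample string and the choice of Windows-1252 are assumptions for illustration only; the question does not say which code page the site actually uses.

```python
# Illustration only: a curly apostrophe written as UTF-8 but read back
# as Windows-1252 (an assumed legacy code page, not stated in the question).
text = "Derek\u2019s page"

utf8_bytes = text.encode("utf-8")      # U+2019 becomes the three bytes E2 80 99
misread = utf8_bytes.decode("cp1252")  # each byte is treated as one character

print(misread)  # Derekâ€™s page
```

The three bytes of the single apostrophe come back as three separate characters, which matches the "encoded characters" look described above.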

Derek Ekins asked Sep 21 '09 10:09


2 Answers

I was searching on the same exact issue yesterday and came across:

http://blog.inspired.no/utf-8-with-asp-71/

Important part from that page, in case it goes away...

ASP CODE:

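' Declare the MIME type and charset in the Content-Type header
Response.ContentType = "text/html"
Response.AddHeader "Content-Type", "text/html;charset=UTF-8"
' Encode the response output as UTF-8 (code page 65001)
Response.CodePage = 65001
' Append charset=UTF-8 to the Content-Type header
Response.CharSet = "UTF-8"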

and the following HTML META tag:

<meta http-equiv="Content-Type" content="text/html;charset=UTF-8" />

We were using the meta tag and the ASP CharSet property, yet the page still didn't render correctly. After adding the other three lines to the ASP file, everything just worked.

Hope this helps!

Werewolf answered Sep 28 '22 04:09


UTF-8 does not need a BOM, and the Unicode standard does not recommend one; it is an annoying misfeature in some Microsoft software that puts them there. You need to find which step of your release process is putting a UTF-8-encoded BOM in your files and fix it. You should stop that even if you are using UTF-8, which really is the best choice these days.
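
As a rough sketch of what such a fix could look like, the Python below strips a leading UTF-8 BOM from a file's bytes. The helper names `strip_utf8_bom` and `strip_bom_in_file` are hypothetical, and nothing is known about the actual release tooling, so treat this as an illustration of the idea rather than a drop-in step.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (EF BB BF), if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def strip_bom_in_file(path: str) -> bool:
    """Rewrite the file at path without its BOM; return True if it changed."""
    with open(path, "rb") as f:
        original = f.read()
    cleaned = strip_utf8_bom(original)
    if cleaned != original:
        with open(path, "wb") as f:
            f.write(cleaned)
        return True
    return False
```

Working on raw bytes avoids having to know the file's real encoding: only the three BOM bytes are removed and everything else is left untouched.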

But I doubt it's IIS causing the display problem. More likely the browser is guessing the charset of the final displayed page, and when it sees bytes that look like they're UTF-8 encoded it guesses the whole page is UTF-8. You should be able to stop it doing that by stating a definitive charset by using an HTTP header:

Content-Type: text/html;charset=iso-8859-1

and/or a meta element in the HTML

<meta http-equiv="Content-Type" content="text/html;charset=iso-8859-1" />

Now (assuming ISO-8859-1 is actually the character set your data are in) it should display OK. However if your file really does have a UTF-8-encoded BOM at the start, you'll now see that as ‘ï»¿’ in your page, which is what those bytes look like in ISO-8859-1. So you still need to get rid of that misBOM.
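
To see why a stray BOM shows up as visible characters, this small Python sketch decodes the three BOM bytes as ISO-8859-1, then shows that a BOM-aware UTF-8 decoder drops them. The sample page text is made up for illustration.

```python
import codecs

bom = codecs.BOM_UTF8                 # the bytes EF BB BF
as_latin1 = bom.decode("iso-8859-1")  # one visible character per byte
print(as_latin1)                      # ï»¿

# A BOM-aware decoder ("utf-8-sig") removes the marker instead of showing it.
page = bom + "It's fine".encode("utf-8")
print(page.decode("utf-8-sig"))       # It's fine
```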

bobince answered Sep 28 '22 04:09