

Anything wrong with using windows-1252 instead of UTF-8

Tags: html, coldfusion, encoding, utf-8, oracle

I have a test site that has been using windows-1252 all along. The users do need some symbols, such as the square root symbol, but they have no need to display any language other than English. I was recently asked to switch it to UTF-8 because of some security concerns. After I changed it to UTF-8, the square roots and other symbols (which are pulled out of an Oracle DB and passed through ColdFusion) appeared fine on the resulting web page. However, if I saved the document again (post to DB, page refreshes), the symbols turned into strange characters. If I saved again, even more strange characters appeared. So...

1. If I don't need anything other than English, is there anything wrong with sticking to windows-1252? Any security/hacking issues?
2. Are there any implications of NOT using UTF-8 if you are using HTML5 (since that is the default encoding for HTML5)?
3. If it's recommended that I switch to UTF-8, how do I get the currently stored square root symbols (and other symbols) to work?

I've already read all these pages but am still having a little trouble grasping it all. Hoping someone here can help clarify it for me. Thanks!

https://www.owasp.org/index.php/Canonicalization,_locale_and_Unicode
Excellent description of how UTF-8 came about, why it’s awesome, and the problems it solves… https://www.youtube.com/watch?v=MijmeoH9LT4
http://www.w3.org/International/questions/qa-choosing-encodings “Use UTF-8, if you can”. “In fact the HTML5 specification draft currently says "Authors are encouraged to use UTF-8. Conformance checkers may advise authors against using legacy encodings. Authoring tools should default to using UTF-8 for newly-created documents."”
http://www.w3schools.com/tags/ref_charactersets.asp “For HTML5, the default character encoding is UTF-8.”
http://www.joelonsoftware.com/articles/Unicode.html

* * * UPDATE * * *

I appreciate all the help so far in making this easier to understand. I'll simplify the original 3 questions so that hopefully a clear answer can be reached. The customer doesn't need support for other languages; they will be using some HTML5 tags and a TON of JSON/XML traffic sent back and forth via jQuery.ajax(). Given that info, from a security standpoint, is there anything wrong with keeping the database set to NLS_CHARACTERSET: WE8MSWIN1252 and the webpages set to <CFHEADER NAME="Content-Type" value="text/html; charset=windows-1252">? Thank you.

Here is another question that is a slight spin-off from this one: Why am I able to use a character that's not part of a charset (windows-1252)?


asked Jan 31 '14 21:01

gfrobenius

2 Answers

Windows-1252 is one of many fixed-width, single-byte character sets. Mac OS has its own set (Mac OS Roman), and there are several ISO sets for various parts of Europe and for other parts of the world. Most of them differ from one another only slightly.

The good point is that every character has a fixed size: 1 character = 1 byte, no matter what.

The bad points are:

- Some people may not have your encoding installed
- Some people may use a slightly different encoding, which causes only a handful of issues that are hard to spot but very ugly in the long run
- You can only support a few languages

That includes any quotation you would like to make. In windows-1252 you can't write Russian, Greek, Polish...

UTF-8 is the standard variable-width encoding of Unicode, using 1 to 4 bytes per character. It can represent every Unicode character, although it is most compact for Latin-based text: ASCII characters take 1 byte each, while characters from other scripts take 2 to 4 bytes and therefore more storage space.
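As a rough illustration of the point about byte widths, here is a small Python sketch (Python is used only because its codecs make byte counts easy to inspect; it is not part of the asker's ColdFusion/Oracle stack). The sample characters and the helper name `encoded_length` are my own choices:

```python
# Compare how many bytes a few sample characters need in UTF-8
# and in windows-1252 (Python calls the latter "cp1252").
samples = ["A", "é", "√", "😀"]


def encoded_length(ch, encoding):
    """Return the byte count of ch in encoding, or None if it cannot be encoded."""
    try:
        return len(ch.encode(encoding))
    except UnicodeEncodeError:
        return None


utf8_lengths = {ch: encoded_length(ch, "utf-8") for ch in samples}
cp1252_lengths = {ch: encoded_length(ch, "cp1252") for ch in samples}

for ch in samples:
    print(repr(ch), "utf-8:", utf8_lengths[ch], "windows-1252:", cp1252_lengths[ch])
```

The square root sign needs 3 bytes in UTF-8 and has no representation at all in windows-1252, which is why the helper returns None for it there.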

It is used in XML, JSON, and most types of web services you may come across. It is a good default when you don't know what encoding to use. It limits the number of encoding issues, such as "I thought you were using Latin-1" / "No, I was using Latin-9, but then this guy on a Mac used Mac Roman". If more than one person works on the content of the website, they may have different encodings on their platforms, and your content may get messed up at some point.

UTF-8 is, as far as I know, the only way to easily standardize the encoding used between people without discussion.

A typical example: if your website is encoded in windows-1252 and the new developer has a Mac, you'll probably be in trouble.
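The kind of mismatch described here can be sketched in a few lines of Python. This is only an illustration under assumed inputs (the word "café" and the helper `misread_utf8_as_cp1252` are hypothetical), not a reconstruction of what the asker's ColdFusion pages do:

```python
# 1) The same bytes read with two different single-byte code pages:
#    nothing fails, the text just silently changes.
original = "café"
stored = original.encode("cp1252")        # bytes written by a windows-1252 editor
seen_on_mac = stored.decode("mac_roman")  # the same bytes read as Mac OS Roman


# 2) UTF-8 bytes misread as windows-1252 and then saved again.
#    Each round trip turns every non-ASCII character into several others.
def misread_utf8_as_cp1252(text):
    return text.encode("utf-8").decode("cp1252")


once = misread_utf8_as_cp1252("√")
twice = misread_utf8_as_cp1252(once)

print(original, "->", seen_on_mac)
print(len("√"), len(once), len(twice))  # character counts after 0, 1 and 2 misreads
```

The second part matches the symptom in the question: one symbol becomes three strange characters after the first save and six after the second.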


answered Nov 15 '22 04:11

njzk2

You claim that Windows-1252 offers everything you need but the √ symbol is a counter-example. You must be using one of these tricks:

- HTML entities: &radic;, &#8730; or similar
- Print another character and change the font

In either case, your solution is not portable: stuff will only display correctly in a properly configured web browser. Everything else (database, JavaScript, text files, plain text e-mail messages...) will not contain the real data.
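One plausible way such entities end up in the database (an assumption on my part, not something the question confirms) is that a browser submitting a form from a windows-1252 page replaces any character the encoding lacks with a numeric character reference. Python's `xmlcharrefreplace` error handler behaves similarly, so it can serve as a sketch:

```python
import html

symbol = "√"

# windows-1252 has no byte for the square root sign.
try:
    symbol.encode("cp1252")
    representable = True
except UnicodeEncodeError:
    representable = False

# Substitute a numeric character reference for anything unencodable,
# similar to what a browser does when submitting a windows-1252 form.
stored_bytes = symbol.encode("cp1252", errors="xmlcharrefreplace")

# Only something that parses HTML turns the reference back into the character.
decoded_by_html = html.unescape(stored_bytes.decode("cp1252"))

print(representable, stored_bytes, decoded_by_html)
```

Anything that reads `stored_bytes` without an HTML parser (a text file, a plain-text e-mail, a database query) sees the seven ASCII characters of the reference, not the symbol.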

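Additionally, JSON is defined as Unicode text and UTF-8 is its standard encoding (RFC 8259 requires UTF-8 for JSON exchanged between systems). JavaScript will normally make the conversions for you but you must ensure that your whole tool-chain behaves similarly.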
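To make the JSON point concrete, here is a minimal Python sketch with a made-up payload. It shows the two usual ways a serializer can carry a non-ASCII character; neither depends on windows-1252 being able to represent it:

```python
import json

payload = {"formula": "√2"}  # hypothetical sample data

# Default: non-ASCII characters are written as \uXXXX escapes,
# so the JSON text itself is pure ASCII.
ascii_safe = json.dumps(payload)

# Alternative: keep the real character and encode the text as UTF-8.
utf8_bytes = json.dumps(payload, ensure_ascii=False).encode("utf-8")

# Both forms decode back to the same data.
round_trip_a = json.loads(ascii_safe)
round_trip_b = json.loads(utf8_bytes.decode("utf-8"))

print(ascii_safe)
print(utf8_bytes)
```

If some layer in between decodes `utf8_bytes` as windows-1252, the payload is corrupted, which is why every step of the tool-chain has to agree.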

So to answer your main question: there's nothing wrong with using Windows-1252 if that's all you need. The problem is that you already need more than it can offer.

As for your problems with UTF-8: UTF-8 is a full Unicode encoding, so it does meet all the requirements. (Not being able to make it work can be your reason to drop it, but it isn't a technical reason.) My guess is that, since your current data doesn't contain actual square root symbols, switching encodings breaks the trick you were using. You need to:

1. Find out what current data looks like
2. Run a one-time search and replace
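If step 1 shows that the stored text contains character references, the search and replace of step 2 could look roughly like the Python sketch below. Everything in it is an assumption to be adapted once the real data is known: the function name `references_to_characters`, the `NAMED` whitelist, and the decision to leave ASCII references such as &#60; untouched so that markup-significant characters stay escaped. In practice the conversion would run against the Oracle rows with whatever tooling is at hand.

```python
import re

# Hypothetical whitelist of named entities to convert; extend it after
# inspecting the real data.
NAMED = {"radic": "\u221a", "ne": "\u2260", "le": "\u2264", "ge": "\u2265"}

# Matches decimal (&#8730;), hexadecimal (&#x221A;) and named (&radic;) references.
REFERENCE = re.compile(r"&(?:#(\d+)|#[xX]([0-9a-fA-F]+)|([A-Za-z]+));")


def references_to_characters(text):
    """Replace selected character references with the real Unicode characters."""

    def replace(match):
        dec, hexa, name = match.groups()
        if name is not None:
            # Unknown names (lt, amp, ...) are returned unchanged.
            return NAMED.get(name, match.group(0))
        code = int(dec) if dec is not None else int(hexa, 16)
        # Skip ASCII and C1 control values, surrogates and out-of-range numbers.
        if code < 160 or 0xD800 <= code <= 0xDFFF or code > 0x10FFFF:
            return match.group(0)
        return chr(code)

    return REFERENCE.sub(replace, text)


print(references_to_characters("&#8730;2 &lt; 3 &amp; &radic;x"))
```

A blanket unescape of every entity is deliberately avoided here, because turning &lt; or &amp; into literal characters would change how the stored text behaves when it is written back into a page.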

answered Nov 15 '22 05:11

Álvaro González
