
ISO-8859-1 vs UTF-8?

What should be used, and when? Is it always better to use UTF-8, or does ISO-8859-1 still matter in specific situations?

Is the character set related to geographic region?


Is there a benefit to putting @charset "utf-8"; at the top of the CSS file?

Or to declaring it on the link element, like this:

<link type="text/css; charset=utf-8" rel="stylesheet" href=".." />

I found this on the subject:

If Dreamweaver adds the tag when you add embedded style to the document, that is a bug in Dreamweaver. From the W3C FAQ:

"For style declarations embedded in a document, @charset rules are not needed and must not be used."

The charset specification has been part of CSS since version 2.0 (May 1998), so if you have a charset specification in a CSS file and Safari can't handle it, that's a bug in Safari.

And should I add accept-charset to the form?

<form action="/action" method="post" accept-charset="utf-8"> 
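A minimal Java sketch of why the charset the browser submits in and the charset the server decodes with have to agree. It assumes the form posts as application/x-www-form-urlencoded and that a UTF-8 form sent "café" as caf%C3%A9; the class name FormFieldDecoding is hypothetical, while URLDecoder.decode(String, Charset) is the standard Java 10+ API.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

// Hypothetical demo: one percent-encoded form value decoded with two charsets.
class FormFieldDecoding {
    // Decode the value as UTF-8, matching accept-charset="utf-8".
    static String decodeUtf8(String raw) {
        return URLDecoder.decode(raw, StandardCharsets.UTF_8);
    }

    // Decode the same value as ISO-8859-1, i.e. with the wrong charset.
    static String decodeLatin1(String raw) {
        return URLDecoder.decode(raw, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String raw = "caf%C3%A9"; // what a UTF-8 form submits for "café"
        System.out.println(decodeUtf8(raw));   // café
        System.out.println(decodeLatin1(raw)); // cafÃ© (mojibake: two bytes read as two characters)
    }
}
```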

And what should be used if I use the XHTML doctype?

<?xml version="1.0" encoding="UTF-8"?> 

or

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /> 
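A Java sketch of what the XML declaration does for an XML parser: the JDK's built-in DOM parser reads the encoding attribute and decodes the bytes accordingly, so the same text stored as UTF-8 or as ISO-8859-1 comes back identical. The class XmlDeclarationDemo is hypothetical, and this only covers XML parsing; how a browser treats an XHTML page depends on the MIME type it is served with.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical demo: the parser picks the charset from the XML declaration.
class XmlDeclarationDemo {
    // Parse raw bytes and return the text content of the root element.
    static String rootText(byte[] xmlBytes) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xmlBytes));
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String body = "<p>caf\u00e9</p>";
        // Same text serialized two ways, each with a matching declaration.
        byte[] utf8 = ("<?xml version=\"1.0\" encoding=\"UTF-8\"?>" + body)
                .getBytes(StandardCharsets.UTF_8);
        byte[] latin1 = ("<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>" + body)
                .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(rootText(utf8));   // café
        System.out.println(rootText(latin1)); // café
    }
}
```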
asked Dec 12 '09 at 16:12 by Jitendra Vyas


People also ask

Should I use UTF-8 or ISO 8859?

Most libraries that don't hold a lot of foreign-language materials will be perfectly fine with ISO-8859-1 (also called Latin-1, and sometimes loosely "extended ASCII"), but if you do have a lot of foreign-language materials you should choose UTF-8, since it can represent every Unicode character while ISO-8859-1 covers only 256 characters, mostly those used by Western European languages.
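To check whether a given piece of text fits in ISO-8859-1 at all, a Java sketch can ask the charset's encoder. Latin1Check is a hypothetical helper; CharsetEncoder.canEncode is standard Java.

```java
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: does this text survive a round trip through ISO-8859-1?
class Latin1Check {
    // True if every character has a byte in ISO-8859-1 (U+0000 to U+00FF).
    static boolean fitsInLatin1(String text) {
        CharsetEncoder encoder = StandardCharsets.ISO_8859_1.newEncoder();
        return encoder.canEncode(text);
    }

    public static void main(String[] args) {
        System.out.println(fitsInLatin1("Caf\u00e9 cr\u00e8me")); // true: French accents are in Latin-1
        System.out.println(fitsInLatin1("\u20ac 5"));             // false: the euro sign is not
        System.out.println(fitsInLatin1("\u4e2d\u6587"));         // false: Chinese is not
    }
}
```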

Is ISO-8859-1 still used?

As of August 2022, 1.3% of all websites (but only 8 of the top 1000) use ISO/IEC 8859-1. It is the most declared single-byte character encoding on the web, but because web browsers interpret it as its superset Windows-1252, such documents may include characters from that set.
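The difference between ISO-8859-1 and Windows-1252 is confined to bytes 0x80 to 0x9F, which ISO-8859-1 assigns to control characters and Windows-1252 uses for printable characters such as the euro sign and curly quotes. A small Java sketch (the class name is hypothetical) decodes the same two bytes both ways.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo of where ISO-8859-1 and Windows-1252 disagree.
class Latin1VsWindows1252 {
    static final Charset WINDOWS_1252 = Charset.forName("windows-1252");

    static String asLatin1(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    static String asWindows1252(byte[] bytes) {
        return new String(bytes, WINDOWS_1252);
    }

    public static void main(String[] args) {
        // Both bytes lie in the 0x80-0x9F range where the two charsets differ.
        byte[] bytes = { (byte) 0x80, (byte) 0x93 };
        String latin1 = asLatin1(bytes);
        String cp1252 = asWindows1252(bytes);
        // ISO-8859-1: C1 control characters U+0080 and U+0093
        System.out.printf("%04X %04X%n", (int) latin1.charAt(0), (int) latin1.charAt(1));
        // Windows-1252: euro sign U+20AC and left double quotation mark U+201C
        System.out.printf("%04X %04X%n", (int) cp1252.charAt(0), (int) cp1252.charAt(1));
    }
}
```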

How do I convert UTF-8 to ISO-8859-1?

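Going backwards from UTF-8 to ISO-8859-1 loses every character that ISO-8859-1 cannot represent. In Java, byte[] utf8 = ...; byte[] latin1 = new String(utf8, StandardCharsets.UTF_8).getBytes(StandardCharsets.ISO_8859_1); silently replaces each such character with a question mark (the charset's default replacement byte), and any malformed UTF-8 input is first decoded to the replacement character (�), which then also ends up as a question mark. You can exercise more control by using the lower-level Charset APIs.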
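A sketch of the lower-level approach in Java, assuming you would rather fail loudly than lose characters: a CharsetDecoder and CharsetEncoder configured with CodingErrorAction.REPORT throw a CharacterCodingException instead of substituting. Utf8ToLatin1 and its method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical converter contrasting lenient and strict UTF-8 to ISO-8859-1.
class Utf8ToLatin1 {
    // Lenient: unmappable characters silently become '?'.
    static byte[] lenient(byte[] utf8) {
        return new String(utf8, StandardCharsets.UTF_8).getBytes(StandardCharsets.ISO_8859_1);
    }

    // Strict: throws on malformed UTF-8 or on characters Latin-1 lacks.
    static byte[] strict(byte[] utf8) throws CharacterCodingException {
        CharBuffer chars = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(utf8));
        ByteBuffer out = StandardCharsets.ISO_8859_1.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .encode(chars);
        byte[] result = new byte[out.remaining()];
        out.get(result);
        return result;
    }

    public static void main(String[] args) {
        byte[] input = "caf\u00e9 \u20ac".getBytes(StandardCharsets.UTF_8);
        System.out.println(new String(lenient(input), StandardCharsets.ISO_8859_1)); // café ?
        try {
            strict(input);
        } catch (CharacterCodingException e) {
            System.out.println("not representable in ISO-8859-1: " + e);
        }
    }
}
```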

Is UTF-8 still used?

UTF-8 is the dominant encoding for the World Wide Web (and internet technologies), accounting for 98% of all web pages, and up to 100.0% for some languages, as of 2022.


2 Answers

Unicode is taking over and has already surpassed all others. I suggest you hop on the train right now.

Note that Unicode comes in several encodings (UTF-8, UTF-16 and UTF-32, among others). Joel Spolsky gives an overview.
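As a rough illustration of those encodings, this Java sketch (hypothetical class name) counts the bytes a single character takes in UTF-8, UTF-16 (big-endian, no byte-order mark) and UTF-32.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo: bytes per character in three Unicode encodings.
class UnicodeEncodingSizes {
    // Number of bytes a string takes in the given encoding.
    static int byteCount(String s, Charset cs) {
        return s.getBytes(cs).length;
    }

    public static void main(String[] args) {
        Charset utf32 = Charset.forName("UTF-32BE");
        // ASCII letter, Latin-1 letter, CJK ideograph, emoji outside the BMP
        String[] samples = { "A", "\u00e9", "\u4e2d", "\uD83D\uDE00" };
        for (String s : samples) {
            System.out.printf("U+%04X  UTF-8=%d  UTF-16=%d  UTF-32=%d%n",
                    s.codePointAt(0),
                    byteCount(s, StandardCharsets.UTF_8),
                    byteCount(s, StandardCharsets.UTF_16BE),
                    byteCount(s, utf32));
        }
        // U+0041   UTF-8=1  UTF-16=2  UTF-32=4
        // U+00E9   UTF-8=2  UTF-16=2  UTF-32=4
        // U+4E2D   UTF-8=3  UTF-16=2  UTF-32=4
        // U+1F600  UTF-8=4  UTF-16=4  UTF-32=4
    }
}
```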

(Graph: "Unicode is winning", current as of Feb. 2012; see comment below for more exact values.)

answered Sep 22 '22 at 01:09 by nes1983


UTF-8 is supported everywhere on the web; only a few specific applications lack support for it. You should always use UTF-8 if you can.

The downside is that for languages such as Chinese, UTF-8 takes more space than, say, UTF-16: most Chinese characters need three bytes in UTF-8 but two in UTF-16. Still, whether or not you plan on supporting Chinese, UTF-8 is fine.

The only drawback of UTF-8 is that it can take more space than some other encodings, but for Western languages it takes almost no extra space at all, except for special characters (an accented letter such as é takes two bytes instead of one in ISO-8859-1), and those extra bytes you can live with. We are in 2009 after all. ;)
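To put numbers on the size argument, a Java sketch (hypothetical class and sample strings) compares the encoded length of a short Chinese phrase and a French sentence with a few accented letters.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo comparing encoded sizes for CJK and Western text.
class TextSizeComparison {
    static int utf8Bytes(String s)   { return s.getBytes(StandardCharsets.UTF_8).length; }
    static int utf16Bytes(String s)  { return s.getBytes(StandardCharsets.UTF_16BE).length; }
    // Only meaningful for text that fits in Latin-1; other characters become '?'.
    static int latin1Bytes(String s) { return s.getBytes(StandardCharsets.ISO_8859_1).length; }

    public static void main(String[] args) {
        String chinese = "\u4e2d\u6587\u6587\u672c"; // four CJK characters
        String french = "Un caf\u00e9 cr\u00e8me, s'il vous pla\u00eet"; // 30 chars, 3 accented
        System.out.printf("Chinese: UTF-8=%d UTF-16=%d%n",
                utf8Bytes(chinese), utf16Bytes(chinese));
        System.out.printf("French:  UTF-8=%d ISO-8859-1=%d UTF-16=%d%n",
                utf8Bytes(french), latin1Bytes(french), utf16Bytes(french));
    }
}
```

With these samples, the Chinese phrase takes 12 bytes in UTF-8 versus 8 in UTF-16, while the French sentence takes 33 bytes in UTF-8, 30 in ISO-8859-1 and 60 in UTF-16.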

answered Sep 22 '22 at 01:09 by Tor Valamo