
Encoding HTML in ANSI vs UTF-8 w/o BOM

What's the difference between writing in ANSI and in UTF-8 (without BOM), for example for a PHP or an HTML document, and then uploading the file to a web server?
Both documents have the <meta charset="utf-8"> tag in their head.

If someone writes with plain Notepad, they have to choose ANSI,
because Notepad doesn't offer UTF-8 without a byte order mark (BOM).

Lukáš Kozák asked Mar 22 '23 16:03



2 Answers

The difference is that if you write your file in some 8-bit codepage and forget to convert it to UTF-8, visitors may see a broken page, because the meta tag declares the charset as UTF-8. And when you need to apply a fix in a hurry, you cannot edit the file in place over SFTP or WinSCP, because you would first have to convert it back into the 8-bit codepage.

Furthermore, UTF-8 is a Unicode encoding, so the full range of characters is supported, whereas an "ANSI" codepage covers only a small subset of them. Not every Unicode document can be converted back to an "ANSI" codepage, so you could not edit such documents this way.
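
To illustrate the point about conversion, here is a minimal Python sketch that checks whether a piece of text survives conversion to a given codepage. The helper name `fits_codepage` is made up for this example, and `cp1252` is only assumed to be the "ANSI" codepage in question; yours may differ.

```python
def fits_codepage(text: str, codepage: str = "cp1252") -> bool:
    """Return True if every character of text exists in the given codepage."""
    try:
        text.encode(codepage)
    except UnicodeEncodeError:
        # At least one character has no byte value in this codepage.
        return False
    return True


western = fits_codepage("naïve café")        # accented Latin letters exist in cp1252
cyrillic = fits_codepage("Привет")           # Cyrillic does not exist in cp1252
cyrillic_1251 = fits_codepage("Привет", "cp1251")  # but it does in windows-1251
print(western, cyrillic, cyrillic_1251)
```

Any UTF-8 file for which such a check fails cannot be saved as that codepage without losing or escaping characters.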

No sane person uses Windows Notepad for serious coding, because of its lack of functionality, syntax coloring, and support for line-ending formats, and because of its awful support for character sets.


The difference is that UTF-8 and “ANSI” (a Microsoft misnomer for various 8-bit encodings) are completely different encodings, though they coincide for the ASCII code range, 0x00 to 0x7F.
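
A quick way to see this is to compare the bytes each encoding produces. The sketch below assumes "ANSI" means windows-1252 (Python's `cp1252` codec), which is only one of the possibilities.

```python
ascii_text = "<p>Hello</p>"   # every character is below U+0080
accented = "é"                # U+00E9, outside the ASCII range

# For ASCII-only text the two encodings give identical bytes.
same_for_ascii = ascii_text.encode("utf-8") == ascii_text.encode("cp1252")

# Outside ASCII they diverge: UTF-8 needs two bytes here, cp1252 one.
utf8_bytes = accented.encode("utf-8")
ansi_bytes = accented.encode("cp1252")

print(same_for_ascii)  # True
print(utf8_bytes)      # b'\xc3\xa9'
print(ansi_bytes)      # b'\xe9'
```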

It is incorrect to label an “ANSI” file as UTF-8 encoded. The error does not cause observable effects if the data actually contains ASCII characters only or, in most cases, if the file is sent with HTTP headers which specify the correct encoding.
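
The effect of the wrong label can be reproduced outside a browser. This is a rough Python sketch, again assuming windows-1252 as the "ANSI" encoding; decoding with `errors="replace"` only approximates what a browser does with invalid UTF-8.

```python
# Bytes as an editor would save them in "ANSI" (assumed: windows-1252).
ansi_ascii_only = "cafe".encode("cp1252")
ansi_accented = "café".encode("cp1252")      # b'caf\xe9'

# Now interpret those bytes as UTF-8, as the meta tag claims.
shown_ascii = ansi_ascii_only.decode("utf-8", errors="replace")
shown_accented = ansi_accented.decode("utf-8", errors="replace")

print(shown_ascii)     # cafe  (no visible error)
print(shown_accented)  # caf followed by U+FFFD, the replacement character
```

With ASCII-only content the mislabeling goes unnoticed; the first non-ASCII character exposes it.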

There is no reason not to use BOM for UTF-8 encoded HTML files. Pages that claim otherwise are based either on information about browsers that lost all practical impact years ago or on confusing HTML with PHP. In a PHP file, BOM may cause problems, because PHP software does not handle BOM correctly, i.e. does not remove it when inserting the content of a file in another.
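
For reference, the UTF-8 BOM is just three bytes at the very start of the file. The Python sketch below shows them and one way to strip them before uploading a PHP file; the helper name `strip_utf8_bom` and the sample PHP snippet are invented for this example.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 byte order mark, if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


# The "utf-8-sig" codec writes the BOM, which is what Notepad's UTF-8 option does too.
with_bom = "<?php echo 'hi';".encode("utf-8-sig")
without_bom = strip_utf8_bom(with_bom)

print(with_bom[:3])   # b'\xef\xbb\xbf'
print(without_bom)    # b"<?php echo 'hi';"
```

When reading such a file back in Python, decoding with `utf-8-sig` drops the BOM if there is one and otherwise behaves like plain `utf-8`.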

Notepad is indeed unable to save the file as UTF-8 without BOM. Therefore, when creating or editing PHP files, use other programs, such as Notepad++. If you have to use Notepad, you just need to adapt to the limitations: use “ANSI” (after finding out what it is in your environment – it could be windows-1252, or something else), declare it in HTTP headers and meta tags, and use character references to represent characters that cannot be represented in “ANSI”.
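
If you do end up with "ANSI", the character references mentioned above can be generated rather than typed by hand. A minimal Python sketch, assuming the codepage is windows-1252 and using made-up sample text:

```python
text = "Price: 5 € ✓"

# Characters that exist in cp1252 (such as the euro sign) are encoded directly;
# anything else becomes a numeric character reference like &#10003;.
html_bytes = text.encode("cp1252", errors="xmlcharrefreplace")

print(html_bytes)  # b'Price: 5 \x80 &#10003;'
```

The result contains only bytes that are valid in the codepage, so it can be pasted into a file saved as "ANSI" and will render correctly as long as the page is declared as windows-1252.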

Jukka K. Korpela answered Apr 15 '23 08:04
