Converting "unknown-8bit" charset to UTF-8

I'm helping a friend add content to an older website that's been written in something like FrontPage. However, I have an HTML document that's encoded with the "unknown-8bit" charset. Brackets.io, which I'm working in, only supports UTF-8, so I can't open the document and re-save it in the correct encoding.

How would I go about converting this file to UTF-8 so that I can then work with it in Brackets.io?

I'm using OS X 10.10 Yosemite, so I'm a bit more limited than I would be on Windows (Notepad++ springs to mind).

After some googling, I've tried the following in Terminal, but "unknown-8bit" is unsupported:

iconv -f unknown-8bit -t utf-8 filename.html > newfilename.html
Gamut asked Nov 03 '14 17:11


1 Answer

You can use enca or chardet to detect the encoding; enca will probably be more successful.

If you know the language the document was written in, you can guess the encoding and try converting until you get the right results:

  • English, French, German, Spanish... – usually Windows-1252

  • Russian, Ukrainian... – usually Windows-1251

  • Polish, Czech, Hungarian... – usually Windows-1250 or ISO-8859-2

  • Japanese – usually Shift-JIS

and so on.
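The guess-and-check approach can be scripted. The sketch below assumes a standard `iconv` (GNU or the one shipped with OS X) and the encoding names it usually accepts (`WINDOWS-1252`, `SHIFT_JIS`, etc.; `iconv -l` lists what yours supports). It converts the file once per candidate encoding and keeps one output file per candidate so you can open them in Brackets.io and compare. The function name `try_encodings` and the `name.ENCODING.html` output naming are hypothetical choices for illustration. Keep in mind that single-byte encodings such as Windows-1252 accept almost any byte sequence, so a successful conversion only means the bytes were legal, not that the text is right; check accented letters and quotation marks by eye.

```shell
#!/bin/sh
# try_encodings (hypothetical helper): convert FILE to UTF-8 once per
# candidate encoding, keeping one output file per candidate for inspection.
try_encodings() {
  src="$1"
  # Candidates taken from the language list above; extend as needed.
  for enc in WINDOWS-1252 WINDOWS-1251 WINDOWS-1250 ISO-8859-2 SHIFT_JIS; do
    out="${src%.html}.$enc.html"   # e.g. page.html -> page.WINDOWS-1252.html
    if iconv -f "$enc" -t UTF-8 "$src" > "$out" 2>/dev/null; then
      echo "$enc: converted -> $out"
    else
      # iconv hit a byte sequence that is invalid in this encoding,
      # so this candidate is ruled out; discard the partial output.
      echo "$enc: invalid input, skipped"
      rm -f "$out"
    fi
  done
}
```

For example, `try_encodings filename.html` writes files such as `filename.WINDOWS-1252.html` next to the original; keep the one whose text reads correctly and rename it as needed.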

Karol S answered Oct 16 '22 07:10