How do I correct the character encoding of a file?

Tags: character-encoding, text-files, encoding, utf-8, codepages

I have an ANSI-encoded text file that should not have been saved as ANSI, because it contains accented characters that ANSI does not support. I would rather work with UTF-8.

Can the data be decoded correctly or is it lost in transcoding?

What tools could I use?

Here is a sample of what I have:

Ã§ Ã©

I can tell from context (cafÃ© should be café) that they should be these two characters:

ç é


asked Sep 25 '08 09:09

Liam

1 Answer

Follow these steps in Notepad++:

1- Copy the original text.

2- In Notepad++, open a new file and use the Encoding menu to pick the encoding you think the original text uses. Also try "ANSI", since some programs read Unicode files as ANSI.

3- Paste the text.

4- To convert to Unicode, go back to the same menu and choose Encoding -> "Encode in UTF-8" (not "Convert to UTF-8"). "Encode in" reinterprets the existing bytes as UTF-8, whereas "Convert to" would re-encode the garbled characters exactly as they are. Hopefully the text will become readable.
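If you would rather script the repair than use Notepad++, the same trick (reinterpret the bytes instead of converting the characters) can be sketched in Python. This assumes the text was originally UTF-8 and was displayed as Windows-1252, the usual "ANSI" code page on Western Windows systems; `fix_mojibake` is a hypothetical helper name, not an existing tool.

```python
def fix_mojibake(garbled: str, shown_as: str = "cp1252") -> str:
    """Undo UTF-8 text that was mistakenly decoded with a legacy code page.

    Assumption: the original bytes were UTF-8 and were displayed through
    `shown_as` (Windows-1252 by default).
    """
    # Turn the garbled characters back into the bytes they came from...
    raw_bytes = garbled.encode(shown_as)
    # ...then read those bytes as the UTF-8 they really were.
    return raw_bytes.decode("utf-8")


# How the damage happens in the first place: UTF-8 bytes shown as cp1252.
print("café".encode("utf-8").decode("cp1252"))
# And how it is undone.
print(fix_mojibake("cafÃ©"))
print(fix_mojibake("Ã§ Ã©"))
```

A wrong guess usually fails loudly with `UnicodeEncodeError` or `UnicodeDecodeError`, which is a useful signal to try another code page. Note that Windows-1252 leaves five byte values undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D), so text containing characters such as "Á" (UTF-8 bytes C3 81) may need `shown_as="latin-1"` instead.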

These steps work for most languages. You just need to guess the original encoding before pasting into Notepad++, then convert through the same menu to a Unicode-based encoding and see whether the text becomes readable.

Text in most languages can be stored in one of two kinds of encoding:

1- The legacy "ANSI" code pages, used initially by most computers, store each character in a single 8-bit byte. That allows only 256 values: the first 128 are ASCII (basic Latin letters, digits, punctuation, and control characters), and the upper 128 are interpreted differently depending on the PC's language settings.

2- The newer Unicode standard gives a unique code point to every character in all currently known languages, with plenty of room left for more. A Unicode file should display correctly on any PC that has a font for the language installed.

Note that UTF-8 covers all of Unicode, just like UTF-16 and UTF-32, using up to 4 bytes (32 bits) per character; it simply stores ASCII characters in a single byte to save disk space.
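To make the difference concrete, this short Python sketch prints how a few characters are stored in Windows-1252 (chosen here as a typical Western "ANSI" code page) and in UTF-8, and shows one byte being read under two code pages. `encodings_of` is just an illustrative helper.

```python
from typing import Optional, Tuple


def encodings_of(ch: str) -> Tuple[Optional[bytes], bytes]:
    """Return (Windows-1252 bytes, UTF-8 bytes) for one character.

    Windows-1252 is a single-byte code page, so a character it has no
    byte value for cannot be stored at all and yields None.
    """
    try:
        legacy = ch.encode("cp1252")
    except UnicodeEncodeError:
        legacy = None
    return legacy, ch.encode("utf-8")


for ch in ["A", "é", "€", "中", "😀"]:
    legacy, utf8 = encodings_of(ch)
    legacy_hex = legacy.hex(" ") if legacy is not None else "n/a"
    print(f"{ch}  cp1252: {legacy_hex:<4}  utf-8: {utf8.hex(' ')}")

# The upper 128 byte values depend on the code page: 0xE9 is "é" in
# Windows-1252 (Western) but "й" in Windows-1251 (Cyrillic).
print(b"\xe9".decode("cp1252"), b"\xe9".decode("cp1251"))
```

"A" is the same single byte in both encodings, "é" takes one byte in Windows-1252 but two in UTF-8, "中" has no Windows-1252 byte at all, and the emoji needs the full 4 bytes in UTF-8.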


answered Oct 03 '22 06:10

Gabriel

