I'm using Sublime Text for Latex, so i need to use a specific encoding. However, in some cases, when I paste text copied from a different program (word/browser in most cases), I'm getting the message:
"Not all characters are representable in XXX encoding, falling back to UTF-8"
My question is: Is there any way to see which parts of the text cannot be encoded, so I can delete them manually?
I had this problem. It is caused by corrupt characters in your document. Here is how i solved it.
1) Make a search in your document for all standard characters. Make sure you enable regular expressions in your search, then paste this :
[^a-zA-Z0-9 -\.;<>/ ={}\[\]\^\?_\\\|:\r\n@]
You can add to that the normal accented characters of your language, here are the characters for French and German. Such as éà and so on :
[^a-zA-Z0-9 -\.;<>/ ='{}\[\]\^\?_\\\|:\r\n~@éàèêîôâûçäöüÄÖÜß]
2) Search for that, and Keep pressing F3 until you see mangled characters. Usually something like "è" which is a corrupt version of "à".
3) Delete those characters or replace them with what they should be.
You will be able to convert the document to another encoding when you have cleared all corrupt characters out.
For Linux users, it's also possible to automatically remove broken characters with command iconv:
iconv -f UTF-8 -t Windows-1251 -c < ~/temp/data.csv > ~/temp/data01.csv
-c Silently discard characters that cannot be converted instead of terminating when encountering such characters.
Just adding to @Draken response: here is the RegEx with spanish characters added.
[^a-zA-Z0-9 -\.;<>/ =“”'{}\[\]\^\?_\\\|:\r\n~@àèêîôâûçäöüÄÖÜßáéíóúñÑ¿€]
In my case I hitted Ctrl+H (for replacement) and as a replacement expression used nothing. So everything got cleared super fast and I was able to save it using ISO-8859-1.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With