I need to find the encoding of all files that are placed in a directory. Is there a way to find the encoding used?
The file command is not able to do this.
The encoding that is of interest to me is ISO 8859-1. If the encoding is anything else, I want to move the file to another directory.
In Visual Studio, you can select "File > Advanced Save Options..." The "Encoding:" combo box will tell you specifically which encoding is currently being used for the file.
To verify if a file passes an encoding such as ascii, iso-8859-1, utf-8 or whatever then a good solution is to use the 'iconv' command.
There are a few options you can use: check the content-type to see if it includes a charset parameter which would indicate the encoding (e.g. Content-Type: text/plain; charset=utf-16 ); check if the uploaded data has a BOM (the first few bytes in the file, which would map to the unicode character U+FEFF - 2 bytes for ...
One way to check this is to use the W3C Markup Validation Service. The validator usually detects the character encoding from the HTTP headers and information in the document. If the validator fails to detect the encoding, it can be selected on the validator result page via the 'Encoding' pulldown menu (example).
It sounds like you're looking for enca. It can guess and even convert between encodings. Just look at the man page.
Or, failing that, use file -i (Linux) or file -I (OS X). That will output MIME-type information for the file, which will also include the character-set encoding. I found a man-page for it, too :)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With