How can I find out what character encoding a given text file has?
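var fs = require("fs");
var inputFile = "filename.txt";
// With no encoding argument, readFileSync returns the raw bytes as a Buffer.
var file = fs.readFileSync(inputFile);
// new Buffer() is deprecated; Buffer.from() copies the bytes (no encoding argument applies to a Buffer source).
var data = Buffer.from(file);
// some_clever_function is a placeholder for the detection routine I am looking for.
var fileEncoding = some_clever_function(file);
if (fileEncoding !== "utf8") {
  // do something
}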
Thanks
The UTF-8 file signature (commonly also called a "BOM") identifies the encoding format rather than the byte order of the document. UTF-8 is a linear sequence of bytes, not a sequence of 2-byte or 4-byte units in which the byte order matters.
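As a rough illustration of the signature idea, the sketch below checks the first bytes of a Buffer for a UTF-8 or UTF-16 signature. The helper name detectBom and the returned labels are assumptions made for this example, not part of Node.js. Many files carry no signature at all, so a null result says nothing about the actual encoding.

```javascript
// Hypothetical helper: returns a label for a known signature (BOM), or null.
// Only UTF-8 and UTF-16 signatures are checked in this sketch.
function detectBom(buf) {
  // UTF-8 signature: EF BB BF
  if (buf.length >= 3 && buf[0] === 0xef && buf[1] === 0xbb && buf[2] === 0xbf) {
    return "utf8";
  }
  // UTF-16 little-endian signature: FF FE
  if (buf.length >= 2 && buf[0] === 0xff && buf[1] === 0xfe) {
    return "utf16le";
  }
  // UTF-16 big-endian signature: FE FF
  if (buf.length >= 2 && buf[0] === 0xfe && buf[1] === 0xff) {
    return "utf16be";
  }
  // No recognized signature; the encoding is still unknown.
  return null;
}
```

It could be called on the Buffer returned by fs.readFileSync(inputFile) from the question. Note that FF FE also begins the UTF-32 little-endian signature, which this sketch does not distinguish.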
The character encodings currently supported by Node.js include the following: 'utf8' (alias: 'utf-8'): Multi-byte encoded Unicode characters. Many web pages and other document formats use UTF-8. This is the default character encoding.
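Since UTF-8 is the default, one question worth answering without any external module is whether the bytes are valid UTF-8 at all. The sketch below assumes a Node.js build with ICU support (the default in official builds), where the global TextDecoder accepts the fatal option. The helper name isValidUtf8 is made up for this example. It only validates; it does not identify other encodings, and pure ASCII input also passes because ASCII is a subset of UTF-8.

```javascript
// Hypothetical helper: true if the Buffer decodes as UTF-8 without errors.
function isValidUtf8(buf) {
  try {
    // With fatal: true, decode() throws on malformed byte sequences
    // instead of silently substituting U+FFFD.
    new TextDecoder("utf-8", { fatal: true }).decode(buf);
    return true;
  } catch (err) {
    return false;
  }
}

// Buffer.isEncoding reports whether Node.js itself can handle an encoding name.
const nodeKnowsUtf8 = Buffer.isEncoding("utf-8");
const nodeKnowsShiftJis = Buffer.isEncoding("shift-jis");
```

A true result from isValidUtf8 is strong but not conclusive evidence of UTF-8; a false result rules UTF-8 out and suggests falling back to a detection module.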
In an editor such as Visual Studio Code, the file encoding is shown in the status bar. Click the encoding there to reopen or save the active file with a different encoding, then choose an encoding.
You can try using an external module, such as detect-character-encoding: https://www.npmjs.com/package/detect-character-encoding
The previously mentioned module works for me too. Alternatively, you could have a look at detect-file-encoding-and-language, which I'm using at the moment.
Installation:
$ npm install detect-file-encoding-and-language
Usage:
// index.js
const languageEncoding = require("detect-file-encoding-and-language");
const pathToFile = "/home/username/documents/my-text-file.txt";
languageEncoding(pathToFile).then(fileInfo => console.log(fileInfo));
// Possible result: { language: japanese, encoding: Shift-JIS, confidence: { language: 0.97, encoding: 1 } }