How to check whether a text file is encoded in UTF-8 in C++?
Try to read it as UTF-8 and check two things: that the UTF-8 encoding is well-formed, and, if it is, that it decodes to valid Unicode code points only.
But there is still no guarantee that the file is UTF-8 rather than ASCII or something else. How would you interpret a file containing a single byte, the letter A? ASCII? UTF-8? Something else? Likewise, what if the file starts with the BOM by sheer luck but isn't really UTF-8, or isn't intended to be UTF-8?
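The "try to decode it and see whether anything breaks" approach can be sketched roughly as follows. This is only a minimal illustration, not a definitive implementation: the function names (is_valid_utf8, has_utf8_bom, file_looks_like_utf8) are made up for this example, and the whole file is assumed to fit in memory. The validator rejects truncated and overlong sequences, surrogates (U+D800 to U+DFFF), and code points above U+10FFFF. The UTF-8 BOM is the byte sequence EF BB BF.

```cpp
#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical helper: true if `data` is a well-formed UTF-8 byte sequence.
bool is_valid_utf8(const std::string& data) {
    const std::size_t n = data.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char b0 = static_cast<unsigned char>(data[i]);
        std::size_t len = 0;       // total bytes in this sequence
        unsigned long cp = 0;      // code point being decoded
        unsigned long min_cp = 0;  // smallest code point allowed for this length

        if (b0 < 0x80)                { len = 1; cp = b0;        min_cp = 0x0; }
        else if ((b0 & 0xE0) == 0xC0) { len = 2; cp = b0 & 0x1F; min_cp = 0x80; }
        else if ((b0 & 0xF0) == 0xE0) { len = 3; cp = b0 & 0x0F; min_cp = 0x800; }
        else if ((b0 & 0xF8) == 0xF0) { len = 4; cp = b0 & 0x07; min_cp = 0x10000; }
        else return false;  // stray continuation byte or invalid lead byte

        if (len > n - i) return false;  // sequence is cut off at end of data

        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char b = static_cast<unsigned char>(data[i + k]);
            if ((b & 0xC0) != 0x80) return false;  // not a continuation byte
            cp = (cp << 6) | (b & 0x3F);
        }

        if (cp < min_cp) return false;                   // overlong encoding
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // surrogate
        if (cp > 0x10FFFF) return false;                 // beyond Unicode range
        i += len;
    }
    return true;
}

// Hypothetical helper: true if `data` starts with the UTF-8 BOM (EF BB BF).
// A BOM is only a hint; it does not prove the rest of the file is UTF-8.
bool has_utf8_bom(const std::string& data) {
    return data.size() >= 3 &&
           static_cast<unsigned char>(data[0]) == 0xEF &&
           static_cast<unsigned char>(data[1]) == 0xBB &&
           static_cast<unsigned char>(data[2]) == 0xBF;
}

// Hypothetical helper: reads the whole file in binary mode and validates it.
// Returns false if the file cannot be opened.
bool file_looks_like_utf8(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) return false;
    const std::string data((std::istreambuf_iterator<char>(in)),
                           std::istreambuf_iterator<char>());
    return is_valid_utf8(data);
}
```

Note that a true result only means the bytes are consistent with UTF-8. A file holding the single byte A passes this check and is equally valid ASCII, so the ambiguity described above remains.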
This article may be of interest.