
How to check whether text file is encoded in UTF-8?

Tags:

c++

utf-8

How to check whether text file is encoded in UTF-8 in C++?

scdmb asked Jan 17 '23 23:01



1 Answer

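Try to decode the file as UTF-8 and check two things: that every byte sequence is well-formed UTF-8 (no stray continuation bytes, no truncated or overlong sequences), and that every decoded value is a valid Unicode code point (no surrogates, nothing above U+10FFFF).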
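To illustrate the check described above, here is a minimal sketch in C++ using only the standard library. The function names is_valid_utf8 and file_is_valid_utf8 are made up for this example, and it assumes the whole file fits in memory. It only tells you whether the bytes form well-formed UTF-8, not whether the author meant them as UTF-8.

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical helper: true if `data` is well-formed UTF-8 that decodes
// to valid Unicode scalar values only.
bool is_valid_utf8(const std::string& data) {
    const std::size_t n = data.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char b0 = static_cast<unsigned char>(data[i]);
        std::size_t len;       // total bytes in this sequence
        std::uint32_t cp;      // code point being decoded
        std::uint32_t min_cp;  // smallest code point allowed for this length

        if (b0 < 0x80)                { len = 1; cp = b0;        min_cp = 0; }
        else if ((b0 & 0xE0) == 0xC0) { len = 2; cp = b0 & 0x1F; min_cp = 0x80; }
        else if ((b0 & 0xF0) == 0xE0) { len = 3; cp = b0 & 0x0F; min_cp = 0x800; }
        else if ((b0 & 0xF8) == 0xF0) { len = 4; cp = b0 & 0x07; min_cp = 0x10000; }
        else return false;  // stray continuation byte or invalid lead byte

        if (len > n - i) return false;  // sequence truncated at end of input

        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char b = static_cast<unsigned char>(data[i + k]);
            if ((b & 0xC0) != 0x80) return false;  // not a continuation byte
            cp = (cp << 6) | (b & 0x3F);
        }

        if (cp < min_cp) return false;                   // overlong encoding
        if (cp > 0x10FFFF) return false;                 // beyond Unicode range
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // UTF-16 surrogate

        i += len;
    }
    return true;
}

// Hypothetical helper: reads the whole file in binary mode and validates it.
// Returns false if the file cannot be opened.
bool file_is_valid_utf8(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) return false;
    const std::string data((std::istreambuf_iterator<char>(in)),
                           std::istreambuf_iterator<char>());
    return is_valid_utf8(data);
}
```

Opening the file in binary mode matters here, since text mode may translate line endings on some platforms and change the bytes being checked.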

Even if the file passes, there's no guarantee that it really is UTF-8 rather than ASCII or something else. How would you interpret a file containing a single byte, the letter A? ASCII? UTF-8? Other? Likewise, what if the file starts with the BOM by sheer luck but isn't really UTF-8 or isn't intended to be UTF-8?
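For the BOM case, a sketch of the check might look like the following. The name has_utf8_bom is made up for this example. As noted above, a positive result is only a hint: the same three bytes, EF BB BF, are also the characters "ï»¿" in Latin-1.

```cpp
#include <string>

// Hypothetical helper: true if `data` begins with EF BB BF,
// the UTF-8 encoding of U+FEFF (the byte order mark).
bool has_utf8_bom(const std::string& data) {
    return data.size() >= 3 &&
           static_cast<unsigned char>(data[0]) == 0xEF &&
           static_cast<unsigned char>(data[1]) == 0xBB &&
           static_cast<unsigned char>(data[2]) == 0xBF;
}
```

Most UTF-8 files carry no BOM at all, so a negative result says nothing about the encoding either.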

This article may be of interest.

Alexey Frunze answered Jan 21 '23 13:01
