How to write a check in Python to see if a file is valid UTF-8?

As stated in the title, I would like to check whether a given file object (opened as a binary stream) contains valid UTF-8.

Anyone?

Thanks

Jox asked Jul 16 '10 22:07


2 Answers

def try_utf8(data):
    """Return the decoded text on success, or None if data is not valid UTF-8."""
    try:
        return data.decode('utf-8')
    except UnicodeDecodeError:
        return None

# f is the file object from the question, opened in binary mode
data = f.read()
udata = try_utf8(data)
if udata is None:
    # Not UTF-8.  Do something else
    pass
else:
    # Handle unicode data
    pass

Daniel Stutzbach answered Sep 19 '22 13:09


You could do something like this, which decodes the file line by line and stops at the first invalid byte sequence:

import codecs
try:
    # errors='strict' raises UnicodeDecodeError on the first invalid sequence
    with codecs.open(filename, encoding='utf-8', errors='strict') as f:
        for line in f:
            pass
    print("Valid utf-8")
except UnicodeDecodeError:
    print("invalid utf-8")

michael answered Sep 21 '22 13:09
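
Since the question starts from a file object already opened as a binary stream, the two approaches above could be combined: decode strictly, as in the first answer, but in chunks, so the whole file never has to sit in memory. The following is only a sketch under that assumption. The function name is_valid_utf8 and the chunk_size parameter are made up for illustration and do not come from either answer; codecs.getincrementaldecoder is part of the standard library.

```python
import codecs
import io

def is_valid_utf8(stream, chunk_size=64 * 1024):
    """Return True if the binary stream decodes as strict UTF-8.

    Hypothetical helper: the name and chunk_size are illustrative only.
    """
    # An incremental decoder buffers multi-byte characters that are
    # split across two chunks, so chunk boundaries cause no false errors.
    decoder = codecs.getincrementaldecoder("utf-8")(errors="strict")
    try:
        while True:
            chunk = stream.read(chunk_size)
            if not chunk:
                break
            decoder.decode(chunk)
        # final=True makes the decoder raise if the stream ended
        # in the middle of a multi-byte character.
        decoder.decode(b"", final=True)
    except UnicodeDecodeError:
        return False
    return True

# Usage with in-memory streams standing in for open(path, "rb")
print(is_valid_utf8(io.BytesIO("héllo".encode("utf-8"))))
print(is_valid_utf8(io.BytesIO(b"\xff\xfe")))
```

Note that this consumes the stream; if the data is needed afterwards, the caller would have to seek back to the start (where the stream supports it) or reopen the file.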