
How can I detect if a file is binary (non-text) in Python?

How can I tell if a file is binary (non-text) in Python?

I am searching through a large set of files in Python, and keep getting matches in binary files. This makes the output look incredibly messy.

I know I could use grep -I, but I am doing more with the data than what grep allows for.

In the past, I would have just searched for characters greater than 0x7f, but UTF-8 and similar encodings make that impossible on modern systems. Ideally, the solution would be fast.
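
To illustrate why the old check no longer works: UTF-8 encodes every non-ASCII character as bytes at or above 0x80, so ordinary text trips a "greater than 0x7f" test. A small sketch (the sample string is an arbitrary choice, not from the question):

```python
# Plain text with two accented characters, written with escapes so the
# source encoding of this snippet does not matter.
text = "na\u00efve caf\u00e9"
encoded = text.encode("utf-8")

# Collect every byte that the old "greater than 0x7f" rule would flag.
high_bytes = [b for b in encoded if b > 0x7F]

# Each accented character becomes two high bytes, so this readable
# string would be misclassified as binary by the old rule.
print(bool(high_bytes))
```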

grieve asked May 22 '09 16:05



1 Answer

Yet another method based on file(1) behavior:

>>> textchars = bytearray({7,8,9,10,12,13,27} | set(range(0x20, 0x100)) - {0x7f})
>>> is_binary_string = lambda bytes: bool(bytes.translate(None, textchars))

Example:

>>> is_binary_string(open('/usr/bin/python', 'rb').read(1024))
True
>>> is_binary_string(open('/usr/bin/dh_python3', 'rb').read(1024))
False
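
For searching through many files, the same check could be wrapped in a function that takes a path and closes the file when done. This is a minimal sketch, not part of the original answer: the name `is_binary_file` and the `sample_size` parameter are made up for illustration, and the 1024-byte sample simply mirrors the example above.

```python
# Bytes treated as "text": a few common control characters plus everything
# from 0x20 to 0xff except DEL (0x7f). Same table as in the answer above.
TEXTCHARS = bytes({7, 8, 9, 10, 12, 13, 27} | set(range(0x20, 0x100)) - {0x7F})


def is_binary_file(path, sample_size=1024):
    """Return True if the first sample_size bytes contain a non-text byte.

    Hypothetical helper; sample_size is an assumed default.
    """
    with open(path, "rb") as f:
        sample = f.read(sample_size)
    # translate(None, TEXTCHARS) deletes every byte listed in TEXTCHARS;
    # anything left over is a byte the heuristic considers non-text.
    return bool(sample.translate(None, TEXTCHARS))
```

Because only the first block of each file is read, the check stays cheap on large files. Note that this is a heuristic: an empty file counts as text, and text in encodings that contain NUL bytes (UTF-16, for example) would be reported as binary.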
jfs answered Oct 03 '22 08:10
