How can I tell if a file is binary (non-text) in Python?
I am searching through a large set of files in Python, and keep getting matches in binary files. This makes the output look incredibly messy.
I know I could use grep -I, but I am doing more with the data than what grep allows for.
In the past, I would have just searched for bytes greater than 0x7f. On modern systems, however, UTF-8 and similar encodings make that impossible, because ordinary text legitimately contains such bytes. Ideally, the solution would be fast.
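Here is a minimal Python 3 sketch of why the old 0x7f check misfires. `has_high_bytes` is a hypothetical name for that naive heuristic, and the sample strings are made up for illustration:

```python
def has_high_bytes(data: bytes) -> bool:
    # Hypothetical naive heuristic: any byte above 0x7f counts as "binary".
    return any(b > 0x7F for b in data)

# "ï" and "é" each encode to two bytes >= 0x80 in UTF-8.
utf8_text = "naïve café\n".encode("utf-8")
print(has_high_bytes(utf8_text))         # True, although this is ordinary text
print(has_high_bytes(b"plain ascii\n"))  # False
```

Any non-ASCII character in UTF-8 is encoded entirely with bytes of 0x80 or above, so this check flags perfectly normal text files as binary.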
Yet another method based on file(1) behavior:
>>> textchars = bytearray({7,8,9,10,12,13,27} | set(range(0x20, 0x100)) - {0x7f})
>>> is_binary_string = lambda bytes: bool(bytes.translate(None, textchars))
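The one-liner relies on `bytes.translate(None, delete)`, which deletes every byte listed in `delete`. `textchars` holds BEL, BS, TAB, LF, FF, CR, ESC and everything from 0x20 to 0xff except DEL, so whatever survives the deletion is a control byte that file(1)-style heuristics treat as non-text. Here is a minimal Python 3 sketch of the same idea written as a regular function (a `def` rather than a lambda assigned to a name), tried on in-memory bytes:

```python
# Same byte set as the answer: BEL, BS, TAB, LF, FF, CR, ESC, plus 0x20-0xFF minus DEL (0x7F).
textchars = bytearray({7, 8, 9, 10, 12, 13, 27} | set(range(0x20, 0x100)) - {0x7F})

def is_binary_string(data: bytes) -> bool:
    # Delete all "text" bytes; any leftover byte means the data looks binary.
    leftover = data.translate(None, textchars)
    return bool(leftover)

print(b"hello\tworld\n".translate(None, textchars))  # b''
print(b"ELF\x00\x01\x02".translate(None, textchars))  # b'\x00\x01\x02'
print(is_binary_string("café\n".encode("utf-8")))    # False: UTF-8 bytes >= 0x80 count as text
```

Because 0x80 to 0xff are all in `textchars`, UTF-8 text passes. The check relies on control bytes such as NUL, which are common in binaries and rare in text.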
Example:
>>> is_binary_string(open('/usr/bin/python', 'rb').read(1024))
True
>>> is_binary_string(open('/usr/bin/dh_python3', 'rb').read(1024))
False
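For the original use case (searching many files while skipping binaries, roughly what grep -I does), here is a minimal Python 3 sketch built on the same check. `is_binary_file`, `search_text_files`, and the `blocksize=1024` default (mirroring the `read(1024)` above) are hypothetical names and choices, not part of the answer. The sketch also assumes every file is readable; permission errors would propagate.

```python
import os

# Byte set from the answer above (file(1)-style "text" characters).
TEXTCHARS = bytearray({7, 8, 9, 10, 12, 13, 27} | set(range(0x20, 0x100)) - {0x7F})

def is_binary_file(path, blocksize=1024):
    """Hypothetical helper: sniff only the first `blocksize` bytes of `path`."""
    with open(path, "rb") as f:  # `with` closes the file, unlike a bare open(...).read()
        chunk = f.read(blocksize)
    return bool(chunk.translate(None, TEXTCHARS))

def search_text_files(root, needle):
    """Hypothetical search loop: yield (path, line_no, line) for matches in text files only."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if is_binary_file(path):
                continue  # skip files that look binary
            # errors="replace" keeps non-UTF-8 text files from raising UnicodeDecodeError.
            with open(path, encoding="utf-8", errors="replace") as f:
                for line_no, line in enumerate(f, start=1):
                    if needle in line:
                        yield path, line_no, line.rstrip("\n")
```

Reading only the first block keeps the check fast on large files. The trade-off is that a file whose binary content starts later will be treated as text.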