How can I determine if a string contains non-printable characters/is likely binary data?
This is for unit testing/debugging -- it doesn't need to be exact.
This will have to do.
function isBinary($str) {
    // True if any byte falls outside printable ASCII, tab, CR, or LF.
    return preg_match('~[^\x20-\x7E\t\r\n]~', $str) > 0;
}
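A quick sketch of how the regex check behaves on a few inputs. The function name isBinaryByRegex is made up here so the snippet can run on its own; the pattern is the one from the answer above. Note that the pattern is byte-based, so any non-ASCII text (including valid UTF-8) counts as "binary".

```php
<?php
// Hypothetical name; same pattern as the answer above.
function isBinaryByRegex(string $str): bool
{
    // Match any byte outside printable ASCII (0x20-0x7E), tab, CR, or LF.
    return preg_match('~[^\x20-\x7E\t\r\n]~', $str) > 0;
}

var_dump(isBinaryByRegex("Hello,\tworld!\r\n")); // bool(false)
var_dump(isBinaryByRegex("\x00\x01\x02"));        // bool(true)
var_dump(isBinaryByRegex("caf\xC3\xA9"));         // bool(true): the UTF-8 bytes of "é" are >= 0x80
```

For unit-test output that is expected to be plain ASCII, that strictness is probably fine; for text that may contain accented characters it will give false positives.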
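To search for non-printable characters, you can use ctype_print (http://php.net/manual/en/function.ctype-print.php). It returns true only if every character in the string is printable, so a string containing tabs or newlines also returns false.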
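A minimal sketch of a check built on ctype_print. The helper name hasNonPrintable is an assumption for illustration, not part of PHP. It handles the empty string separately, because ctype_print returns false for it.

```php
<?php
// Hypothetical helper: true when the string contains any character ctype_print() rejects.
function hasNonPrintable(string $str): bool
{
    // ctype_print('') is false, so an empty string must not be reported as non-printable.
    return $str !== '' && !ctype_print($str);
}

var_dump(hasNonPrintable("plain text")); // bool(false)
var_dump(hasNonPrintable("line one\n")); // bool(true): newline is a control character
var_dump(hasNonPrintable("\x00\x01"));   // bool(true)
```

If tabs and newlines should be allowed, they would have to be stripped first, which is the kind of workaround the next answer moves away from.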
After a few attempts using the ctype_ functions and various workarounds, like removing whitespace characters and checking for an empty result, I decided I was going in the wrong direction. The following approach uses mb_detect_encoding (with the strict flag!) and considers a string "binary" if its encoding cannot be detected.
So far I haven't found a non-binary string that returns true, and the binary strings that return false only do so if the binary happens to be all printable characters.
/**
 * Determine whether the given value is a binary string by checking to see if it has detectable character encoding.
 *
 * @param string $value
 *
 * @return bool
 */
function isBinary($value): bool
{
    return false === mb_detect_encoding((string)$value, null, true);
}
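A short usage sketch for the encoding-based check. It assumes the mbstring extension is loaded and that the detection order is the default (ASCII, then UTF-8); with a different mb_detect_order the results may differ. The name isBinaryByEncoding is made up so the snippet runs on its own.

```php
<?php
// Hypothetical name; same body as the function above. Requires ext-mbstring.
function isBinaryByEncoding($value): bool
{
    // Passing null uses the current detection order; true enables strict mode.
    return false === mb_detect_encoding((string)$value, null, true);
}

var_dump(isBinaryByEncoding("Hello, world")); // bool(false): detected as ASCII
var_dump(isBinaryByEncoding("\xFF\xFE\xFD")); // bool(true) under the default order: not valid ASCII or UTF-8
```

Unlike the regex check, this one accepts valid UTF-8 text, which is probably what you want if the strings under test may contain non-ASCII characters.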