
Which PHP functions are said not to be "binary safe"? To which libraries do these "non-binary safe" functions hand off strings, and why?

I'm using Windows 10 Home Single Language Edition which is a 64-bit Operating System on my machine.

I've installed the latest version of XAMPP, which installed PHP 7.2.7 on my machine.

I'm asking this question based on the following excerpt from the PHP Manual:

The string in PHP is implemented as an array of bytes and an integer indicating the length of the buffer. It has no information about how those bytes translate to characters, leaving that task to the programmer. There are no limitations on the values the string can be composed of; in particular, bytes with value 0 (“NUL bytes”) are allowed anywhere in the string (however, a few functions, said in this manual not to be “binary safe”, may hand off the strings to libraries that ignore data after a NUL byte.)

I understand the difference between binary-safe and non-binary-safe functions in PHP well, but I have the following questions. Please answer them one by one, with explanations and suitable examples.

  • Do "non-binary safe" and "binary-safe" functions exist in PHP only because the entire PHP parser is written in C?
  • How do C and PHP differ in handling strings that may contain any value (including the NUL byte)?
  • I want complete lists of the PHP functions that are "non-binary safe" and of those that are "binary-safe".
  • Does the distinction between "non-binary safe" and "binary-safe" apply only to functions that manipulate strings, and not to PHP functions that deal with other types?
  • Why do the non-binary safe functions hand off the strings to libraries?
  • Do the non-binary safe functions hand off strings to libraries only when the string they are handling contains a NUL byte?
  • Which libraries do these "non-binary safe" functions hand off the strings to?
  • How do these libraries handle the strings received from "non-binary safe" functions?
  • Do the "non-binary safe" functions work like "binary safe" functions after handing off strings that contain a NUL byte to some library?
PHPFan asked Jun 23 '18 11:06


3 Answers

As arkascha explained, whether a function is "binary-safe" or "non-binary-safe" has nothing to do with the language.

Using a null byte (0x00) to mark the end of a string is simpler, which is probably why C went with it. The downside is that a null byte cannot appear anywhere else in the string, which is a big limitation if you have to handle arbitrary data. Storing the length as metadata alongside the string is more complex, as shown by Pete, but it lets you handle any kind of data.
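To make the trade-off concrete, here is a minimal, self-contained C sketch (not PHP engine code). The names sample, sample_c_length and sample_stored_length are made up for illustration. It contrasts the length a NUL-terminated routine reports with the length you get when the byte count is tracked separately.

```c
#include <stddef.h>
#include <string.h>

/* Payload bytes are 'a', NUL, 'b'; the compiler appends one more
 * terminating NUL, so the array occupies 4 bytes in total. */
static const char sample[] = "a\0b";

/* Length as a C-string routine sees it: strlen stops counting at the
 * first NUL byte, so everything after it is invisible. */
size_t sample_c_length(void) {
    return strlen(sample);
}

/* Length when it is kept as metadata instead of being inferred from a
 * terminator: all payload bytes are counted, embedded NUL included. */
size_t sample_stored_length(void) {
    return sizeof sample - 1; /* drop only the compiler-added terminator */
}
```

With this data, sample_c_length() reports 1 while sample_stored_length() reports 3, which is exactly the gap between the two representations.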

As for which functions are "binary-safe" or "non-binary-safe", just read the PHP Manual before using a function. That's what I do. There is no need to compile a list, because the PHP Manual already explains what you need to know about each function, including whether it is binary-safe.

Most of your post, I believe, is due to a misunderstanding of PHP Manual's explanation that you quoted, particularly this part:

however, a few functions, said in this manual not to be “binary safe”, may hand off the strings to libraries that ignore data after a NUL byte.

Let me try making it clearer by adding some of my own words:

however, a few functions, said in this manual not to be “binary safe”, are the functions that may hand off the strings to libraries that ignore data after a NUL byte.

So it doesn't really say "non-binary safe functions hand off the strings to libraries"; that is a misinterpretation. What it means is "functions that may hand off strings to libraries that ignore data after a NUL byte are described in this manual as not binary-safe".

"Handing off to libraries" is just another way of saying "calling functions from other libraries". "Ignoring data after a NUL byte" is a behavior that is called not binary-safe.

Another way of putting it is:

A few functions in this manual are said not to be "binary safe" because they may call other functions that are also not "binary safe" (functions that ignore data after a NUL byte).
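A small C sketch can illustrate what such a hand-off looks like. Both functions below are hypothetical (they are not PHP or Zend APIs). The first receives the full length but passes the buffer to snprintf with "%s", a C library routine that stops at the first NUL. The second keeps using the length and copies every byte.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical non-binary-safe wrapper: the caller supplies the real
 * length, but the work is handed off to a routine that only understands
 * NUL-terminated strings. src must be NUL-terminated for this call.
 * Returns the number of bytes the library routine saw. */
size_t copy_via_c_string_api(char *dst, size_t dst_size,
                             const char *src, size_t src_len) {
    (void)src_len; /* known here, but lost in the hand-off */
    int seen = snprintf(dst, dst_size, "%s", src);
    return seen < 0 ? 0 : (size_t)seen;
}

/* Hypothetical binary-safe counterpart: the length travels with the
 * data, so bytes after an embedded NUL are copied as well.
 * Returns the number of bytes copied. */
size_t copy_binary_safe(char *dst, size_t dst_size,
                        const char *src, size_t src_len) {
    size_t n = src_len < dst_size ? src_len : dst_size;
    memcpy(dst, src, n);
    return n;
}
```

Called with the three bytes "a\0b", the first wrapper only ever sees the leading "a", while the second copies all three bytes. The wrapper itself is not broken; it is "not binary safe" purely because of what it calls.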

I hope this clears it up for you.

LBear answered Nov 11 '22 06:11


Traditionally there are two ways to represent strings: by signaling the end of the string using a special character or by storing its length along with the string data. C uses the former; a string is a char-array with a null character at the end. However, this has the limitation that strings in C cannot use a null character anywhere else but at the end.

To overcome this limitation, the PHP engine uses this struct to represent a string:

struct _zend_string {
    zend_refcounted_h gc; /* refcount struct */
    zend_ulong        h;  /* hash value */
    size_t            len; /* length of string */
    char              val[1]; /* array of chars (using struct "hack") */
};

As you can see, the PHP devs chose to store the length of the string along with its data.
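For readers without the Zend headers at hand, the same idea can be sketched in plain C. The type counted_string and the function counted_string_new are hypothetical, simplified stand-ins and not the engine's API. The sketch uses a C99 flexible array member rather than the val[1] "hack", and leaves out reference counting and the hash.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified length-carrying string (not the Zend struct). */
typedef struct {
    size_t len;   /* number of payload bytes, embedded NULs included */
    char   val[]; /* flexible array member holding the bytes */
} counted_string;

/* Allocates a counted_string holding a copy of `len` bytes from `bytes`.
 * Returns NULL if allocation fails; the caller releases it with free(). */
counted_string *counted_string_new(const char *bytes, size_t len) {
    counted_string *s = malloc(sizeof *s + len + 1);
    if (s == NULL) {
        return NULL;
    }
    s->len = len;
    memcpy(s->val, bytes, len);
    s->val[len] = '\0'; /* convenience terminator; len stays authoritative */
    return s;
}
```

After counted_string_new("a\0b", 3), the len field is 3 and val[2] is 'b', even though strlen on val would still report 1.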

Now what happens if you mix "binary safe" and "non-binary safe" functionality?

Consider the following piece of C code that may be used when writing a PHP extension:

zend_string *a = zend_string_init("a\0b", /* string length */ 3, 0);
zend_string *b = zend_string_init("a\0c", /* string length */ 3, 0);

if (strcmp(a->val, b->val) == 0) {
    php_printf("Strings are equal!");
}

What do you think will happen? This code outputs "Strings are equal!" even though the strings are clearly not equal. strcmp stops at the first NUL byte, so it only ever compares "a" with "a". Since strcmp does not take the length of the strings into account, it is a non-binary safe function.

Most of C's standard library string functions can be classified as "non-binary safe", since they rely on the null terminator.
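The contrast can be reproduced without the Zend headers. The sketch below pairs two str* functions with their mem* counterparts from <string.h>; the four wrapper names are made up for illustration.

```c
#include <stdbool.h>
#include <string.h>

/* Non-binary-safe: strcmp stops comparing at the first NUL byte. */
bool equal_c_string(const char *a, const char *b) {
    return strcmp(a, b) == 0;
}

/* Binary-safe: compare the lengths first, then every byte. */
bool equal_bytes(const char *a, size_t a_len,
                 const char *b, size_t b_len) {
    return a_len == b_len && memcmp(a, b, a_len) == 0;
}

/* Non-binary-safe: strchr never looks past the first NUL byte. */
bool contains_c_string(const char *s, char c) {
    return strchr(s, c) != NULL;
}

/* Binary-safe: memchr scans exactly `len` bytes. */
bool contains_bytes(const char *s, size_t len, char c) {
    return memchr(s, c, len) != NULL;
}
```

For the inputs "a\0b" and "a\0c" with length 3, equal_c_string reports them as equal and equal_bytes does not. Likewise, contains_c_string cannot find the 'b' that contains_bytes finds.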

When dealing with zend_string in extension code, you should use the Zend string functions (zend_string_*) instead of C's string library.

To fix the previous code:

if (zend_string_equals(a, b)) {
    php_printf("Equal!");
} else {
    php_printf("Not equal");
}

This now correctly prints "Not equal".

Pieter van den Ham answered Nov 11 '22 07:11


Whether a function processes runtime data in a "binary safe" way has nothing to do with the language the system has been implemented in. It is a question of how the data is handled. PHP is a high-level language, which means it has a high-level implementation of a string type. That type does not depend on a terminating null character the way C strings do; instead it maintains metadata about the stored string, which allows a much more flexible and robust implementation. That, however, has little to do with being "binary safe" or not.

The rest of your points cannot really be answered in a clear way. Which libraries PHP itself uses depends on your setup; it is a dynamic environment. How those libraries handle data handed over to them has, again, nothing to do with whether a PHP function can be considered "binary safe": the library does not know about PHP, it only receives data and processes it according to how the library is implemented.

arkascha answered Nov 11 '22 06:11