I'm using Windows 10 Home Single Language Edition, a 64-bit operating system.
I've installed the latest version of XAMPP, which installed PHP 7.2.7 on my machine.
I'm asking this question based on the following excerpt from the PHP Manual:
The string in PHP is implemented as an array of bytes and an integer indicating the length of the buffer. It has no information about how those bytes translate to characters, leaving that task to the programmer. There are no limitations on the values the string can be composed of; in particular, bytes with value 0 (“NUL bytes”) are allowed anywhere in the string (however, a few functions, said in this manual not to be “binary safe”, may hand off the strings to libraries that ignore data after a NUL byte.)
I understand the difference between binary-safe and non-binary-safe functions in PHP very well, but I have the following doubts. Please answer them one by one, with an explanation and suitable examples for each.
As arkascha explained, the issue of "binary-safe" and "non-binary-safe" has nothing to do with the language.
Using a null byte (0x00) to indicate the end of the string is simpler (which is probably why C went with it), but the downside is that you can't have a null byte anywhere else in the string, which is a big limitation if you have to handle all kinds of data. Storing the length as metadata alongside the string data is more complex, as shown by Pete, but it allows you to handle any kind of data.
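To make the trade-off concrete, here is a minimal sketch in plain C. The names `c_string_length`, `payload` and `payload_len` are made up for this illustration; they are not part of PHP or any library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: the length a NUL-terminated C string reports. */
static size_t c_string_length(const char *data)
{
    assert(data != NULL);
    return strlen(data); /* stops counting at the first 0x00 byte */
}

/* Five bytes of data with a NUL in the middle: 'a' 'b' 0x00 'c' 'd'.
 * The trailing '\0' is only there so strlen() has somewhere to stop. */
static const char payload[] = { 'a', 'b', '\0', 'c', 'd', '\0' };

/* The real length, kept as separate metadata next to the bytes. */
static const size_t payload_len = 5;
```

With terminator-based handling, `c_string_length(payload)` only sees the first two bytes, while the stored `payload_len` still describes all five.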
As for which functions are "binary-safe" and which are not, just read the PHP Manual before using a function. That's what I do. There is no need to build a list, because the PHP Manual already tells you what you need to know about each function, including whether it is binary-safe.
Most of your post, I believe, is due to a misunderstanding of PHP Manual's explanation that you quoted, particularly this part:
however, a few functions, said in this manual not to be “binary safe”, may hand off the strings to libraries that ignore data after a NUL byte.
Let me try making it clearer by adding some of my own words:
however, a few functions, said in this manual not to be “binary safe”, are the functions that may hand off the strings to libraries that ignore data after a NUL byte.
So it doesn't say "non-binary-safe functions hand off the strings to libraries"; that is a misinterpretation. What it means is "functions that may hand off the strings to libraries that ignore data after a NUL byte are described in this manual as not binary-safe".
"Handing off to libraries" is just another way of saying "calling functions from other libraries". "Ignoring data after a NUL byte" is a behavior that is called not binary-safe.
Another way of putting it is:
A few functions in this manual are said not to be "binary safe" because they may call other functions that are also not "binary safe" (functions that ignore data after a NUL byte).
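As a rough illustration of such a hand-off, here is a sketch in plain C. Both function names are hypothetical, and `snprintf` with `%s` merely stands in for "some library routine that expects a NUL-terminated string"; this is not how any particular PHP function is implemented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical caller that knows the real length but hands the bytes to a
 * C library routine that only understands NUL-terminated strings.
 * Returns how many bytes the library actually processed. */
static size_t bytes_seen_by_c_library(const char *data, size_t len)
{
    char out[64];
    assert(data != NULL && len < sizeof out);
    /* "%s" has no way to receive len, so it stops at the first NUL. */
    int written = snprintf(out, sizeof out, "%s", data);
    return written < 0 ? 0 : (size_t)written;
}

/* Hypothetical binary-safe counterpart: the length travels with the data. */
static size_t bytes_seen_with_length(const char *data, size_t len)
{
    char out[64];
    assert(data != NULL && len <= sizeof out);
    memcpy(out, data, len); /* copies every byte, NULs included */
    return len;
}
```

For the three bytes of `"a\0b"`, the first function reports 1 and the second reports 3. The caller was never the problem; the data was lost in the library that ignores everything after the NUL.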
I hope this clears it up for you.
Traditionally there are two ways to represent strings: by signaling the end of the string using a special character or by storing its length along with the string data. C uses the former; a string is a char-array with a null character at the end. However, this has the limitation that strings in C cannot use a null character anywhere else but at the end.
To overcome this limitation, the PHP engine uses this struct to represent a string:
struct _zend_string {
    zend_refcounted_h gc;  /* refcount struct */
    zend_ulong        h;   /* hash value */
    size_t            len; /* length of string */
    char              val[1]; /* array of chars (using struct "hack") */
};
As you can see, the PHP devs chose to store the length of the string along with its data.
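The same idea can be tried outside the engine. The sketch below is a simplified stand-in, not the Zend implementation: `lp_string` and `lp_string_init` are hypothetical names, the refcount and hash fields are left out, and a C99 flexible array member replaces the `val[1]` struct hack.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical length-prefixed string: the length is stored with the data. */
typedef struct {
    size_t len;   /* number of bytes in val, excluding the extra terminator */
    char   val[]; /* C99 flexible array member */
} lp_string;

/* Copies exactly len bytes, so embedded NUL bytes survive.
 * Returns NULL if allocation fails; the caller frees the result with free(). */
static lp_string *lp_string_init(const char *bytes, size_t len)
{
    assert(bytes != NULL);
    lp_string *s = malloc(sizeof *s + len + 1);
    if (s == NULL) {
        return NULL;
    }
    s->len = len;
    memcpy(s->val, bytes, len);
    s->val[len] = '\0'; /* convenience terminator for C APIs */
    return s;
}
```

After `lp_string_init("a\0b", 3)`, `len` is 3 and `val[2]` is still `'b'`, even though `strlen(s->val)` would report 1.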
Now what happens if you mix "binary safe" and "non-binary safe" functionality?
Consider the following piece of C code that may be used when writing a PHP extension:
zend_string *a = zend_string_init("a\0b", /* string length */ 3, 0);
zend_string *b = zend_string_init("a\0c", /* string length */ 3, 0);
if (strcmp(a->val, b->val) == 0) {
    php_printf("Strings are equal!");
}
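What do you think will happen? This code outputs "Strings are equal!" even though the strings are clearly not equal. strcmp stops at the first null byte, so it only compares the leading "a" of each string. Since strcmp does not take the length of the strings into account, it is a non-binary-safe function.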
Most of C's standard library string functions can be classified as "non-binary safe", since they rely on the null terminator.
When dealing with zend_string in extension code, you should use the Zend string functions (zend_string_*) instead of C's string library.
To fix the previous code:
if (zend_string_equals(a, b)) {
    php_printf("Equal!");
} else {
    php_printf("Not equal");
}
This now correctly prints "Not equal".
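If you want to try the difference without the PHP headers, the following plain C sketch shows the same contrast. `equal_c_style` and `equal_binary_safe` are hypothetical helpers written for this answer; the second one follows the general idea of a length-aware comparison and is not a copy of how zend_string_equals is implemented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Not binary-safe: strcmp stops at the first NUL byte in either string. */
static bool equal_c_style(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    return strcmp(a, b) == 0;
}

/* Binary-safe: compares the lengths first, then every byte up to that length. */
static bool equal_binary_safe(const char *a, size_t a_len,
                              const char *b, size_t b_len)
{
    assert(a != NULL && b != NULL);
    return a_len == b_len && memcmp(a, b, a_len) == 0;
}
```

For `"a\0b"` and `"a\0c"`, `equal_c_style` returns true, while `equal_binary_safe` with a length of 3 for both returns false.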
Whether a function processes runtime data in a "binary safe" way has nothing to do with the language the system is implemented in. It is a question of how the data is handled. PHP is a high-level language, which means it has a high-level implementation of a string type. That type does not depend on a terminating null character the way C strings do; instead it maintains metadata about the stored string, which allows a much more flexible and robust implementation. That, however, has little to do with being "binary safe" or not.
The rest of your points cannot really be answered clearly. Which libraries PHP itself uses depends on your setup; it is a dynamic environment. How those libraries handle the data handed over to them has, again, nothing to do with whether a PHP function can be considered "binary safe". The library does not know about PHP; it only receives data and processes it according to how the library is implemented.