Should I use multi-byte overloading (mbstring.func_overload)?

Tags: php, unicode

I'm in the process of making my PHP site Unicode-aware. I'm wondering if anyone has experience with the mbstring.func_overload setting, which replaces the normal string functions (e.g. strlen) with their multi-byte equivalents (mb_strlen). There aren't any comments on the PHP manual page.

Are there any potential problems that I should be aware of? Any cases where calling the multi-byte version is a bad idea?

I suppose one example would be functions that deal with encryption, since they may expect to deal with strings of bytes, rather than strings of characters.
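To make the bytes-versus-characters concern concrete, here is a minimal sketch, assuming PHP 7 or later with the mbstring extension loaded and overloading not enabled. The variable names are illustrative only.

```php
<?php
// A raw SHA-256 digest is binary data: always 32 bytes.
$digest = hash('sha256', 'secret', true);
$digestBytes = strlen($digest);          // byte count: 32

// Code that truncates a key or digest relies on byte offsets.
$half = substr($digest, 0, 16);          // first 16 bytes

// For text, the byte count and the character count differ.
$text = "h\u{E9}llo";                    // "héllo" encoded as UTF-8
$textBytes = strlen($text);              // 6 bytes
$textChars = mb_strlen($text, 'UTF-8');  // 5 characters
```

Crypto-style code expects the first kind of counting. Overloading would silently swap in the second kind.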

Also, the manual page includes a note: "It is not recommended to use the function overloading option in the per-directory context, because it's not confirmed yet to be stable enough in a production environment and may lead to undefined behaviour."

Does that mean that it's not stable in a per-directory context, or it's generally not stable? The wording is unclear.

asked Oct 21 '08 16:10 by JW.



2 Answers

My answer is: definitely not!

The problem is that there is no easy way to "reset" str* functions once they are overloaded.

For some time this can work well with your project, but almost surely you will run into an external library that uses string functions to, for example, implement a binary protocol, and they will fail. They will fail and you will spend hours trying to find out why they are failing.
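As an illustration of the kind of library code that breaks, here is a sketch of a length-prefixed binary frame. The helpers frameMessage() and readFrame() are hypothetical and not taken from any real library. The sketch assumes PHP 7 or later without overloading.

```php
<?php
// Hypothetical framing helper: 4-byte big-endian length, then the payload.
function frameMessage(string $payload): string
{
    // The length field must hold the number of BYTES in the payload.
    return pack('N', strlen($payload)) . $payload;
}

// Hypothetical reader for the same format.
function readFrame(string $frame): string
{
    $length = unpack('N', substr($frame, 0, 4))[1];
    return substr($frame, 4, $length);
}

$payload = "\u{20AC}100";          // "€100": 4 characters, 6 bytes in UTF-8
$frame = frameMessage($payload);
$roundTrip = readFrame($frame);    // same as $payload when strlen counts bytes
```

With overloading enabled and a UTF-8 internal encoding, strlen() would report 4 characters instead of 6 bytes. The length header would then no longer match what a peer reading raw bytes expects.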

After you have found that it's mbstring.func_overload, you don't have many options. You can ini_set mbstring.internal_encoding to a one-byte-per-character encoding every time you call the external library and set it back right afterwards, but if the library makes callbacks into your application, this will just mess things up.
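The switch-and-restore workaround might look roughly like the sketch below. It uses mb_internal_encoding() rather than ini_set(), which is a substitution on my part, and withByteEncoding() is a hypothetical helper name. It calls mb_strlen() directly as a stand-in for an overloaded strlen(), so it runs without overloading enabled.

```php
<?php
// Hypothetical wrapper: run $fn with a one-byte-per-char internal encoding,
// then restore the previous encoding even if $fn throws.
function withByteEncoding(callable $fn)
{
    $previous = mb_internal_encoding();
    mb_internal_encoding('8bit');
    try {
        return $fn();
    } finally {
        mb_internal_encoding($previous);
    }
}

mb_internal_encoding('UTF-8');
$text = "\u{20AC}100";             // "€100"

// Stand-in for a call into the external library.
$insideLength = withByteEncoding(function () use ($text) {
    return mb_strlen($text);       // uses the internal encoding: counts bytes
});
$outsideLength = mb_strlen($text); // back to UTF-8: counts characters
```

The weakness is visible in the structure: any callback into your own code that runs inside $fn also sees the 8bit encoding.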

Another option is to patch the library manually, changing all str* functions to their mb_* counterparts and passing a one-byte-per-character encoding as the encoding parameter. This isn't a great idea either, because you lose the ability to easily update the external library, and you might cause some performance issues as well.
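For reference, passing a one-byte-per-character encoding explicitly could look like this. It is a small sketch that assumes mbstring's '8bit' encoding name and uses illustrative variable names.

```php
<?php
$data = "\u{20AC}100";                        // "€100" as UTF-8 bytes

// Passing '8bit' makes the mb_* functions work on single bytes,
// independent of the internal encoding.
$byteCount = mb_strlen($data, '8bit');        // 6
$firstThree = mb_substr($data, 0, 3, '8bit'); // the 3 bytes of "€"

// The same calls with 'UTF-8' operate on characters instead.
$charCount = mb_strlen($data, 'UTF-8');       // 4
```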

So, again, don't use func_overload. If you work with multi-byte strings, use the appropriate mb_ functions.
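A minimal sketch of that recommendation follows: call the mb_ functions explicitly, with the encoding spelled out, wherever you handle text. The sample string and variable names are made up for illustration.

```php
<?php
$name = "\u{E5}ngstr\u{F6}m";                     // "ångström"

$length = mb_strlen($name, 'UTF-8');              // 8 characters
$prefix = mb_substr($name, 0, 3, 'UTF-8');        // "ång"
$upper  = mb_strtoupper($name, 'UTF-8');          // "ÅNGSTRÖM"
$pos    = mb_strpos($name, "\u{F6}", 0, 'UTF-8'); // character offset 6

// The plain functions stay available for byte-oriented work.
$bytes = strlen($name);                           // 10 bytes
```

Because nothing is overloaded, third-party code that calls strlen() or substr() keeps its byte semantics.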

answered Oct 21 '22 07:10 by gphilip


One issue you should definitely watch for is third-party scripts (perhaps a library or PEAR extension) that use the non-mb-aware versions of functions. For example, libraries that call strlen() could cause issues if you overload it.

As well, this bug report shows that the virtual-host bleeding of overloaded mb functions has been corrected in the 5.2/5.3 CVS versions. The bug is specific to per-directory configurations.

answered Oct 21 '22 08:10 by Owen