
Why use multibyte string functions in PHP?

At the moment, I don't understand why it is really important to use mbstring functions in PHP when dealing with UTF-8. My locale under Linux is already set to UTF-8, so why don't functions like strlen, preg_replace, and so on work properly by default?
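A minimal sketch of the behaviour the question describes, assuming the PHP file is saved as UTF-8. strlen() counts bytes, and a PCRE pattern without the u modifier treats the subject as a sequence of bytes, so the system locale does not change either result. The locale name en_US.UTF-8 is an assumption; setlocale() just returns false if it is not installed, and the results are the same either way.

```php
<?php
// Assumes this file is saved as UTF-8, so "ü" is stored as two bytes (0xC3 0xBC).
// The locale name is an assumption; setlocale() returns false if it is missing.
setlocale(LC_ALL, 'en_US.UTF-8');

$word = "über";

// strlen() counts bytes, not characters, and ignores the locale.
$byteLength = strlen($word);                    // 5

// Without the u modifier, "." matches one byte at a time.
$byteMatches = preg_match_all('/./s', $word);   // 5

// With the u modifier, PCRE treats pattern and subject as UTF-8,
// so "." matches one whole character.
$charMatches = preg_match_all('/./su', $word);  // 4

// preg_replace() behaves the same way: removing the "first character"
// without /u strips only the first byte of "ü", leaving invalid UTF-8.
$brokenRest = preg_replace('/^./s', '', $word);  // "\xBCber"
$cleanRest  = preg_replace('/^./su', '', $word); // "ber"

echo "strlen: $byteLength\n";
echo "matches without /u: $byteMatches\n";
echo "matches with /u: $charMatches\n";
echo "rest with /u: $cleanRest\n";
```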

Asked Jul 17 '11 06:07 by rabudde


People also ask

What is a multibyte string PHP?

Mbstring stands for "multibyte string". It is a PHP extension that provides string functions for non-ASCII text stored in multibyte encodings, and it can also convert strings between encodings. Multibyte character encodings exist to represent more than 256 characters, the most that a single byte can encode.
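To illustrate the conversion role, here is a minimal sketch using mb_convert_encoding(), assuming the mbstring extension is enabled. The input is the single ISO-8859-1 byte 0xE9 ("é"); Latin-1 as the source encoding is only an example.

```php
<?php
// Requires the mbstring extension.
// "\xE9" is "é" in ISO-8859-1 (Latin-1): a single byte.
$latin1 = "\xE9";

// Convert from ISO-8859-1 to UTF-8; the same character now needs two bytes.
$utf8 = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');

echo bin2hex($utf8), "\n";            // c3a9
echo strlen($utf8), "\n";             // 2 (bytes)
echo mb_strlen($utf8, 'UTF-8'), "\n"; // 1 (character)
```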

What is multibyte string?

A multibyte character is a character encoded as a sequence of one or more bytes, where each byte sequence represents a single character of the extended character set. Multibyte encodings are used for scripts such as Japanese Kanji. Wide characters, by contrast, are fixed-width character codes whose width depends on the platform (the C type wchar_t, for example, is 16 bits on Windows and 32 bits on Linux).
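In UTF-8, for instance, one character takes between one and four bytes. A minimal sketch that prints the byte sequence of a few single characters, assuming the source file is saved as UTF-8; it uses only core functions (strlen(), bin2hex()), so mbstring is not required.

```php
<?php
// Assumes this file is saved as UTF-8.
// Each entry is a single character; its UTF-8 byte sequence varies in length.
$chars = ['A', 'é', '漢', '😀'];

$byteCounts = [];
foreach ($chars as $char) {
    $byteCounts[$char] = strlen($char);   // length of the byte sequence
    printf("%s => %d byte(s): %s\n", $char, strlen($char), bin2hex($char));
}
// A => 1 byte(s): 41
// é => 2 byte(s): c3a9
// 漢 => 3 byte(s): e6bca2
// 😀 => 4 byte(s): f09f9880
```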

What is multibyte language?

A multibyte character is a character that cannot be stored in a single byte, such as a Chinese, Japanese, or Korean character. These characters typically require two or three bytes of storage, depending on the encoding. A more precise definition can be found in ISO/IEC 9899:1990 subclause 3.13.
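Because the byte count depends on the encoding, the same character can take a different number of bytes in different code sets. A minimal sketch, assuming the mbstring extension is enabled and the file is saved as UTF-8; the character 日 and the encodings Shift_JIS and EUC-JP are just examples.

```php
<?php
// Requires the mbstring extension; assumes this file is saved as UTF-8.
$char = '日';  // U+65E5

$utf8  = $char;                                        // already UTF-8
$sjis  = mb_convert_encoding($char, 'SJIS', 'UTF-8');  // Shift_JIS
$eucjp = mb_convert_encoding($char, 'EUC-JP', 'UTF-8');

foreach (['UTF-8' => $utf8, 'SJIS' => $sjis, 'EUC-JP' => $eucjp] as $name => $bytes) {
    printf("%-6s %d bytes: %s\n", $name, strlen($bytes), bin2hex($bytes));
}
// UTF-8  3 bytes: e697a5
// SJIS   2 bytes: 93fa
// EUC-JP 2 bytes: c6fc
```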

What are multibyte characters example?

Examples of multibyte code sets are IBM-eucJP and IBM-943. Single-byte code sets have at most 256 characters, while multibyte code sets can have more than 256 (with no theoretical limit).


1 Answer

Here is my answer in plain English. A single Japanese, Chinese, or Korean character takes more than one byte: an English letter such as x takes 1 byte, while a typical Japanese, Chinese, or Korean character takes more than 1 byte (3 bytes in UTF-8). PHP's standard string functions treat each byte as one character, and strlen() simply counts bytes regardless of the system locale. So functions that count or slice characters will not work as expected on Japanese, Chinese, or Korean text. For example, "Hello World!" in English is 12 characters and 12 bytes, but its Japanese, Chinese, or Korean equivalent has many more bytes than characters, so strlen() reports a length larger than the number of characters.
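A minimal sketch of the difference described above, assuming the file is saved as UTF-8 and the mbstring extension is enabled. The Japanese greeting こんにちは ("hello") is just a sample string; passing 'UTF-8' explicitly avoids depending on mbstring's configured internal encoding.

```php
<?php
// Assumes UTF-8 source and the mbstring extension.
$text = 'こんにちは';   // 5 characters, 3 bytes each in UTF-8

// Byte-oriented core functions:
$bytes        = strlen($text);          // 15
$brokenPrefix = substr($text, 0, 2);    // first 2 bytes: half of "こ"

// mbstring counterparts work on characters in the given encoding:
$chars    = mb_strlen($text, 'UTF-8');         // 5
$firstTwo = mb_substr($text, 0, 2, 'UTF-8');   // "こん"

echo "strlen: $bytes, mb_strlen: $chars\n";
echo "mb_substr: $firstTwo\n";
var_dump(mb_check_encoding($brokenPrefix, 'UTF-8')); // bool(false)
```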

Read http://www.php.net/manual/en/intro.mbstring.php

Answered Oct 18 '22 06:10 by Kumar