I have (in an SQLite database) the following string:
Лампа в вытяжке на кухне меняется, начиная с вытаскивания белого штырька справа.
The string is displayed correctly by PHP using print. I would like to obtain just the first 50 characters of this string, i.e.
Лампа в вытяжке на кухне меняется, начиная с вытас
I have tried both substr and mb_substr, and both return only 28 characters:
Лампа в вытяжке на кухне ме�
After reading here and elsewhere about the problems with mbstring, I realise that this is actually a 50-byte string: 22 Russian characters (44 bytes) plus 5 spaces plus 1 stray byte, the first half of a two-byte character, which is displayed as a question mark.
Is there any nice solution to this? All my strings are UTF-8, so I could of course program a substr-function myself, by checking the first bit of every byte etc. But this should surely have been done before, right?
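For what it is worth, the hand-rolled approach mentioned above could look roughly like the sketch below. The helper name utf8_prefix is made up for illustration (it is not a PHP built-in), and the code assumes the input is valid UTF-8 and that the script file itself is saved as UTF-8. It only looks at the lead byte of each sequence to decide how many bytes to skip.

```php
<?php
// Hypothetical helper, not part of PHP: returns the first $length characters
// of a string that is ASSUMED to be valid UTF-8.
function utf8_prefix(string $s, int $length): string
{
    $bytes = strlen($s);   // strlen() counts bytes, not characters
    $pos = 0;              // current byte offset
    $chars = 0;            // characters consumed so far

    while ($pos < $bytes && $chars < $length) {
        $lead = ord($s[$pos]);
        if ($lead < 0x80) {
            $pos += 1;     // 0xxxxxxx: single byte (ASCII)
        } elseif (($lead & 0xE0) === 0xC0) {
            $pos += 2;     // 110xxxxx: two-byte sequence
        } elseif (($lead & 0xF0) === 0xE0) {
            $pos += 3;     // 1110xxxx: three-byte sequence
        } else {
            $pos += 4;     // 11110xxx: four-byte sequence (valid input assumed)
        }
        $chars++;
    }

    // Cut at a byte offset that is known to sit on a character boundary.
    return substr($s, 0, $pos);
}

$text = "Лампа в вытяжке на кухне меняется, начиная с вытаскивания белого штырька справа.";
echo utf8_prefix($text, 50), "\n";
```

This is only meant to show the idea; it does no validation, so for real input the mbstring functions are probably the safer choice.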
UPDATE: I believe mb_substr does not work properly because mb_detect_encoding() does not work properly.
The utf8_encode() function is a built-in PHP function that converts an ISO-8859-1 string to UTF-8 (it is deprecated as of PHP 8.2). Unicode was developed to describe all possible characters of all languages, plus many symbols, and assigns one unique number to each character or symbol.
UTF-8 encodes each character as one, two, three, or four bytes. UTF-16 encodes each character as either two or four bytes. The names reflect the smallest unit used: in UTF-8 the shortest representation of a character is one byte (eight bits), in UTF-16 it is two bytes (sixteen bits).
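As a quick illustration of those byte lengths, here is a small sketch using strlen(), which counts bytes. The sample characters are my own choice, written as code point escapes (PHP 7+) so the result does not depend on how the file is saved.

```php
<?php
// Sample characters chosen for illustration; "\u{...}" escapes always
// produce UTF-8 bytes, regardless of the script file's encoding.
$samples = [
    'U+0061 (Latin a)'       => "\u{61}",
    'U+00E9 (e with acute)'  => "\u{E9}",
    'U+043B (Cyrillic el)'   => "\u{43B}",
    'U+20AC (euro sign)'     => "\u{20AC}",
    'U+1F600 (emoji)'        => "\u{1F600}",
];

// strlen() returns the number of bytes, i.e. the UTF-8 length of each character.
$sizes = array_map('strlen', $samples);

foreach ($sizes as $label => $n) {
    echo "$label: $n byte(s)\n";
}
```

The Cyrillic letters in the question all fall in the two-byte range, which is why 22 of them take up 44 bytes.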
substr() is a built-in PHP function that extracts part of a string. It returns the substring specified by the start and length parameters, both of which are counted in bytes, not characters. It is available in PHP 4 and above.
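A minimal sketch of how substr() behaves, assuming the script file is saved as UTF-8. The ASCII string is just an illustrative value; the Russian string is the one from the question.

```php
<?php
// substr(string, start, length): start and length are byte offsets.
$ascii = "Hello, world";
$word = substr($ascii, 7, 5);   // one byte per character here, so this is "world"

$text = "Лампа в вытяжке на кухне меняется, начиная с вытаскивания белого штырька справа.";
$cut = substr($text, 0, 50);    // 50 bytes, not 50 characters

echo $word, "\n";
echo strlen($cut), " bytes\n";
// The last byte is only the first half of a two-byte Cyrillic letter,
// which is why the output ends in a broken character.
echo bin2hex(substr($cut, -1)), "\n";
```

With plain ASCII the byte and character counts coincide, so the problem only shows up once multibyte characters are involved.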
See the URLs below:
Extracting a substring from a UTF-8 string in PHP
http://osc.co.cr/extracting-a-substring-from-a-utf-8-string-in-php/
PHP substring with UTF-8
http://greekgeekz.blogspot.in/2010/11/php-substring-with-utf-8.html
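Or try this:
Example #1
// assumes this script file is saved as ISO-8859-1
$str1 = utf8_encode("Feliz día");
// "Feliz d" is 7 bytes and í is 2 bytes in UTF-8, so 8 bytes cuts the í in half
$str2 = substr($str1, 0, 8);
echo utf8_decode($str2);
// will output Feliz d? (the broken byte is replaced)
Example #2
// mb_substr counts characters, so 8 characters include the whole í
$str3 = mb_substr($str1, 0, 8, 'UTF-8');
echo utf8_decode($str3);
// will output Feliz dí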
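As of PHP 5.3 you can also declare the script's encoding with the declare(encoding=...) directive. Note that it only has an effect when PHP's Zend multibyte support is enabled, and it does not make substr() multibyte-aware: substr() still counts bytes.
Example #3
declare(encoding='UTF-8');
$str4 = "Feliz día";
$str5 = substr($str4, 0, 9);
echo $str5;
// will output Feliz dí, because the first 9 bytes of the UTF-8 string happen to end on a character boundary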
As usual, the answer appears to have been here all along. (Honestly, I searched for about an hour.)
An answer to "string functions and UTF8 in php" reads:
Make sure you set the proper internal encoding: mb_internal_encoding('utf-8');
With mb_internal_encoding('utf-8'); in place, everything works fine. Sorry to bother you guys, and thanks for the help.
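For anyone landing here later, a minimal sketch of the fix, assuming the mbstring extension is loaded and the script file is saved as UTF-8:

```php
<?php
// Requires the mbstring extension.
mb_internal_encoding('UTF-8');

$text = "Лампа в вытяжке на кухне меняется, начиная с вытаскивания белого штырька справа.";

// With the internal encoding set, mb_substr() counts characters.
$first50 = mb_substr($text, 0, 50);

// Alternatively, pass the encoding explicitly on each call.
$explicit = mb_substr($text, 0, 50, 'UTF-8');

echo $first50, "\n";
echo mb_strlen($first50), " characters, ", strlen($first50), " bytes\n";
```

Passing the encoding as the fourth argument avoids relying on global state, which may be preferable in library code.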