
String length differs for non-English text

Tags:

php

I use the code below to shorten the testimonials on my site, and it works perfectly. Now I have a problem: I let users add testimonials in their own language, and the code works properly with English characters but not with characters from other languages. Can anyone tell me why?

    <?php
    $echo = $getFig["news_content"];
    // strlen() and substr() count bytes, not characters
    if (strlen($echo) <= 100) {
        $bar = $echo;
    } else {
        $bar = substr($echo, 0, 101) . "<a href='#'>Read More...</a>";
    }

    echo htmlspecialchars($bar);
    ?>

Any comments are greatly appreciated.

Thank you.

TNK asked Jan 02 '13 13:01


3 Answers

Use the mb_* functions: in your example, mb_strlen and mb_substr.

The reason is that strlen and substr count bytes. That is fine for ASCII characters, but in a multi-byte encoding such as UTF-8 some Unicode characters occupy more than one byte, so the results of strlen and substr look wrong. The mb_* functions avoid this problem because they count characters according to the encoding, not bytes.

For further information read the manual.
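As a rough sketch of how this could apply to the code in the question: the helper name shortenText, the 100-character limit as a parameter, and the sample string are my own assumptions, and the mbstring extension and UTF-8 input are assumed as well. The user text is escaped first and the link is appended afterwards, so the link markup itself is not escaped.

```php
<?php
// Hypothetical helper (not from the question): shorten $text to at most
// $limit characters, counted as UTF-8 characters rather than bytes.
function shortenText(string $text, int $limit = 100): string
{
    if (mb_strlen($text, 'UTF-8') <= $limit) {
        return $text;
    }
    // mb_substr cuts on character boundaries, so no character is split.
    return mb_substr($text, 0, $limit, 'UTF-8');
}

// Sample input: 150 characters, 450 bytes in UTF-8.
$content = str_repeat('こんにちは', 30);
$short   = shortenText($content);

// Escape the user text first, then append the link markup unescaped.
$html = htmlspecialchars($short, ENT_QUOTES, 'UTF-8');
if ($short !== $content) {
    $html .= "<a href='#'>Read More...</a>";
}
echo $html;
```

In the question's code, $content would be $getFig["news_content"] instead of the sample string.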

EDIT:


If you are more interested in words than in characters, you can use str_word_count to count how many words a string contains.

Sample:

$str = 'Some long text Some long text Some long text Some long text Some long text Some long text';
echo str_word_count($str);

Note: If your target language separates words with a delimiter other than a space, you can write a custom function that counts the occurrences of that delimiter in the given string.
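Such a custom function might look roughly like this. The name countWordsByDelimiter and the "・" delimiter in the second call are only illustrations, and the sketch assumes UTF-8 text, a non-empty delimiter, and no repeated or trailing delimiters.

```php
<?php
// Hypothetical helper: estimate the word count from the number of
// delimiters. Assumes UTF-8 text and a non-empty $delimiter.
function countWordsByDelimiter(string $text, string $delimiter): int
{
    if ($text === '') {
        return 0;
    }
    // N delimiters separate N + 1 words.
    return mb_substr_count($text, $delimiter, 'UTF-8') + 1;
}

echo countWordsByDelimiter('Some long text', ' '), "\n";   // prints 3
echo countWordsByDelimiter('one・two・three', '・'), "\n";  // prints 3
```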

Leri answered Nov 15 '22 10:11


See the note in the documentation:

Note:

strlen() returns the number of bytes rather than the number of characters in a string.

strlen() returns the byte count, not the character count; the two are only the same for single byte character-sets.

Use mb_strlen() if you want the character length of a multi-byte character-set string such as UTF-8
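A small illustration of the difference, assuming the script is saved as UTF-8 and the mbstring extension is available (the sample strings are my own):

```php
<?php
$ascii = 'Hello';
$utf8  = 'こんにちは'; // five characters, three bytes each in UTF-8

echo strlen($ascii), ' ', mb_strlen($ascii, 'UTF-8'), "\n"; // prints "5 5"
echo strlen($utf8), ' ', mb_strlen($utf8, 'UTF-8'), "\n";   // prints "15 5"

// Cutting by bytes can split a character and leave invalid UTF-8 behind.
$cut = substr($utf8, 0, 4);
var_dump(mb_check_encoding($cut, 'UTF-8')); // bool(false)
```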

Mark Baker answered Nov 15 '22 11:11


Your problem occurs because strlen counts bytes, which matches the number of characters only for single-byte text such as ASCII (which covers English). An alternative is mb_strlen.

Here is some sample code:

<?php 
$str = "Some user input こんにちわ";
$len = mb_strlen($str);

This is just a sample to illustrate what I am trying to say, but I hope it solves your problem.

shawndreck answered Nov 15 '22 12:11