I'm trying to count the number of words in a variable whose text is written in a non-Latin language (Bulgarian), but it seems that str_word_count() does not count non-Latin words. The PHP file is encoded as UTF-8.
$str = "текст на кирилица";
echo 'Number of words: '.str_word_count($str);
//this returns 0
The str_word_count() function is built into PHP. It returns information about the words in a string, such as the total number of words or the position of each word. Parameters: $string: the string whose words are to be counted.
Step 1: Remove the leading and trailing whitespace using trim().
Step 2: Collapse runs of multiple spaces into a single space using substr_count() and str_replace().
Step 3: Count the words in the string with substr_count($str, " ") + 1 and return the result.
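The three steps above can be sketched roughly as follows. The function name countWordsBySpaces is hypothetical, and the sketch assumes words are separated only by plain spaces (tabs and newlines between words are not handled). Because the space is a single ASCII byte, this also works on UTF-8 text such as Cyrillic.

```php
<?php
// Hypothetical helper (name is illustrative): counts space-separated words.
function countWordsBySpaces(string $str): int
{
    // Step 1: strip leading and trailing whitespace.
    $str = trim($str);
    if ($str === '') {
        return 0; // an empty string contains no words
    }
    // Step 2: collapse runs of spaces until only single spaces remain.
    while (substr_count($str, '  ') > 0) {
        $str = str_replace('  ', ' ', $str);
    }
    // Step 3: n single spaces separate n + 1 words.
    return substr_count($str, ' ') + 1;
}

echo countWordsBySpaces("  текст   на кирилица "); // prints 3
```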
<?php
$string = "aabbbccddd";
$array = array();
foreach (count_chars($string, 1) as $i => $val) {
$count = chr($i);
$array[] = $val . "," .
You may do it with regex:
$str = "текст на кирилица";
echo 'Number of words: '.count(preg_split('/\s+/', $str));
Here I'm defining the word delimiter as whitespace characters. If anything else should be treated as a word delimiter, you'll need to add it to your regex.
Also note that since the regex itself contains no UTF-8 characters (only the string does), the /u modifier isn't required. But if you want some UTF-8 characters to act as delimiters, you'll need to add this modifier. Adding /u anyway is harmless and avoids byte-level surprises, since without it the pattern is matched against individual bytes rather than characters.
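One caveat with a plain preg_split is that leading or trailing whitespace produces empty pieces, which inflates the count. A minimal sketch that guards against this, assuming UTF-8 input; the function name countWordsByWhitespace is hypothetical:

```php
<?php
// Hypothetical helper: counts whitespace-separated words in a UTF-8 string.
function countWordsByWhitespace(string $str): int
{
    // /u treats pattern and subject as UTF-8.
    // PREG_SPLIT_NO_EMPTY drops the empty pieces that leading or
    // trailing whitespace would otherwise produce.
    $parts = preg_split('/\s+/u', $str, -1, PREG_SPLIT_NO_EMPTY);

    // preg_split returns false on failure, e.g. for invalid UTF-8 input.
    return $parts === false ? 0 : count($parts);
}

echo countWordsByWhitespace(" текст на\nкирилица "); // prints 3
```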
Update:
If you want only Cyrillic letters to count as part of a word, you may use:
$str = "текст
на 12453
кирилица";
echo 'Number of words: '.count(preg_split('/[^А-Яа-яЁё]+/u', $str));
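An alternative to splitting on non-Cyrillic characters is to match the Cyrillic runs directly, which avoids empty pieces when the string starts or ends with digits or punctuation. This is only a sketch: it relies on PCRE's \p{Cyrillic} script property (available with the /u modifier), and the function name countCyrillicWords is hypothetical.

```php
<?php
// Hypothetical helper: counts runs of Cyrillic letters in a UTF-8 string.
function countCyrillicWords(string $str): int
{
    // preg_match_all returns the number of full matches, or false on error.
    $n = preg_match_all('/\p{Cyrillic}+/u', $str);
    return $n === false ? 0 : $n;
}

echo countCyrillicWords("12453 текст\nна 12453\nкирилица"); // prints 3
```

Digits and Latin words are skipped entirely, so "текст and кирилица" would count as 2.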
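And here is the solution that came to my mind:
$var = "текст на кирилица с пет думи";
$array = explode(" ", $var);
$i = 0;
foreach ($array as $item)
{
    // strlen() counts bytes, and each Cyrillic letter takes 2 bytes in UTF-8,
    // so this skips empty items and one-letter words such as "с"
    if (strlen($item) > 2) $i++;
}
echo $i; // prints 5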
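Since strlen() measures bytes rather than characters, a length threshold behaves differently for Cyrillic and Latin text. A rough sketch of the same loop that counts characters instead, assuming UTF-8 input; the function name countWordsMinChars and its $minChars parameter are hypothetical:

```php
<?php
// Hypothetical helper: counts space-separated words that have at least
// $minChars UTF-8 characters.
function countWordsMinChars(string $text, int $minChars = 1): int
{
    $count = 0;
    foreach (explode(' ', $text) as $item) {
        // With /u, '.' matches one whole UTF-8 character, so the number
        // of matches is the character length of the item.
        $chars = preg_match_all('/./us', $item);
        if ($chars !== false && $chars >= $minChars) {
            $count++;
        }
    }
    return $count;
}

echo countWordsMinChars("текст на кирилица с пет думи"), "\n";    // prints 6
echo countWordsMinChars("текст на кирилица с пет думи", 2), "\n"; // prints 5
```

Empty items produced by consecutive spaces have zero characters, so they are not counted as long as $minChars is at least 1.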
As stated in the str_word_count description, a 'word' is defined as a locale-dependent string.
Set the Bulgarian locale before calling str_word_count:
setlocale(LC_ALL, 'bg_BG');
echo str_word_count($content);
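Read more about setlocale in the PHP manual. Note that this only works if the bg_BG locale is installed on the system, and because str_word_count examines the string byte by byte, it may still fail on UTF-8 text.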
The best solution I found is to pass a list of additional word characters to str_word_count:
$text = 'текст на кирилице and on english too';
$count = str_word_count($text, 0, 'АБВГДЕЁЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯабвгдеёжзийклмнопрстуфхцчшщъыьэюя');
echo $count; // => 7
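If listing every letter by hand is impractical, a script-independent variant is to count runs of Unicode letters. This is a sketch rather than a drop-in replacement for str_word_count: it assumes valid UTF-8 input, uses PCRE's \p{L} (any letter) property, and the function name countWordsAnyScript is hypothetical. Apostrophes and hyphens split words here, so "don't" counts as two.

```php
<?php
// Hypothetical helper: counts runs of letters from any script.
function countWordsAnyScript(string $text): int
{
    // \p{L} matches any Unicode letter; /u enables UTF-8 mode.
    $n = preg_match_all('/\p{L}+/u', $text);
    return $n === false ? 0 : $n; // false signals a PCRE error
}

echo countWordsAnyScript('текст на кирилице and on english too'); // prints 7
```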