I'm programming a small web app to manage texts from external writers. The whole thing works well, but I have a small problem related to the word counter.
The writers are paid based on the number of words in a text, and the text contains HTML tags. The problem is that the texts use German characters (Ä, Ö, Ü, ß).
So first, I strip the tags:
$content = strip_tags($content);
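It may be worth checking this step. strip_tags() only deletes the tags, so `<p>eins</p><p>zwei</p>` turns into `einszwei` and two words are counted as one. HTML entities such as `&nbsp;` or `&auml;` (some editors store umlauts that way) also stay in the text undecoded. Here is a minimal preprocessing sketch, assuming UTF-8 input. `html_to_text` is a hypothetical helper name:

```php
<?php
// Hypothetical helper: turn an HTML fragment into plain text for counting.
// Assumes the input is UTF-8.
function html_to_text(string $html): string
{
    // Put a space in front of every tag so adjacent elements
    // (e.g. </p><p>) do not glue words together once the tags are gone.
    $html = str_replace('<', ' <', $html);
    $text = strip_tags($html);
    // Decode entities such as &ouml;, &szlig; or &nbsp; into real UTF-8 characters.
    return html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');
}

echo trim(html_to_text('<p>Gr&ouml;&szlig;e</p><p>Ma&szlig;</p>')), "\n"; // Größe  Maß
```

The trade-off is that an inline tag in the middle of a word (`<b>Wort</b>teil`) now splits it in two. Whether that matters depends on how the writers format their texts.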
Then I replace newlines and tabs with plain spaces:
$replace = array("\r\n", "\n", "\r", "\t");
$content = str_replace($replace, ' ', $content);
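This str_replace() list only covers ASCII line breaks and tabs. The text may contain other whitespace, for example the non-breaking space (U+00A0) that `&nbsp;` becomes after entity decoding. In that case, one Unicode-aware regex can handle all of it. This is a sketch assuming UTF-8 text. `normalize_spaces` is a hypothetical name:

```php
<?php
// Hypothetical helper: collapse every run of whitespace into a single space.
// \s covers spaces, tabs and line breaks; \p{Z} adds Unicode separators
// such as the non-breaking space U+00A0. The /u flag makes PCRE treat
// the subject as UTF-8.
function normalize_spaces(string $text): string
{
    $result = preg_replace('/[\s\p{Z}]+/u', ' ', $text); // null on invalid UTF-8
    return trim($result ?? '');
}

echo normalize_spaces("Erste Zeile\r\n\tzweite\u{00A0}Zeile"), "\n"; // Erste Zeile zweite Zeile
```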
Finally, I try to count the words:
Method 1:
$characterMap = 'ÄÖÜäöü߀';
$count = str_word_count($content, 0, $characterMap);
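It helps to know why Method 1 needs the character map at all. str_word_count() works on single bytes, not on UTF-8 characters. A German letter such as ö is two bytes in UTF-8, so it only stays inside a word when both of its bytes appear in the charlist. Listing 'ÄÖÜäöüß' therefore works. A letter outside the list (é, for example) can still break a word apart, which might be one reason Method 1 comes out lower. Here is a small check, assuming the source file and the text are UTF-8:

```php
<?php
// str_word_count() looks at single bytes. The charlist is effectively
// split into its UTF-8 bytes (e.g. "ö" = 0xC3 0xB6), which is why umlauts survive.
$characterMap = 'ÄÖÜäöüß';

$text = 'Größe Maß';
echo str_word_count($text, 0, $characterMap), "\n"; // 2

// A multibyte-aware alternative for comparison: count runs of Unicode
// letters/digits (note: this one splits hyphenated words).
echo preg_match_all('/[\p{L}\p{N}]+/u', $text), "\n"; // 2
```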
Method 2:
$to_delete = array('.', ',', ';', "'", '@');
$content = str_replace($to_delete, '', $content);
$count = count(preg_split('~[^\p{L}\p{N}\']+~u',$content));
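Method 2 may pick up extra words from the split itself. preg_split() returns an empty string before a leading separator and after a trailing one, and count() includes those pieces. Because newlines are turned into spaces first, a text that ends with a line break already gains one phantom word. The PREG_SPLIT_NO_EMPTY flag drops the empty pieces:

```php
<?php
// Same pattern as Method 2: split on anything that is not a letter,
// digit or apostrophe.
$pattern = '~[^\p{L}\p{N}\']+~u';
$text = ' Hallo Welt ';  // leading/trailing separators, e.g. from a replaced "\n"

// Without the flag the result is ['', 'Hallo', 'Welt', ''].
echo count(preg_split($pattern, $text)), "\n"; // 4

// PREG_SPLIT_NO_EMPTY discards the empty pieces.
echo count(preg_split($pattern, $text, -1, PREG_SPLIT_NO_EMPTY)), "\n"; // 2
```

The same flag also makes an empty text count as 0 words instead of 1.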
But the results differ from those of other tools, such as Microsoft Word or the CKEditor word_count plugin.
For example, for one sample text:
Word and the CKEditor word count plugin: 987 words
Method 1: 968 words
Method 2: 995 words
With the second method, the only problem is the hyphens in hyphenated words. My actual question, though, is whether there is a better way to count the words in a text in PHP.
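Here is one option for the hyphen problem. It is a sketch based on an assumed definition of "a word", not a reproduction of Word's or CKEditor's exact rules. The idea is to split only on whitespace, so E-Mail-Programm or 3,50 stays one token, and to ignore tokens that contain no letter or digit, such as a free-standing dash or €. `count_words` is a hypothetical name, and the input is assumed to be UTF-8 plain text with the tags already stripped:

```php
<?php
// Hypothetical word counter: a "word" is a run of non-whitespace characters
// that contains at least one Unicode letter or digit.
function count_words(string $text): int
{
    // Split on any whitespace, including Unicode separators such as U+00A0.
    $tokens = preg_split('/[\s\p{Z}]+/u', $text, -1, PREG_SPLIT_NO_EMPTY);
    if ($tokens === false) {
        // preg_split() fails on invalid UTF-8; better to fail loudly than underpay.
        throw new InvalidArgumentException('Text is not valid UTF-8');
    }

    $count = 0;
    foreach ($tokens as $token) {
        // Skip tokens made only of punctuation or symbols, like "–" or "€".
        if (preg_match('/[\p{L}\p{N}]/u', $token) === 1) {
            $count++;
        }
    }
    return $count;
}

echo count_words('Das E-Mail-Programm kostet 3,50 € – oder?'), "\n"; // 5
```

Only a test run on the sample text can show whether this gets closer to 987. With this design, hyphenated compounds and numbers such as 3,50 count as one word each.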
First, you could combine your two replace statements into one, since str_word_count() ignores double spaces. Second, I'm not sure what your regex is meant to do, but it looks mighty strange.
You should be able to simply do this:
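$characterMap = 'ÄÖÜäöü߀'; // same extra word characters as in Method 1
$content = strip_tags($content);
// line breaks, tabs and punctuation all become spaces; str_word_count() ignores repeated spaces
$replace = array("\r\n", "\n", "\r", "\t", '.', ',', ';', "'", '@');
$content = str_replace($replace, ' ', $content);
$count = str_word_count($content, 0, $characterMap);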
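This is a quick check of the str_word_count() behaviour the answer relies on. Runs of spaces do not produce empty words, and hyphens and apostrophes inside a word are kept as part of it. One side effect of putting "'" in the replacement list is that a word like ist's then counts as two:

```php
<?php
// Repeated spaces do not create extra words.
echo str_word_count('eins  zwei   drei'), "\n"; // 3

// Inner hyphens and apostrophes stay part of the word.
echo str_word_count("E-Mail ist's"), "\n"; // 2

// After replacing "'" with a space, the apostrophe no longer joins the word.
echo str_word_count(str_replace("'", ' ', "E-Mail ist's")), "\n"; // 3
```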