In PHP, I need to load a file, get all of the words, and echo each word along with the number of times it appears in the text. The results should also be in descending order, with the most used words on top.
Here's an example:
$text = "A very nice únÌcÕdë text. Something nice to think about if you're into Unicode.";
// $words = str_word_count($text, 1); // use this function if you only want ASCII
$words = utf8_str_word_count($text, 1); // use this function if you care about i18n
$frequency = array_count_values($words);
arsort($frequency);
echo '<pre>';
print_r($frequency);
echo '</pre>';
The output:
Array
(
[nice] => 2
[if] => 1
[about] => 1
[you're] => 1
[into] => 1
[Unicode] => 1
[think] => 1
[to] => 1
[very] => 1
[únÌcÕdë] => 1
[text] => 1
[Something] => 1
[A] => 1
)
And the utf8_str_word_count() function, if you need it:
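function utf8_str_word_count($string, $format = 0, $charlist = null)
{
    $matches = array();
    // A word is a run of letters, combining marks, dashes, and apostrophes,
    // plus any extra characters passed in $charlist.
    $pattern = '~[\p{L}\p{Mn}\p{Pd}\'\x{2019}' . preg_quote((string) $charlist, '~') . ']+~u';
    // Fall back to an empty list when nothing matches, so that the count is 0 rather than 1.
    $result = preg_match_all($pattern, $string, $matches) > 0 ? $matches[0] : array();
    if ($format == 0)
    {
        $result = count($result); // format 0 returns the number of words
    }
    return $result;
}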
$words = str_word_count($text, 1);
$word_frequencies = array_count_values($words);
arsort($word_frequencies);
print_r($word_frequencies);
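Since the question asks about loading the text from a file, here is a minimal sketch that combines file_get_contents() with the same built-in functions. The helper name word_frequencies_from_file() and the temporary demo file are assumptions made for illustration; they are not part of the answers above.

```php
<?php
// Hypothetical helper (name and signature are illustrative only).
function word_frequencies_from_file(string $path): array
{
    $text = file_get_contents($path);
    if ($text === false) {
        throw new RuntimeException("Could not read file: $path");
    }
    // str_word_count() with format 1 returns the list of (ASCII) words
    $frequencies = array_count_values(str_word_count($text, 1));
    arsort($frequencies); // highest counts first, keys preserved
    return $frequencies;
}

// Demo: write a small temporary file, then echo each word with its count
$demoPath = tempnam(sys_get_temp_dir(), 'words');
file_put_contents($demoPath, "the cat and the dog and the bird");
foreach (word_frequencies_from_file($demoPath) as $word => $count) {
    echo $word, ': ', $count, PHP_EOL;
}
unlink($demoPath);
```

In the demo, "the" (3) is listed first and "and" (2) second. For text that is not plain ASCII, swapping str_word_count() for a Unicode-aware word splitter would probably be needed.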
This function uses a regex to find words (you might want to change the pattern, depending on what you define a word as):
function count_words($text)
{
    $output = $words = array();
    preg_match_all("/[A-Za-z'-]+/", $text, $words); // Find words in the text
    foreach ($words[0] as $word)
    {
        if (!array_key_exists($word, $output))
        {
            $output[$word] = 0;
        }
        $output[$word]++; // Every time we find this word, we add 1 to the count
    }
    return $output;
}
This iterates over each word, building an associative array keyed by word, where each value is the number of occurrences of that word (e.g. $output['hello'] = 3 means that hello occurred 3 times in the text).
You might want to change the function to be case-insensitive: as written, it treats 'hello' and 'Hello' as two different words.
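One way to handle case, sketched under the assumption that lowercasing every match is acceptable for your text: the variant below (the name count_words_ci() is hypothetical) folds words to lowercase before counting and also sorts the result in descending order, as the question asks.

```php
<?php
// Hypothetical case-insensitive variant of count_words(); the name is illustrative.
function count_words_ci(string $text): array
{
    $words = [];
    preg_match_all("/[A-Za-z'-]+/", $text, $words); // same ASCII-only pattern as above
    // Lowercase every match so that 'Hello' and 'hello' share one key
    $counts = array_count_values(array_map('strtolower', $words[0]));
    arsort($counts); // most frequent words first
    return $counts;
}

foreach (count_words_ci("Hello hello world") as $word => $count) {
    echo $word, ': ', $count, PHP_EOL; // hello: 2, then world: 1
}
```

strtolower() is enough here because the pattern only matches ASCII letters; with a Unicode-aware pattern, mb_strtolower() from the mbstring extension would likely be the better fit.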