How do I go about getting the most popular words from multiple content tables in PHP/MySQL.
For example, I have a table forum_post with forum post; this contains a subject and content. Besides these I have multiple other tables with different fields which could also contain content to be analysed.
I would probably myself go fetch all the content, strip (possible) html explode the string on spaces. remove quotes and comma's etc. and just count the words which are not common by saving an array whilst running through all the words.
My main question is if someone knows of a method which might be easier or faster.
I couldn't seem to find any helpful answers about this it might be the wrong search patterns.
The magic you're looking for is a php function called str_word_count().
In my example code below, if you get a lot of extraneous words from this you'll need to write custom stripping to remove them. Additionally you'll want to strip all of the html tags from the words and other characters as well.
I use something similar to this for keyword generation (obviously that code is proprietary). In short we're taking provided text, we're checking the word frequency and if the words come up in order we're sorting them in an array based on priority. So the most frequent words will be first in the output. We're not counting words that only occur once.
<?php
$text = "your text.";
//Setup the array for storing word counts
$freqData = array();
foreach( str_word_count( $text, 1 ) as $words ){
// For each word found in the frequency table, increment its value by one
array_key_exists( $words, $freqData ) ? $freqData[ $words ]++ : $freqData[ $words ] = 1;
}
$list = '';
arsort($freqData);
foreach ($freqData as $word=>$count){
if ($count > 2){
$list .= "$word ";
}
}
if (empty($list)){
$list = "Not enough duplicate words for popularity contest.";
}
echo $list;
?>
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With