Get Popular words in PHP+MySQL

Tags: php, mysql

How do I go about getting the most popular words from multiple content tables in PHP/MySQL?

For example, I have a table forum_post that holds forum posts, each with a subject and content. Besides this, I have several other tables with different fields that could also contain content to be analysed.

I would probably fetch all the content myself, strip any HTML, explode the string on spaces, remove quotes, commas and so on, and then count the words that are not common, keeping the counts in an array while running through all the words.

My main question is whether someone knows of a method that might be easier or faster.

I couldn't find any helpful answers about this; I may have been using the wrong search terms.

Mathijs Segers asked May 23 '13 07:05


1 Answer

Somebody's already done it.

The magic you're looking for is a PHP function called str_word_count().

If the example code below returns a lot of extraneous words, you'll need to write custom stripping to remove them. You'll also want to strip HTML tags and other non-word characters from the text before counting.

I use something similar to this for keyword generation (obviously that code is proprietary). In short, we take the provided text, count how often each word occurs, and sort the resulting array by frequency, so the most frequent words come first in the output. Words that occur only once are left out.

<?php
$text = "your text.";

// Set up the array for storing word counts
$freqData = array();
// str_word_count() with format 1 returns an array of all the words in $text
foreach (str_word_count($text, 1) as $word) {
    // Increment the count for a word already seen; otherwise start it at one
    if (array_key_exists($word, $freqData)) {
        $freqData[$word]++;
    } else {
        $freqData[$word] = 1;
    }
}

// Sort by count, highest first, keeping the words as keys
arsort($freqData);
$list = '';
foreach ($freqData as $word => $count) {
    // Skip words that occur only once
    if ($count > 1) {
        $list .= "$word ";
    }
}
if (empty($list)) {
    $list = "Not enough duplicate words for popularity contest.";
}
echo $list;
?>
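
The custom stripping mentioned above could look roughly like the sketch below. It is only an illustration, not the proprietary code: the helper name popularWords(), the tiny stop-word list and the sample strings are all made up, and the strings stand in for rows you would fetch from forum_post and the other tables. It strips HTML, lower-cases the text, skips the stop words and counts across all the supplied texts at once.

```php
<?php
// Hypothetical helper (not from the original answer): counts words across
// several pieces of content, ignoring HTML tags and a list of stop words.
function popularWords(array $texts, array $stopWords, int $minCount = 2): array
{
    $freq = array();
    foreach ($texts as $text) {
        // Put a space before each tag so that "a</p><p>b" does not become "ab",
        // then drop the tags and decode entities such as &amp;.
        $plain = strip_tags(str_replace('<', ' <', $text));
        $plain = html_entity_decode($plain, ENT_QUOTES, 'UTF-8');

        // Lower-case so that "PHP" and "php" are counted as the same word.
        foreach (str_word_count(strtolower($plain), 1) as $word) {
            if (in_array($word, $stopWords, true)) {
                continue; // common word, not interesting
            }
            $freq[$word] = isset($freq[$word]) ? $freq[$word] + 1 : 1;
        }
    }

    // Keep only words that reach the minimum count, most frequent first.
    $freq = array_filter($freq, function ($count) use ($minCount) {
        return $count >= $minCount;
    });
    arsort($freq);
    return $freq;
}

// Stand-in data: in practice these strings would come from the database.
$texts = array(
    '<p>PHP and MySQL are popular.</p>',
    'MySQL stores the posts, PHP renders the posts.',
);
// Illustrative stop-word list; a real one would be much longer.
$stopWords = array('and', 'are', 'the');

$popular = popularWords($texts, $stopWords);
print_r($popular);
```

Note that str_word_count() is not multibyte-aware, so for content with non-ASCII characters you may need a different way of splitting the text into words, for example preg_split() with a Unicode pattern.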
AbsoluteƵERØ answered Oct 12 '22 08:10