Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Find most repeated sub strings in array

Tags:

arrays

string

php

I have an array:

$myArray=array(

'hello my name is richard',
'hello my name is paul',
'hello my name is simon',
'hello it doesn\'t matter what my name is'

);

I need to find the sub string (min 2 words) that is repeated the most often, maybe in an array format, so my return array could look like this:

$return=array(

array('hello my', 3),
array('hello my name', 3),
array('hello my name is', 3),
array('my name', 4),
array('my name is', 4),
array('name is', 4),

);

So I can see from this array of arrays how often each string was repeated amongst all strings in the array.

Is the only way to do it like this?..

function repeatedSubStrings($array){

    foreach($array as $string){
        $phrases=//Split each string into maximum number of sub strings
        foreach($phrases as $phrase){
            //Then count the $phrases that are in the strings
        }
    }

}

I've tried a solution similar to the above but it was too slow, processing around 1000 rows per second, can anyone do it faster?

like image 472
Drahcir Avatar asked Dec 20 '11 23:12

Drahcir


People also ask

How do you find a repeating substring in Python?

(1) First generate the possible sub-strings you want to search in each string. Is there a min or max length? Build a list or set of sub-strings from the input string. (2) Once you have the sub-strings to search for, try to identify the unique locations within the input string where the substrings appear.


2 Answers

A solution to this might be

function getHighestRecurrence($strs){

  /*Storage for individual words*/
  $words = Array();

  /*Process multiple strings*/
  if(is_array($strs))
      foreach($strs as $str)
         $words = array_merge($words, explode(" ", $str));

 /*Prepare single string*/
  else
      $words = explode(" ",$strs);

  /*Array for word counters*/
  $index = Array();

  /*Aggregate word counters*/
  foreach($words as $word)

          /*Increment count or create if it doesn't exist*/
          (isset($index[$word]))? $index[$word]++ : $index[$word] = 1;


  /*Sort array hy highest value and */
  arsort($index);

  /*Return the word*/
  return key($index);
}
like image 53
CBusBus Avatar answered Sep 29 '22 01:09

CBusBus


While this has a higher runtime, I think it's simpler from an implementation perspective:

$substrings = array();

foreach ($myArray as $str)
{
    $subArr = explode(" ", $str);
    for ($i=0;$i<count($subArr);$i++)
    {
        $substring = "";
        for ($j=$i;$j<count($subArr);$j++)
        {
            if ($i==0 && ($j==count($subArr)-1))
                break;      
            $substring = trim($substring . " " . $subArr[$j]);
            if (str_word_count($substring, 0) > 1)
            {
                if (array_key_exists($substring, $substrings))
                    $substrings[$substring]++;
                else
                    $substrings[$substring] = 1;
            }
        }
    }   
}

arsort($substrings);
print_r($substrings);
like image 30
Abbas Avatar answered Sep 29 '22 02:09

Abbas