
Speed up string search in PHP

I have a 1.2GB file that contains a single-line string. I need to search the entire file to find the position of another string (currently I have a list of strings to search for). The way I'm doing it now is to open the big file and move a pointer through it in 4KB blocks: read a block, then move the pointer X positions back in the file and read 4KB more.

My problem is that the longer the search string, the longer it takes to find it.

Can you give me some ideas to optimize the script and get better search times?

This is my implementation:

function busca($inici){
    $limit = 4096; // block size in bytes

    $big_one    = fopen('big_one.txt','r');
    $options    = fopen('options.txt','r');

    while(!feof($options)){
        $search = trim(fgets($options));
        $retro  = strlen($search); // how far to step back between blocks; maybe use a fixed value instead? (like 12 or 15)

        $punter = 0; // file offset of the current block
        while(!feof($big_one)){
            $ara = fgets($big_one,$limit);

            $pos = strpos($ara,$search);
            $ok_pos = $pos + $punter;

            if($pos !== false){
                echo "$pos - $punter - $search : $ok_pos <br>";
                break;
            }

            $punter += $limit - $retro;
            fseek($big_one,$punter);
        }
        fseek($big_one,0);
    }
}

Thanks in advance!

Marc asked Jun 09 '10 22:06



2 Answers

Why not use exec + grep -b?

exec('grep "new" ext-all-debug.js -b', $result);
// here we look for occurrences of the substring "new" in the ExtJS debug source file
var_dump($result);

Sample result:

array(1142) {
    [0]=>  string(97) "3398: * insert new elements. Revisiting the example above, we could utilize templating this time:"
    [1]=>  string(54) "3910:var tpl = new Ext.DomHelper.createTemplate(html);"
    ...
}

Each item consists of the line's byte offset from the start of the file and the line itself, separated by a colon.
So after this you have to find the substring's position inside that particular line and add it to the line offset. For example:

[0]=>  string(97) "3398: * insert new elements. Revisiting the example above, we could utilize templating this time:"

This means that the occurrence of "new" was found at byte offset 3408 (3398 is the line's offset and 10 is the position of "new" inside the line).
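The offset arithmetic above can be sketched in PHP as follows. Both function names are hypothetical helpers, not part of any library. The second one assumes GNU grep, where combining -b with -o makes grep print the offset of the match itself rather than the offset of the line. That matters for the file in the question, which is a single line, so plain -b would only ever report the line offset 0. It also assumes a text file and a literal (non-regex) search string, hence -F.

```php
<?php
// Hypothetical helper: turns one "offset:line" entry produced by `grep -b`
// into the absolute byte offset of $needle within the file.
function absolute_offset(string $grepLine, string $needle): ?int
{
    // Split on the first colon only; the line itself may contain more colons.
    [$lineOffset, $line] = explode(':', $grepLine, 2);
    $pos = strpos($line, $needle);

    return $pos === false ? null : (int)$lineOffset + $pos;
}

// Hypothetical helper: asks grep for the match offsets directly.
// Assumes GNU grep: with -o, -b reports the offset of the matching part;
// -F treats the needle as a fixed string instead of a regular expression.
function grep_offsets(string $path, string $needle): array
{
    $cmd = sprintf(
        'grep -obF -e %s -- %s',
        escapeshellarg($needle),
        escapeshellarg($path)
    );
    exec($cmd, $lines);

    // Each output line looks like "3408:new"; intval() keeps the leading number.
    return array_map('intval', $lines);
}
```

With the sample entry above, absolute_offset('3398: * insert new elements.', 'new') adds 10 to 3398. For a list of search strings, grep_offsets() would be called once per string, which means one pass over the file per string.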

zerkms answered Oct 12 '22 10:10


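// Load the whole file into memory; memory_limit must allow for the file size (1.2GB here)
$big_one    = file_get_contents('big_one.txt');
$options    = fopen('options.txt','r');

while(($line = fgets($options)) !== false)
{
  $option = trim($line);
  if($option === '')
    continue; // skip blank lines

  $position = strpos($big_one,$option);

  if($position !== false)
  {
    echo "$option : $position <br>";
    break; //exit loop
  }
}
fclose($options);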

The file is quite large, though. You might want to consider storing the data in a database instead. Or, if you absolutely can't, use the grep solution posted here.
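If neither a database nor grep is an option, the block-by-block idea from the question can be kept while reading the big file only once for all search strings. The following is a minimal sketch under a few assumptions: find_offsets() is a hypothetical name, the 1MB default block size is an arbitrary choice, and only the first occurrence of each string is reported. It carries over the last (longest needle length - 1) bytes of each block, so a match that straddles a block boundary is still seen in one piece.

```php
<?php
// Hypothetical helper: returns [needle => first byte offset] for every needle
// found in the file, reading the file once in fixed-size blocks.
function find_offsets(string $path, array $needles, int $chunkSize = 1048576): array
{
    // Drop empty strings and duplicates.
    $needles = array_values(array_unique(array_filter($needles, 'strlen')));
    $found = [];
    if (!$needles) {
        return $found;
    }

    // Bytes carried over between blocks so boundary-straddling matches survive.
    $overlap = max(array_map('strlen', $needles)) - 1;

    $handle = fopen($path, 'rb');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }

    $tail = '';
    $tailStart = 0; // file offset of the first byte of $tail
    while (count($found) < count($needles)) {
        $chunk = fread($handle, $chunkSize);
        if ($chunk === false || $chunk === '') {
            break; // end of file or read error
        }

        $buffer = $tail . $chunk;
        foreach ($needles as $needle) {
            if (isset($found[$needle])) {
                continue; // already located in an earlier block
            }
            $pos = strpos($buffer, $needle);
            if ($pos !== false) {
                $found[$needle] = $tailStart + $pos;
            }
        }

        // Keep only the end of the buffer for the next round.
        $keep = min($overlap, strlen($buffer));
        $tailStart += strlen($buffer) - $keep;
        $tail = $keep > 0 ? substr($buffer, -$keep) : '';
    }
    fclose($handle);

    return $found;
}
```

A call such as find_offsets('big_one.txt', file('options.txt', FILE_IGNORE_NEW_LINES)) would then cover the whole list in a single pass. Larger blocks mean fewer strpos() and fread() calls at the cost of more memory per block.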

Sev answered Oct 12 '22 10:10