speed string search in PHP

Tags: performance, string, php, search

I have a 1.2 GB file that contains a single one-line string. I need to search the entire file to find the position of another string (currently I have a list of strings to search for). What I do now is open the big file and move a pointer through it in 4 KB blocks: after each block I move the pointer X positions back in the file and read the next 4 KB.

My problem is that the longer the search string is, the longer the search takes.

Can you give me some ideas to optimize the script to get better search times?

This is my implementation:

function busca($inici){
        $limit = 4096;

        $big_one    = fopen('big_one.txt','r');
        $options    = fopen('options.txt','r');

        while(!feof($options)){
            $search = trim(fgets($options));
            $retro  = strlen($search);//maybe setting this position absolute? (like 12 or 15)

            $punter = 0;
            while(!feof($big_one)){
                $ara = fgets($big_one,$limit);

                $pos = strpos($ara,$search);
                $ok_pos = $pos + $punter;

                if($pos !== false){
                    echo "$pos - $punter - $search : $ok_pos <br>";
                    break;
                }

                $punter += $limit - $retro;
                fseek($big_one,$punter);
            }
            fseek($big_one,0);
        }
    }

Thanks in advance!

asked Jun 09 '10 22:06

Marc

2 Answers

Why not use exec with grep -b?

exec('grep -b "new" ext-all-debug.js', $result);
// here we look for occurrences of the substring "new" in the ExtJS debug source file
var_dump($result);

Sample result:

array(1142) {
    [0]=>  string(97) "3398: * insert new elements. Revisiting the example above, we could utilize templating this time:"
    [1]=>  string(54) "3910:var tpl = new Ext.DomHelper.createTemplate(html);"
    ...
}

Each item consists of the byte offset of the line from the start of the file and the line itself, separated by a colon.
So after this you have to find the match inside that line and add its position to the line offset. For example:

[0]=>  string(97) "3398: * insert new elements. Revisiting the example above, we could utilize templating this time:"

This means that the occurrence of "new" was found at byte offset 3408 (3398 is the offset of the line and 10 is the position of "new" within the line).
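The offset arithmetic above can be sketched in PHP as follows. This is only an illustration: `absoluteOffset` is a hypothetical helper name, and it assumes each entry of `$result` has the `<offset>:<line text>` form shown in the sample output.

```php
<?php
// Hypothetical helper (not part of the original answer): turn one line of
// `grep -b` output ("<offset>:<line text>") into the absolute byte position
// of the first occurrence of $needle in the file.
function absoluteOffset(string $grepLine, string $needle): ?int
{
    // Split only at the first colon; the line text may contain colons too.
    $parts = explode(':', $grepLine, 2);
    if (count($parts) !== 2 || !ctype_digit($parts[0])) {
        return null; // not in the expected "<offset>:<text>" form
    }
    [$lineOffset, $text] = $parts;

    // Position of the match inside the line, or false if it is absent.
    $inLine = strpos($text, $needle);
    if ($inLine === false) {
        return null;
    }

    // Line offset plus position within the line.
    return (int) $lineOffset + $inLine;
}

$sample = '3398: * insert new elements. Revisiting the example above, we could utilize templating this time:';
var_dump(absoluteOffset($sample, 'new')); // int(3408)
```

Note that the file in the question is a single line, so plain `grep -b` would report offset 0 for that one line. With GNU grep, combining `-b` with `-o` prints the byte offset of each match itself, which may make this extra step unnecessary; it is worth checking against the grep version actually installed.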

answered Oct 12 '22 10:10

zerkms

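// strpos() needs the contents, not a file handle; this loads the whole file into memory
$big_one    = file_get_contents('big_one.txt');
$options    = fopen('options.txt','r');

while(($line = fgets($options)) !== false)
{
  $option = trim($line);
  if($option === '')
    continue; // skip blank lines

  $position = strpos($big_one,$option);

  if($position !== false) // a match at offset 0 is still a match
    return $position; //exit loop
}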

The file is quite large, though. You might want to consider storing the data in a database instead. If you absolutely can't, use the grep solution posted here.
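For a file this size, one alternative to holding everything in memory is a chunked scan in the spirit of the loop in the question. The following is a minimal sketch, not a tuned implementation: the function name `findInFile` and the 1 MiB default chunk size are assumptions made for illustration. It keeps the last `strlen($needle) - 1` bytes of each window so that a match straddling a chunk boundary is still found.

```php
<?php
// Sketch: find the first byte offset of $needle in a file without loading
// the whole file. Name and default chunk size are illustrative assumptions.
function findInFile(string $path, string $needle, int $chunkSize = 1048576): ?int
{
    $len = strlen($needle);
    if ($len === 0) {
        return null; // nothing to search for
    }
    $fh = fopen($path, 'rb');
    if ($fh === false) {
        return null; // file could not be opened
    }

    $tail = ''; // bytes carried over from the previous window
    $base = 0;  // file offset of the first byte of $tail

    try {
        while (!feof($fh)) {
            $chunk = fread($fh, $chunkSize);
            if ($chunk === false || $chunk === '') {
                break;
            }
            $window = $tail . $chunk;
            $pos = strpos($window, $needle);
            if ($pos !== false) {
                return $base + $pos; // absolute offset in the file
            }
            // Keep the last ($len - 1) bytes: too short to hold a full match,
            // long enough to complete one that straddles the boundary.
            $keep = min($len - 1, strlen($window));
            $base += strlen($window) - $keep;
            $tail = $keep > 0 ? substr($window, -$keep) : '';
        }
        return null; // not found
    } finally {
        fclose($fh);
    }
}

// Usage on a small temporary file standing in for big_one.txt.
$tmp = tempnam(sys_get_temp_dir(), 'big');
file_put_contents($tmp, str_repeat('a', 10000) . 'needle' . str_repeat('b', 10000));
var_dump(findInFile($tmp, 'needle', 4096)); // int(10000)
unlink($tmp);
```

Because the carried-over tail has a fixed length per search string and no seeking is involved, each search string costs one sequential pass over the file. Searching for many strings this way still rereads the file once per string.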

answered Oct 12 '22 10:10

Sev
