I have a 1.2 GB file that contains a single-line string. I need to search the entire file to find the position of another string (currently I have a list of strings to search for). What I do now is open the big file and move a pointer through it in 4 KB blocks: read a block, move the pointer X positions back in the file, and read 4 KB more.
My problem is that the longer the search string is, the longer it takes to find it.
Can you give me some ideas to optimize the script to get better search times?
This is my implementation:
function busca($inici){
    $limit = 4096;
    $big_one = fopen('big_one.txt','r');
    $options = fopen('options.txt','r');
    while(!feof($options)){
        $search = trim(fgets($options));
        $retro = strlen($search);//maybe setting this position absolute? (like 12 or 15)
        $punter = 0;
        while(!feof($big_one)){
            $ara = fgets($big_one,$limit);
            $pos = strpos($ara,$search);
            $ok_pos = $pos + $punter;
            if($pos !== false){
                echo "$pos - $punter - $search : $ok_pos <br>";
                break;
            }
            $punter += $limit - $retro;
            fseek($big_one,$punter);
        }
        fseek($big_one,0);
    }
}
Thanks in advance!
Simple text searching with strstr(): PHP's strstr() function takes a string to search and the text to search for. If the text is found, it returns the portion of the string from the first character of the match up to the end of the string; otherwise it returns false.
strpos(): This function finds the position of the first occurrence of a string inside another string. It returns the 0-based integer position of that occurrence, or false if the string is not found. It is case-sensitive, so upper-case and lower-case characters are treated as different.
Using str_contains(): str_contains() was introduced in PHP 8. It checks whether a string contains a substring, returning true if it does and false otherwise.
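To make the difference between the three functions concrete, here is a minimal sketch. The sample string and the variable names are only illustrative, and str_contains() assumes PHP 8 or later.

```php
<?php
$myString = 'Hello, there!';

// strstr() returns the rest of the string starting at the first match, or false.
$tail = strstr($myString, 'there');               // 'there!'

// strpos() returns the 0-based offset of the first match, or false.
$pos = strpos($myString, 'there');                // 7

// Compare with !== false: a match at offset 0 is falsy but still a match.
$atStart = strpos($myString, 'Hello') !== false;  // true

// str_contains() (PHP 8+) only answers yes or no.
$found = str_contains($myString, 'there');        // true

var_dump($tail, $pos, $atStart, $found);
```

When the position itself is needed, as in the question, strpos() is the one to use; the other two only tell you that a match exists or return the tail of the string.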
Why not use exec + grep -b?
exec('grep "new" ext-all-debug.js -b', $result);
// here we have looked for "new" substring entries in the extjs debug src file
var_dump($result);
Sample result:
array(1142) {
[0]=> string(97) "3398: * insert new elements. Revisiting the example above, we could utilize templating this time:"
[1]=> string(54) "3910:var tpl = new Ext.DomHelper.createTemplate(html);"
...
}
Each item consists of the byte offset of the line from the start of the file and the line itself, separated by a colon.
So after this you have to find the position of the match inside that line and add it to the line offset. For example:
[0]=> string(97) "3398: * insert new elements. Revisiting the example above, we could utilize templating this time:"
This means that the "new" occurrence is at byte 3408 (3398 is the line offset and 10 is the position of "new" inside the line).
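The offset arithmetic can be sketched in PHP roughly as follows. grepOffsets() is a hypothetical helper name, not a built-in, and the input is assumed to be the $result array that exec() fills from grep -b.

```php
<?php
/**
 * Convert `grep -b` output lines ("<offset>:<text>") into absolute byte
 * offsets of the first occurrence of $needle in each line.
 * grepOffsets() is a hypothetical helper name, not part of PHP.
 */
function grepOffsets(array $lines, string $needle): array
{
    $offsets = [];
    foreach ($lines as $line) {
        // Split only at the first colon; the text itself may contain colons.
        $parts = explode(':', $line, 2);
        if (count($parts) !== 2 || !ctype_digit($parts[0])) {
            continue; // not in "<offset>:<text>" form
        }
        $inLine = strpos($parts[1], $needle);
        if ($inLine !== false) {
            // Line offset plus position inside the line.
            $offsets[] = (int) $parts[0] + $inLine;
        }
    }
    return $offsets;
}

$result = [
    '3398: * insert new elements. Revisiting the example above, we could utilize templating this time:',
    '3910:var tpl = new Ext.DomHelper.createTemplate(html);',
];
print_r(grepOffsets($result, 'new')); // 3408 and 3920
```

One caveat for the file in the question: since it is a single line, plain grep -b would report offset 0 followed by the whole line. As far as I know, GNU grep's -o flag combined with -b prints the offset of each match itself (for example "3408:new"), and the helper above handles that form too, because the match then sits at position 0 of the text part.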
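// Note: this loads the whole file into memory, so memory_limit must allow it.
$big_one = file_get_contents('big_one.txt');
$options = fopen('options.txt','r');
while(($line = fgets($options)) !== false)
{
    $option = trim($line);
    if($option === '')
        continue; // skip blank lines
    // strpos() returns the byte offset of the first match, or false
    $position = strpos($big_one,$option);
    if($position !== false) // 0 is a valid position, so compare against false
    {
        echo "$option : $position\n";
        break; //exit loop
    }
}
fclose($options);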
The file is quite large, though. You might want to consider storing the data in a database instead. If you absolutely can't, use the grep solution posted here.
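If neither option fits, the block-by-block approach from the question can also be done in plain PHP with fread() and a small overlap between blocks. This is only a sketch under a few assumptions: findInFile() is a hypothetical helper name, the 1 MiB default chunk size is an arbitrary choice rather than a measured optimum, and only the first occurrence is reported.

```php
<?php
/**
 * Return the byte offset of the first occurrence of $needle in the file,
 * or false if it is absent. findInFile() is a hypothetical helper name.
 */
function findInFile(string $path, string $needle, int $chunkSize = 1048576)
{
    $needleLen = strlen($needle);
    if ($needleLen === 0) {
        return false;
    }
    $handle = fopen($path, 'rb');
    if ($handle === false) {
        return false;
    }
    $carry = '';      // tail of the previous buffer, kept for boundary matches
    $base = 0;        // file offset of the first byte in the current buffer
    $found = false;
    while (true) {
        $chunk = fread($handle, $chunkSize);
        if ($chunk === false || $chunk === '') {
            break;    // read error or end of file
        }
        $buffer = $carry . $chunk;
        $pos = strpos($buffer, $needle);
        if ($pos !== false) {
            $found = $base + $pos;
            break;
        }
        // Keep the last needleLen - 1 bytes so a split match is still seen.
        $keep = min($needleLen - 1, strlen($buffer));
        $carry = $keep > 0 ? substr($buffer, -$keep) : '';
        $base += strlen($buffer) - $keep;
    }
    fclose($handle);
    return $found;
}
```

Calling findInFile('big_one.txt', $search) for each line of options.txt would replace the inner loop of the original script. Because the overlap is carried in memory, there is no need to fseek() backwards, and each search reads the file once from start to the match.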