
Memory leak in PHP with three for loops

My script is a spider that checks whether a page is a "links page" or an "information page". If the page is a "links page", it continues recursively (as a tree, if you will) until it finds an "information page".

I first made the script recursive, which was easy, but I kept getting this error:

Fatal error: Allowed memory size of 33554432 bytes exhausted (tried to allocate 39 bytes) in /srv/www/loko/simple_html_dom.php on line 1316

I was told I would have to use nested for loops instead, because the script won't free memory no matter how I use unset(). Since I only have three levels to loop through, that makes sense. But after I changed the script the error still occurs. Maybe I can free memory now?

Something needs to die here, please help me destruct someone!

set_time_limit(0);
ini_set('memory_limit', '256M');
require("simple_html_dom.php");
$thelink = "http://www.somelink.com";
$html1 = file_get_html($thelink);
$ret1 = $html1->find('#idTabResults2');

// first inception level, we know page has only links
if (!$ret1){
    $es1 = $html1->find('table.litab a');
    //unset($html1);
    $countlinks1 = 0;
    foreach ($es1 as $aa1) {
        $links1[$countlinks1] = $aa1->href;
        $countlinks1++;
    }
    //unset($es1);

    //for every link in array do the same
    for ($i = 0; $i < $countlinks1; $i++) {
        $html2 = file_get_html($links1[$i]);
        $ret2 = $html2->find('#idTabResults2');
        // if got information then send to DB
        if ($ret2){
            pullInfo($html2);
            //unset($html2);
        } else {
        // continue inception
            $es2 = $html2->find('table.litab a');
            $html2 = null;

            $countlinks2 = 0;
            foreach ($es2 as $aa2) {
                $links2[$countlinks2] = $aa2->href;
                $countlinks2++;
            }
            //unset($es2);

            for ($j = 0; $j < $countlinks2; $j++) {
                $html3 = file_get_html($links2[$j]);
                $ret3 = $html3->find('#idTabResults2');
                // if got information then send to DB       
                if ($ret3){
                    pullInfo($html3);

                } else {
                // inception level three
                    $es3 = $html3->find('table.litab a');
                    $html3 = null;
                    $countlinks3 = 0;
                    foreach ($es3 as $aa3) {
                        $links3[$countlinks3] = $aa3->href;
                        $countlinks3++;
                    }
                    for ($k = 0; $k < $countlinks3; $k++) {
                        echo memory_get_usage() ;
                        echo "\n";
                        $html4 = file_get_html($links3[$k]);
                        $ret4 = $html4->find('#idTabResults2');
                        // if got information then send to DB       
                        if ($ret4){
                            pullInfo($html4);

                        }
                        unset($html4);                  
                    }
                    unset($html3);
                }

            }
        }
    }
}



function pullInfo($html)
{

$tds = $html->find('td');
$count =0; 
foreach ($tds as $td) {
  $count++;
  if ($count==1){
    $name = html_entity_decode($td->innertext);
   }
  if ($count==2){
        $address = addslashes(html_entity_decode($td->innertext));
   }
  if ($count==3){
    $number = addslashes(preg_replace('/(\d+) - (\d+)/i', '$2$1', $td->innertext));
   }

}
unset($tds, $td);

$name = mysql_real_escape_string($name);
$address = mysql_real_escape_string($address);
$number = mysql_real_escape_string($number);
$inAlready=mysql_query("SELECT * FROM people WHERE phone=$number");
while($e=mysql_fetch_assoc($inAlready))
            $output[]=$e;
    if (json_encode($output) != "null"){ 
        //print(json_encode($output));
    } else {

mysql_query("INSERT INTO people (name, area, phone)
VALUES ('$name', '$address', '$number')");
}
}

And here is a picture of the growth in memory size (image not preserved).

Tom asked Mar 17 '26 14:03


1 Answer

I modified the code a little to free as much memory as I could see being freed. I've added a comment above each modification; the added comments start with "#" so you can find them more easily. It is not related to this question, but it is worth mentioning that your database insertion code is vulnerable to SQL injection.

<?php
require("simple_html_dom.php");
$thelink = "http://www.somelink.co.uk";

# do not keep raw contents of the file on memory
#$data1 = file_get_contents($thelink);
#$html1 = str_get_html($data1);
$html1 = str_get_html(file_get_contents($thelink));

$ret1 = $html1->find('#idResults2');

// first inception level, we know page has only links
if (!$ret1){
    $es1 = $html1->find('table.litab a');

    # free $html1, not used anymore
    unset($html1);

    $countlinks1 = 0;
    foreach ($es1 as $aa1) {
        $links1[$countlinks1] = $aa1->href;
        $countlinks1++;
        // echo (addslashes($aa->href));
    }

    # free memory used by the $es1 value, not used anymore
    unset($es1);

    //for every link in array do the same

    for ($i = 0; $i < $countlinks1; $i++) {
        # do not keep raw contents of the file on memory
        #$data2 = file_get_contents($links1[$i]);
        #$html2 = str_get_html($data2);
        $html2 = str_get_html(file_get_contents($links1[$i]));

        $ret2 = $html2->find('#idResults2');

        // if got information then send to DB
        if ($ret2){
            pullInfo($html2);
        } else {
        // continue inception

            $es2 = $html2->find('table.litab a');

            # free memory used by $html2, not used anymore.
            # we would unset it at the end of the loop.
            $html2 = null;

            $countlinks2 = 0;
            foreach ($es2 as $aa2) {
                $links2[$countlinks2] = $aa2->href;
                $countlinks2++;
            }

            # free memory used by $es2
            unset($es2);

            for ($j = 0; $j < $countlinks2; $j++) {
                # do not keep raw contents of the file on memory
                #$data3 = file_get_contents($links2[$j]);
                #$html3 = str_get_html($data3);
                $html3 = str_get_html(file_get_contents($links2[$j]));
                $ret3 = $html3->find('#idResults2');
                // if got information then send to DB   
                if ($ret3){
                    pullInfo($html3);
                }

                # free memory used by $html3; otherwise it would not be freed on the last iteration
                unset($html3);
            }
        }

        # free memory used by $html2; otherwise it would not be freed on the last iteration
        unset($html2);
    }
}



function pullInfo($html)
{
    $tds = $html->find('td');
    $count =0; 
    foreach ($tds as $td) {
      $count++;
      if ($count==1){
        $name = addslashes($td->innertext);
       }
      if ($count==2){
            $address = addslashes($td->innertext);
       }
      if ($count==3){
        $number = addslashes(preg_replace('/(\d+) - (\d+)/i', '$2$1', $td->innertext));
       }

    }

    # check for available data:
    if ($count) {
        # free $tds and $td
        unset($tds, $td);

        mysql_query("INSERT INTO people (name, area, phone)
        VALUES ('$name', '$address', '$number')");
    }

}
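
As an alternative to the nested loops above, here is a minimal sketch of the same three-level walk driven by an explicit stack, so that only plain URL strings survive from one page to the next. The names crawl(), $fetchPage and $onInfo are hypothetical and not part of the original script. $fetchPage is assumed to parse one page, copy out what it needs as plain strings, and release the DOM before returning. With simple_html_dom that would typically mean calling $html->clear() followed by unset($html).

```php
<?php
/**
 * Depth-limited crawl using an explicit stack instead of nested loops.
 *
 * $fetchPage (hypothetical) takes a URL and returns
 * ['isInfo' => bool, 'links' => string[]] built from plain strings only.
 * $onInfo (hypothetical) is called with the URL and page array of every
 * information page that is found.
 */
function crawl(string $startUrl, callable $fetchPage, callable $onInfo, int $maxDepth = 3): int
{
    $stack = [[$startUrl, 0]];    // pending [url, depth] pairs
    $seen = [$startUrl => true];  // guards against link cycles
    $found = 0;

    while ($stack) {
        [$url, $depth] = array_pop($stack);
        $page = $fetchPage($url);

        if ($page['isInfo']) {
            // Information page: hand it over, do not follow its links.
            $onInfo($url, $page);
            $found++;
        } elseif ($depth < $maxDepth) {
            // Links page: queue every link that has not been seen yet.
            foreach ($page['links'] as $link) {
                if (!isset($seen[$link])) {
                    $seen[$link] = true;
                    $stack[] = [$link, $depth + 1];
                }
            }
        }

        unset($page); // nothing but URL strings is kept between iterations
    }

    return $found;
}
```

With $maxDepth = 3 the start page is level 0 and pages at level 3 are still fetched and checked, but their links are not followed, which mirrors the depth of the question's innermost loop.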

Update:

You can trace your memory usage to see how much memory each section of your code uses. Do this by calling memory_get_usage() and saving the result to a file, for example by placing the code below at the end of each loop, or before creating objects or calling heavy methods:

file_put_contents('memory.log', 'memory used in line ' . __LINE__ . ' is: ' . memory_get_usage() . PHP_EOL, FILE_APPEND);

That way you can trace the memory usage of each part of your code.
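
The one-liner above can be wrapped in a small helper so every checkpoint gets a readable label. This is only a sketch: logMemory() and its parameters are illustrative names, and the array allocation merely stands in for parsing a page.

```php
<?php
/**
 * Append the current and peak memory usage to a log file.
 * Returns the current usage in bytes so callers can compare checkpoints.
 * logMemory() is an illustrative helper, not part of any library.
 */
function logMemory(string $label, string $logFile = 'memory.log'): int
{
    $bytes = memory_get_usage();
    $line = sprintf('%s: %d bytes (peak %d)', $label, $bytes, memory_get_peak_usage());
    file_put_contents($logFile, $line . PHP_EOL, FILE_APPEND);

    return $bytes;
}

// Usage: compare memory before and after one unit of work.
$logFile = sys_get_temp_dir() . '/memory.log';
$before = logMemory('before allocation', $logFile);
$buffer = array_fill(0, 10000, 'x'); // stand-in for a parsed page
$during = logMemory('after allocation', $logFile);
unset($buffer);
$after = logMemory('after unset', $logFile);
```

If the "after unset" value keeps climbing from one iteration to the next, the objects created in that iteration are probably not being released.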

In the end, remember that all this tracing and optimization might not be enough, since your application might really need more than 32 MB of memory. I've developed a system that analyzes several data sources, detects spammers, and then blocks their SMTP connections. Since the number of connected users is sometimes over 30,000, even after a lot of code optimization I had to increase the PHP memory limit to 768 MB on the server, which is not a common thing to do.
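
Before raising the limit further, it may be worth confirming which limit is actually in effect. The error in the question reports 33554432 bytes, which is exactly 32 MB, even though the posted script asks for 256M. That suggests the ini_set() call may not have been applied when the error occurred. A minimal sketch, with iniSizeToBytes() as an illustrative helper name:

```php
<?php
/**
 * Convert a php.ini shorthand size such as "32M" or "1G" to bytes.
 * Returns -1 for "-1", which means no limit.
 * iniSizeToBytes() is an illustrative helper, not a built-in function.
 */
function iniSizeToBytes(string $value): int
{
    $value = trim($value);
    if ($value === '-1') {
        return -1;
    }

    $number = (int) $value;                 // leading digits, e.g. 32 from "32M"
    $unit = strtoupper(substr($value, -1)); // trailing unit letter, if any

    switch ($unit) {
        case 'G':
            return $number * 1024 * 1024 * 1024;
        case 'M':
            return $number * 1024 * 1024;
        case 'K':
            return $number * 1024;
        default:
            return $number;
    }
}

// Try to raise the limit, then report what is really in effect.
$result = ini_set('memory_limit', '256M'); // false means the change was refused
$effective = ini_get('memory_limit');

echo 'ini_set ' . ($result === false ? 'failed' : 'succeeded') . PHP_EOL;
echo 'effective limit: ' . $effective . ' = ' . iniSizeToBytes($effective) . ' bytes' . PHP_EOL;
```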

farzad answered Mar 19 '26 02:03


