PHP script memory leak issue

I'm running the PHP code below from the command line. The issue is that its memory consumption is far higher than it should be. I can't, for the life of me, figure out where the memory is being consumed.

for ($i = 0; $i < 100; $i++) {
    $classObject = $classObjects[$i];

    echo $i . "   :   " . memory_get_usage(true) . "\n";
    $classDOM = $scraper->scrapeClassInfo($classObject, $termMap, $subjectMap);
    unset($classDOM);
}

As I understand it, the memory consumed by my script should remain more or less constant from one iteration of the loop to the next. Any memory consumed by $scraper->scrapeClassInfo() should be freed when its local variables go out of scope.

This is the output file I get. For the sake of brevity, I'm showing every 10th line of the output:

0   :   5767168
10   :   12058624
20   :   18350080
30   :   24903680
40   :   30932992
50   :   37748736
60   :   43778048
70   :   49807360
80   :   55836672
90   :   62914560
97   :   66846720

Fatal error: Allowed memory size of 67108864 bytes exhausted (tried to allocate 44 bytes) in /home/content/60/8349160/html/drexel/simple_html_dom.php on line 1255
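
The figures above work out to roughly 0.6 MB of growth per iteration ((66846720 - 5767168) / 97 ≈ 629,686 bytes). Because memory_get_usage(true) reports memory reserved from the system, it moves in coarse steps; measuring the delta of memory_get_usage() around each call can make the growth easier to localize. The following is a minimal sketch of that idea, not the original script: measureMemoryGrowth() is a hypothetical helper, the workload is a stand-in that deliberately retains data, and closures require PHP 5.3 or later.

```php
<?php
// Hypothetical helper (not part of the original script): calls $work
// $iterations times and records how many bytes memory_get_usage()
// grew across each call.
function measureMemoryGrowth($work, $iterations)
{
    $deltas = array();
    for ($i = 0; $i < $iterations; $i++) {
        $before = memory_get_usage();
        $work($i);
        $deltas[] = memory_get_usage() - $before;
    }
    return $deltas;
}

// Stand-in workload that deliberately keeps data alive between calls,
// so the growth is visible. In the real script this would be the
// scrapeClassInfo() call followed by unset().
$retained = array();
$deltas = measureMemoryGrowth(function ($i) use (&$retained) {
    $retained[] = str_repeat('x', 100000);
}, 5);

foreach ($deltas as $i => $delta) {
    echo $i . "   :   " . $delta . "\n";
}
```

A workload that releases everything it allocates should show deltas near zero after the first few calls; deltas that stay positive on every call point at memory that is still referenced somewhere.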

Finally, as far as I can see, what $scraper->scrapeClassInfo() is doing should not really be the culprit, but just in case, here is the code:

function scrapeClassInfo($class,$termMap,$subjectMap)
        {
            $ckfile = tempnam ("/tmp", "CURLCOOKIE");
            $ckfile2 = tempnam ("/tmp", "CURLCOOKIE2");
            $ckfile3 = tempnam ("/tmp", "CURLCOOKIE3");         

            $termpage = $termMap[$class['termcode']];
            $subjectpage = $subjectMap[$class['subjectcode']];
            $classpage = $class['classlink'];

            //hit the main page and get cookie
            $ch = curl_init();
            curl_setopt($ch, CURLOPT_COOKIEJAR, $ckfile);
            curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
            curl_setopt($ch, CURLOPT_URL, $this->mainURL);
            curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
            curl_exec($ch);
            curl_close($ch);

            //hit the term page and get cookie
            $ch = curl_init();
            curl_setopt($ch, CURLOPT_COOKIEFILE, $ckfile);
            curl_setopt($ch, CURLOPT_COOKIEJAR, $ckfile2);
            curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
            curl_setopt($ch, CURLOPT_URL, $termpage);
            curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
            curl_exec($ch);
            curl_close($ch);

            //hit the subject page and get cookie
            $ch = curl_init();
            curl_setopt($ch, CURLOPT_COOKIEJAR, $ckfile3);
            curl_setopt($ch, CURLOPT_COOKIEFILE, $ckfile2);
            curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
            curl_setopt($ch, CURLOPT_URL, $subjectpage);
            curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
            curl_exec($ch);
            curl_close($ch);

            //hit the class page and scrape
            $ch = curl_init();              
            curl_setopt($ch, CURLOPT_COOKIEFILE, $ckfile3);
            curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
            curl_setopt($ch, CURLOPT_URL, $classpage);
            curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
            $result = curl_exec($ch);
            curl_close($ch);

            return str_get_html($result);
        }

The function called in the last line, str_get_html(), is part of Simple HTML DOM Parser.

Should it matter, this is how I am calling my script:

/usr/local/php5/bin/php index.php 2>&1 1>output

xbonez asked Oct 10 '22 00:10


2 Answers

Alright, figured it out. Apparently, it's a bug that all versions of PHP prior to 5.3 suffer from: setting CURLOPT_RETURNTRANSFER to true causes massive memory leaks.

I ran the script again, this time invoking the php 5.3 binary:

/web/cgi-bin/php5_3 index.php 2>&1 1>output

And the output file reads:

0   :   6291456
10   :   9437184
20   :   10747904
30   :   11534336
40   :   11534336
50   :   11534336
60   :   11534336
70   :   11534336
80   :   11534336
90   :   11534336
99   :   11534336
152.74998211861 sec

Now that's what I'm talking about! After the first 30 or so iterations, the memory footprint stays constant at 11534336 bytes.
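
Since the behaviour described here depends on which PHP binary runs the script, a guard at the top of index.php can make an accidental run on an older binary obvious. This is only a sketch: isBeforePhp53() is a hypothetical helper name, and the 5.3.0 threshold is taken from the description above rather than from a changelog.

```php
<?php
// Hypothetical helper: true when $version is older than PHP 5.3.0,
// the range described above as affected.
function isBeforePhp53($version)
{
    return version_compare($version, '5.3.0', '<');
}

// Warn on stderr (CLI only) when the running binary is in that range.
if (isBeforePhp53(PHP_VERSION)) {
    fwrite(STDERR, "Warning: running on PHP " . PHP_VERSION
        . "; memory usage may grow on every request.\n");
}
```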

xbonez answered Oct 13 '22 10:10


I noticed two things in your code:

  1. For the first three requests, remove curl_setopt($ch, CURLOPT_RETURNTRANSFER, true), as you are not capturing their responses.
  2. Do not close the cURL handle after each request; reuse it.
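
The second point could look roughly like the sketch below. It is an illustration under assumptions, not a drop-in replacement: scrapeWithSharedHandle() is a hypothetical name, the cookie files are replaced by cURL's in-memory cookie engine, CURLOPT_RETURNTRANSFER is kept so that response bodies are returned instead of being echoed, and the CURLOPT_SSL_VERIFYPEER setting from the original is left out.

```php
<?php
// Sketch only: fetches each URL in order on ONE cURL handle and returns
// the body of the last response (or false if that request failed).
// scrapeWithSharedHandle() is a hypothetical name, not part of the
// original script.
function scrapeWithSharedHandle($urls)
{
    $ch = curl_init();

    // An empty COOKIEFILE turns on cURL's in-memory cookie engine, so
    // cookies set by one response are sent with the next request made
    // on this handle, without any temporary cookie files.
    curl_setopt($ch, CURLOPT_COOKIEFILE, '');

    // Kept so that response bodies are returned rather than echoed.
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

    $result = false;
    foreach ($urls as $url) {
        curl_setopt($ch, CURLOPT_URL, $url);
        $result = curl_exec($ch);
    }

    // No curl_close() per request; the handle is released once $ch
    // goes out of scope at the end of this function.
    return $result;
}
```

Inside scrapeClassInfo(), the four requests would then become a single call with array($this->mainURL, $termpage, $subjectpage, $classpage), with the return value checked for false before it is passed to str_get_html().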

As a workaround for now, you can run the PHP script with a higher memory_limit:

 $ php -d memory_limit=1G /path/to/script

1G means 1 Gigabyte.

Shiplu Mokaddim answered Oct 13 '22 10:10