
PHP Garbage collection sucks or is it just me?

I have the function below which I call very frequently in a loop.

I waited 5 minutes as the memory climbed from 1MB to 156MB. Shouldn't PHP's garbage collector turn up and reduce this at some point?!

Is it because I have set memory limit at 256MB?

At echo points 2, 3 and 4 the memory usage is pretty constant. It goes down by half a meg at point 4. But point 1 is where the main memory increase happens, probably because file_get_html loads the HTML file into memory.

I thought the clear and unset of the variable $html would take care of this?

function get_stuff($link, $category ){

    $html = file_get_html(trim("$link"));

    $article = $html->find('div[class=searchresultsWidget]', 0);

    echo '1 - > '.convert(memory_get_usage(true)).'<br />';  

    foreach($article->find('h4 a') as $link){

        $next_url = 'http://new.mysite.com'.$link->href;

        $font_name = trim($link->plaintext);        

        $html = file_get_html(trim("$next_url"));

        $article = $html->find('form[class=addtags]', 0);

        $font_tags = '';

        foreach($article->find('ul[class=everyone_tags] li a span') as $link){

            $font_tags .= trim($link->innertext).',';   

        }

        echo '2 - > '.convert(memory_get_usage(true)).'<br />'; 

        $font_name = mysql_real_escape_string($font_name);
        $category =  mysql_real_escape_string($category);  
        $font_tags = mysql_real_escape_string($font_tags);  

        $sql = "INSERT INTO tag_data (font_name, category, tags) VALUES ('$font_name', '$category', '$font_tags')";

        unset($font_tags);
        unset($font_name);
        unset($category); 

        $html->clear();   

        mysql_query($sql); 

        unset($sql);   

        echo '3 - > '.convert(memory_get_usage(true)).'<br />';    

    }

    unset($next_url);
    unset($link);
    $html->clear(); 
    unset($html);   
    unset($article);

    echo '4 - > '.convert(memory_get_usage(true)).'<br />';

}

As you can see, I made a feeble attempt to use unset, although it's no good, as I understand it won't free memory as soon as I call it.

Thanks all for any help on how I can reduce this upward rise of memory.

Abs asked Aug 01 '10 17:08


3 Answers

There's a known memory leak with file_get_html(): http://simplehtmldom.sourceforge.net/manual_faq.htm#memory_leak

The solution is to use

$html->clear();

Which you are doing, BUT: you're using $html both inside and outside of the loop. Inside the loop you call $html->clear(), and then near the end of your function you call $html->clear() again (I assume to catch the object from your initial file_get_html() call). Once the loop has run, that last call doesn't do anything useful: by then $html points at the last document loaded inside the loop, which has already been cleared. The object from the initial $html = file_get_html() call is never cleared, so that is where you're leaking memory.

Try using a different variable ($html1, maybe?) inside your loop and see what happens.
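Here is a rough sketch of the difference, runnable without the library. FakeDom is a hypothetical stand-in I made up for illustration (it is not part of simple_html_dom); it only counts documents that were created but never cleared. The function names and the $page variable are assumptions too.

```php
<?php
// Hypothetical stand-in for a simple_html_dom document: it only tracks
// whether clear() was called, so the pattern can be run without the library.
class FakeDom
{
    public static $uncleared = 0;
    private $cleared = false;

    public function __construct()
    {
        self::$uncleared++;
    }

    public function clear()
    {
        if (!$this->cleared) {
            $this->cleared = true;
            self::$uncleared--;
        }
    }
}

// Pattern from the question: one variable for the outer and inner documents.
function reuse_variable($pages)
{
    $html = new FakeDom();          // outer document
    for ($i = 0; $i < $pages; $i++) {
        $html = new FakeDom();      // overwrites the only handle to the outer document
        $html->clear();
    }
    $html->clear();                 // hits the last inner document, already cleared
}

// Suggested pattern: a separate variable for the inner documents.
function separate_variables($pages)
{
    $html = new FakeDom();          // outer document keeps its own variable
    for ($i = 0; $i < $pages; $i++) {
        $page = new FakeDom();
        $page->clear();
        unset($page);
    }
    $html->clear();                 // now really clears the outer document
    unset($html);
}

FakeDom::$uncleared = 0;
reuse_variable(3);
$leaked_with_reuse = FakeDom::$uncleared;      // outer document was never cleared

FakeDom::$uncleared = 0;
separate_variables(3);
$leaked_with_separate = FakeDom::$uncleared;   // every document was cleared
```

With the real parser, an uncleared document is what keeps its internal node references alive, so one missed clear() per call to get_stuff() would be enough to add up over a long loop.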

jasonbar answered Oct 15 '22 23:10


The purpose of the garbage collector is solely to catch circular references.

If there are none, the variables are immediately eliminated once their reference count hits 0.
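A small sketch of that behaviour, using a made-up Tracked class (hypothetical, purely for illustration) whose destructor records when PHP actually destroys the object:

```php
<?php
// Hypothetical class that logs when PHP actually destroys an instance.
class Tracked
{
    public static $log = array();
    private $name;

    public function __construct($name)
    {
        $this->name = $name;
    }

    public function __destruct()
    {
        self::$log[] = $this->name;
    }
}

$a = new Tracked('single');
unset($a);                            // last handle gone: destroyed right here
$after_single = Tracked::$log;

$b = new Tracked('shared');
$c = $b;                              // second handle to the same object
unset($b);                            // one handle left: object still alive
$after_first_unset = Tracked::$log;
unset($c);                            // last handle gone: destroyed now
$after_second_unset = Tracked::$log;
```

No collector run is involved at any point here; the destructor fires on the same statement that removes the last handle.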

I don't recommend that you use unset, except in exceptional cases. Use functions instead and rely on the variables to go out of scope to have the memory reclaimed.
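As a minimal sketch of relying on scope, assuming a hypothetical helper build_and_measure() that needs a large temporary buffer: the buffer is a local variable, so it is released when the function returns, without any unset call.

```php
<?php
// Hypothetical worker: builds a large temporary string and returns only its length.
function build_and_measure()
{
    $buffer = str_repeat('x', 5 * 1024 * 1024);  // about 5 MB, local to this call
    return strlen($buffer);
}

$before = memory_get_usage();
$length = build_and_measure();   // $buffer goes out of scope on return
$after  = memory_get_usage();

// Memory still held after the call; the 5 MB buffer should not be part of it.
$retained = $after - $before;
```

Note that this measures with memory_get_usage() rather than memory_get_usage(true) as in the question. The latter reports memory reserved from the system, which can include pages PHP is not currently using, so it may not drop as promptly.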

Other than that, we can't possibly tell you exactly what's happening, because we'd have to know exactly what the simple DOM parser is doing. Possibly there are circular references or global resources holding a reference, but it would be difficult to know.

See "Reference Counting Basics" and "Collecting Cycles" in the PHP manual.

Artefacto answered Oct 15 '22 22:10


PHP didn't have a proper garbage collector until 5.3. It basically used only reference counting, which would leave circular references in place until the script terminated (e.g. an array that holds a reference to itself, as in $a = array(); $a[] =& $a;). As well, the cleanup code it DID have would only run if memory pressure required it to; there's no point in doing an expensive cleanup cycle if the newly freed memory isn't needed.

As of 5.3, there's a proper garbage collector. You can enable it with gc_enable() and force a collection run with gc_collect_cycles().
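A minimal sketch of what that looks like, assuming PHP 5.3 or later. The Node class is hypothetical and exists only to count destructor calls; gc_enable() and gc_collect_cycles() are the real built-in functions.

```php
<?php
// Hypothetical class used only to observe when instances are destroyed.
class Node
{
    public static $destroyed = 0;
    public $peer;

    public function __destruct()
    {
        self::$destroyed++;
    }
}

gc_enable();                  // make sure the cycle collector is switched on

$a = new Node();
$b = new Node();
$a->peer = $b;
$b->peer = $a;                // $a and $b now reference each other

unset($a, $b);                // each object is still referenced by the other
$destroyed_before = Node::$destroyed;   // reference counting alone freed nothing

$collected = gc_collect_cycles();       // force a collection run
$destroyed_after = Node::$destroyed;    // both objects are gone now
```

In a long-running loop like the one in the question, calling gc_collect_cycles() every so often would be one way to reclaim such cycles, though it won't help with objects that something else still legitimately references.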

Marc B answered Oct 15 '22 23:10