Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

No garbage collection while PowerShell pipeline is executing

UPDATE: The following bug seems to be resolved with PowerShell 5. The bug remains in 3 and 4. So don't process any huge files with the pipeline unless you're running PowerShell 2 or 5.


Consider the following code snippet:

function Get-DummyData() {
    for ($i = 0; $i -lt 10000000; $i++) {
        "This is freaking huge!! I'm a ninja! More words, yay!"
    }
}

Get-DummyData | Out-Null

This will cause PowerShell memory usage to grow uncontrollably. After executing Get-DummyData | Out-Null a few times, I have seen PowerShell memory usage get all the way up to 4 GB.

According to ANTS Memory Profiler, we have a whole lot of things sitting around in the garbage collector's finalization queue. When I call [GC]::Collect(), the memory goes from 4 GB to a mere 70 MB. So we don't have a memory leak, strictly speaking.

Now, it's not good enough for me to be able to call [GC]::Collect() when I'm finished with a long-lived pipeline operation. I need garbage collection to happen during a pipeline operation. However if I try to invoke [GC]::Collect() while the pipeline is executing...

function Get-DummyData() {
    for ($i = 0; $i -lt 10000000; $i++) {
        "This is freaking huge!! I'm a ninja! More words, yay!"

        if ($i % 1000000 -eq 0) {
            Write-Host "Prompting a garbage collection..."
            [GC]::Collect()
        }
    }
}

Get-DummyData | Out-Null

... the problem remains. Memory usage grows uncontrollably again. I have tried several variations of this, such as adding [GC]::WaitForPendingFinalizers(), Start-Sleep -Seconds 10, etc. I have tried changing garbage collector latency modes and forcing PowerShell to use server garbage collection to no avail. I just can't get the garbage collector to do its thing while the pipeline is executing.

This isn't a problem at all in PowerShell 2.0. It's also interesting to note that $null = Get-DummyData also seems to work without memory issues. So it seems tied to the pipeline, rather than the fact that we're generating tons of strings.

How can I prevent my memory from growing uncontrollably during long pipelines?

Side note:

My Get-DummyData function is only for demonstration purposes. My real-world problem is that I'm unable to read through large files in PowerShell using Get-Content or Import-Csv. No, I'm not storing the contents of these files in variables. I'm strictly using the pipeline like I'm supposed to. Get-Content .\super-huge-file.txt | Out-Null produces the same problem.

like image 745
Phil Avatar asked Jul 24 '15 22:07

Phil


People also ask

Does PowerShell have garbage collection?

There is a known bug where PowerShell does not correctly manage a garbage collection whilst executing a pipeline or loop of an object. Simply using [System. GC]::Collect() within the pipeline or loop does not work as expected.

How do you make sure that the garbage collector is done running when you call GC collect ()?

To have the garbage collector consider all objects regardless of their generation, use the version of this method that takes no parameters. To have the garbage collector reclaim objects based on a GCCollectionMode setting, use the GC. Collect(Int32, GCCollectionMode) method overload.

When you request garbage collector to run it will start to run immediately?

It is not possible to run garbage collection immediately as we are programming in managed environment and only CLR decides when to run garbage collection.


1 Answers

A couple of things to point out here. First, GC calls do work in the pipeline. Here's a pipeline script that only invokes the GC:

1..10 | Foreach {[System.GC]::Collect()}

Here's the perfmon graph of GCs during the time the script ran:

enter image description here

However, just because you invoke the GC it doesn't mean the private memory usage will return to the value you had before your script started. A GC collect will only collect memory that is no longer used. If there is a rooted reference to an object, it is not eligible to be collected (freed). So while GC systems typically don't leak in the C/C++ sense, they can have memory hoards that hold onto objects longer than perhaps they should.

In looking at this with a memory profiler it seems the bulk of the excess memory is taken up by a copy of the string with parameter binding info:

enter image description here

The root for these strings look like this:

enter image description here

I wonder if there is some logging feature that is causing PowerShell to hang onto a string-ized form pipeline bound objects?

BTW in this specific case, it is much more memory efficient to assign to $null to ignore the output:

$null = GetDummyData

Also, if you need to simply edit a file, check out the Edit-File command in the PowerShell Community Extensions 3.2.0. It should be memory efficient as long as you don't use the SingleString switch parameter.

like image 163
Keith Hill Avatar answered Sep 21 '22 13:09

Keith Hill