
How to instruct PowerShell to garbage collect .NET objects like XmlSchemaSet?

I created a PowerShell script which loops over a large number of XML Schema (.xsd) files, and for each creates a .NET XmlSchemaSet object, calls Add() and Compile() to add a schema to it, and prints out all validation errors.

This script works correctly, but there is a memory leak somewhere: it consumes gigabytes of memory when run on hundreds of files.

What I essentially do in a loop is the following:

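$schemaSet = New-Object -TypeName System.Xml.Schema.XmlSchemaSet
# Report the validation errors and warnings raised by Add() and Compile().
Register-ObjectEvent $schemaSet ValidationEventHandler -Action {
    # ...write-host the event details...
}
$reader = [System.Xml.XmlReader]::Create($schemaFileName)
# $null_for_dotnet_string is not defined in this excerpt; as its name suggests, it stands in for a .NET null string.
[void] $schemaSet.Add($null_for_dotnet_string, $reader)
$reader.Close()
$schemaSet.Compile()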

(A full script to reproduce this problem can be found in this gist: https://gist.github.com/3002649. Just run it, and watch the memory usage increase in Task Manager or Process Explorer.)

Inspired by some blog posts, I tried adding

remove-variable reader, schemaSet

I also tried picking up the $schema from Add() and doing

[void] $schemaSet.RemoveRecursive($schema)

These seem to have some effect, but there is still a leak. I presume that older instances of XmlSchemaSet are still holding memory without being garbage collected.

The question: How do I properly teach the garbage collector that it can reclaim all memory used in the code above? Or more generally: how can I achieve my goal with a bounded amount of memory?

MarnixKlooster ReinstateMonica asked Jun 27 '12 09:06


2 Answers

Microsoft has confirmed that this is a bug in PowerShell 2.0, and they state that this has been resolved in PowerShell 3.0.

The problem is that an event handler registered using Register-ObjectEvent is not garbage collected. In response to a support call, Microsoft said:

"we’re dealing with a bug in PowerShell v.2. The issue is caused actually by the fact that the .NET object instances are no longer released due to the event handlers not being released themselves. The issue is no longer reproducible with PowerShell v.3".

The best solution, as far as I can see, is to interface between PowerShell and .NET at a different level: do the validation completely in C# code (embedded in the PowerShell script), and just pass back a list of ValidationEventArgs objects. See the fixed reproduction script at https://gist.github.com/3697081: that script is functionally correct and leaks no memory.
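The gist itself is not reproduced here. As a minimal sketch of the same idea, assuming Windows PowerShell with the .NET Framework: the C# helper below attaches a plain .NET delegate instead of a Register-ObjectEvent subscription and returns the collected ValidationEventArgs objects. The names SchemaChecker and Check are hypothetical and are not taken from the gist.

```powershell
# Hypothetical helper type (SchemaChecker/Check are illustrative names).
# The C# is kept to C# 2.0 syntax so that Add-Type in PowerShell 2.0 can compile it.
Add-Type -ReferencedAssemblies System.Xml -TypeDefinition @"
using System.Collections.Generic;
using System.Xml;
using System.Xml.Schema;

public static class SchemaChecker
{
    // Adds and compiles one schema file; returns every validation event raised.
    public static List<ValidationEventArgs> Check(string path)
    {
        List<ValidationEventArgs> events = new List<ValidationEventArgs>();
        XmlSchemaSet set = new XmlSchemaSet();
        // Plain .NET delegate: no PowerShell event subscription is involved.
        set.ValidationEventHandler +=
            delegate(object sender, ValidationEventArgs e) { events.Add(e); };
        using (XmlReader reader = XmlReader.Create(path))
        {
            // A real null is passed here, so the schema's own targetNamespace is used.
            set.Add(null, reader);
        }
        set.Compile();
        return events;
    }
}
"@

# Usage: report the events for every .xsd file in the current directory.
foreach ($file in Get-ChildItem -Path . -Filter *.xsd) {
    foreach ($e in [SchemaChecker]::Check($file.FullName)) {
        Write-Host ("{0}: {1}: {2}" -f $file.Name, $e.Severity, $e.Message)
    }
}
```

Because the handler runs synchronously inside Add() and Compile(), all events have been collected by the time Check returns, and nothing outside the method keeps the XmlSchemaSet reachable.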

(Thanks to Microsoft Support for helping me find this solution.)


Initially Microsoft offered another workaround, which is to use $xyzzy = Register-ObjectEvent -SourceIdentifier XYZZY, and then at the end do the following:

Unregister-Event XYZZY
Remove-Job $xyzzy -Force

However, this workaround is functionally incorrect. Any events that are still 'in flight' are lost at the time these two additional statements are executed. In my case, that means that I miss validation errors, so the output of my script is incomplete.
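For reference, the rejected workaround might be laid out as follows inside the loop body. This is only a sketch: the variable name $job and the Write-Host action are assumptions, and the Add()/Compile() step is left as a comment because it is the same as in the question.

```powershell
$schemaSet = New-Object -TypeName System.Xml.Schema.XmlSchemaSet

# Keep the job object returned by Register-ObjectEvent so it can be removed later.
$job = Register-ObjectEvent -InputObject $schemaSet -EventName ValidationEventHandler `
    -SourceIdentifier XYZZY -Action { Write-Host $EventArgs.Message }

# ... Add() and Compile() as in the question ...

# Tear down the subscription and its job. Events still queued at this point
# are dropped, which is why the output can be incomplete.
Unregister-Event -SourceIdentifier XYZZY
Remove-Job -Job $job -Force
```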

MarnixKlooster ReinstateMonica answered Sep 28 '22 18:09


After the Remove-Variable call, you can try to force a garbage collection:

[GC]::Collect()
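
A slightly fuller sketch of a forced collection, using the variable names from the question. Note that a forced collection can only reclaim objects that are no longer referenced; according to the other answer, in PowerShell 2.0 the event handlers are never released, so the schema sets they keep alive are not reclaimed this way.

```powershell
# Drop the script's own references first (ignore variables that do not exist).
Remove-Variable reader, schemaSet -ErrorAction SilentlyContinue

# Collect, let pending finalizers run, then collect what they released.
[GC]::Collect()
[GC]::WaitForPendingFinalizers()
[GC]::Collect()

# Report the managed heap size to see whether memory was actually reclaimed.
$bytes = [GC]::GetTotalMemory($true)
Write-Host ("Managed heap after collection: {0:N0} bytes" -f $bytes)
```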
CB. answered Sep 28 '22 16:09