I created a PowerShell script that loops over a large number of XML Schema (.xsd) files and, for each one, creates a .NET XmlSchemaSet object, calls Add() and Compile() to add a schema to it, and prints out all validation errors.
This script works correctly, but there is a memory leak somewhere, causing it to consume gigabytes of memory when run on hundreds of files.
What I essentially do in a loop is the following:
$schemaSet = new-object -typename System.Xml.Schema.XmlSchemaSet
register-objectevent $schemaSet ValidationEventHandler -Action {
...write-host the event details...
}
$reader = [System.Xml.XmlReader]::Create($schemaFileName)
[void] $schemaSet.Add($null_for_dotnet_string, $reader)
$reader.Close()
$schemaSet.Compile()
(A full script to reproduce this problem can be found in this gist: https://gist.github.com/3002649. Just run it, and watch the memory usage increase in Task Manager or Process Explorer.)
Inspired by some blog posts, I tried adding
remove-variable reader, schemaSet
I also tried capturing the $schema returned by Add() and calling
[void] $schemaSet.RemoveRecursive($schema)
These seem to have some effect, but there is still a leak. I presume that older instances of XmlSchemaSet are still using memory without being garbage collected.
The question: How do I properly teach the garbage collector that it can reclaim all memory used in the code above? Or more generally: how can I achieve my goal with a bounded amount of memory?
Microsoft has confirmed that this is a bug in PowerShell 2.0, and they state that this has been resolved in PowerShell 3.0.
The problem is that an event handler registered using Register-ObjectEvent is not garbage collected. In response to a support call, Microsoft said that
"we’re dealing with a bug in PowerShell v.2. The issue is caused actually by the fact that the .NET object instances are no longer released due to the event handlers not being released themselves. The issue is no longer reproducible with PowerShell v.3".
The best solution, as far as I can see, is to interface between PowerShell and .NET at a different level: do the validation completely in C# code (embedded in the PowerShell script), and just pass back a list of ValidationEventArgs objects. See the fixed reproduction script at https://gist.github.com/3697081; that script is functionally correct and leaks no memory.
(Thanks to Microsoft Support for helping me find this solution.)
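For readers who cannot open the gist, here is a minimal sketch of what "do the validation in embedded C#" could look like. It is not the code from the gist: the class name SchemaChecker and the method name Check are made up for illustration, and the sketch is written with Windows PowerShell in mind (the C# sticks to old-style syntax so the default compiler in PowerShell 2.0 should accept it). The point is that the event handler is attached and released entirely inside .NET, so Register-ObjectEvent is never involved.

```powershell
# Hypothetical helper type; SchemaChecker/Check are illustrative names,
# not taken from the original gist.
Add-Type -ReferencedAssemblies System.Xml -TypeDefinition @"
using System.Collections.Generic;
using System.Xml;
using System.Xml.Schema;

public static class SchemaChecker
{
    // Loads and compiles one schema file, collecting every validation
    // event in a list instead of forwarding it to PowerShell.
    public static List<ValidationEventArgs> Check(string path)
    {
        List<ValidationEventArgs> events = new List<ValidationEventArgs>();
        XmlSchemaSet set = new XmlSchemaSet();
        set.ValidationEventHandler +=
            delegate(object sender, ValidationEventArgs e) { events.Add(e); };
        using (XmlReader reader = XmlReader.Create(path))
        {
            set.Add(null, reader);
        }
        set.Compile();
        return events;
    }
}
"@

# Report the collected events for every .xsd file in the current directory.
foreach ($file in Get-ChildItem -Filter *.xsd) {
    foreach ($e in [SchemaChecker]::Check($file.FullName)) {
        Write-Host "$($file.Name): $($e.Severity): $($e.Message)"
    }
}
```

Because each call builds its own XmlSchemaSet and returns only the list of events, nothing from a previous file should stay reachable once the loop moves on, and no events are dropped the way they can be with Unregister-Event.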
Initially Microsoft offered another workaround, which is to use $xyzzy = Register-ObjectEvent -SourceIdentifier XYZZY, and then at the end do the following:
Unregister-Event XYZZY
Remove-Job $xyzzy -Force
However, this workaround is functionally incorrect. Any events that are still 'in flight' are lost at the time these two additional statements are executed. In my case, that means that I miss validation errors, so the output of my script is incomplete.
After the remove-variable, you can try to force a garbage collection:
[GC]::Collect()