I'm seeing the performance counter "# Induced GC" (which should stay at zero in a perfect app) increasing rapidly when processing small images (<= 32x32 pixels) via WriteableBitmap. While this isn't a significant bottleneck in a small app, it becomes a huge problem (the app freezes at 99.75% "% Time in GC" for several seconds at each step) when there are some thousand objects in memory (e.g. an EntityFramework context loaded with many entities and relationships).
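(For reference, those counters can also be sampled programmatically; a small sketch, assuming the instance name "MyTestApp" matches the process being measured:)

// using System.Diagnostics;
// ".NET CLR Memory" is the standard CLR counter category; the instance name is an assumption.
var induced  = new PerformanceCounter(".NET CLR Memory", "# Induced GC", "MyTestApp");
var timeInGc = new PerformanceCounter(".NET CLR Memory", "% Time in GC", "MyTestApp");
Console.WriteLine("# Induced GC: {0:n0}   % Time in GC: {1:n2}",
                  induced.NextValue(), timeInGc.NextValue());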
Synthetic test:
// Build a large object graph so the GC has plenty of unrelated objects to walk.
var objectCountPressure = (
    from x in Enumerable.Range(65, 26)
    let root = new DirectoryInfo((char)x + ":\\")
    let subs =
        from y in Enumerable.Range(0, 100 * IntPtr.Size)
        let sub = new { DI = new DirectoryInfo(Path.Combine(root.FullName, "sub" + y)), Parent = root }
        let files = from z in Enumerable.Range(0, 400)
                    select new { FI = new FileInfo(Path.Combine(sub.DI.FullName, "file" + z)), Parent = sub }
        select new { sub, files = files.ToList() }
    select new { root, subs = subs.ToList() }
).ToList();

const int Size = 32;

// Each worker creates 10,000 WriteableBitmaps of Size x Size pixels and fills every row.
Action<int> handler = threadnr => {
    Console.WriteLine(threadnr + " => " + Thread.CurrentThread.ManagedThreadId);
    for (int i = 0; i < 10000; i++) {
        var wb = new WriteableBitmap(Size, Size, 96, 96, PixelFormats.Bgra32, null);
        wb.Lock();
        var stride = wb.BackBufferStride;
        var blocks = stride / sizeof(int);
        unsafe {
            var row = (byte*)wb.BackBuffer;
            for (int y = 0; y < wb.PixelHeight; y++, row += stride) {
                var start = (int*)row;
                for (int x = 0; x < blocks; x++, start++)
                    *start = i;
            }
        }
        wb.Unlock();
        wb.Freeze();
    }
};

var sw = Stopwatch.StartNew();
Console.WriteLine("start: {0:n3} ms", sw.Elapsed.TotalMilliseconds);
Parallel.For(0, Environment.ProcessorCount,
             new ParallelOptions { MaxDegreeOfParallelism = Environment.ProcessorCount },
             handler);
Console.WriteLine("stop : {0:n2} s", sw.Elapsed.TotalSeconds);
GC.KeepAlive(objectCountPressure);
I can run this test with "const int Size = 48" a dozen times: it always completes in ~1.5 s and "# Induced GC" sometimes increases by 1 or 2. When I change "const int Size = 48" to "const int Size = 32", something very bad happens: "# Induced GC" increases by about 10 per second and the overall runtime jumps to more than a minute: ~80 s!
[Tested on Win7x64 Core-i7-2600 with 8GB RAM // .NET 4.0.30319.237 ]
WTF!?
Either the Framework has a very bad bug or I'm doing something entirely wrong.
BTW: I didn't run into this problem doing image processing; I hit it simply by using a Tooltip containing an Image, shown for some database entities via a DataTemplate. This worked fine (fast) while there weren't many objects in RAM, but once some million other (totally unrelated) objects existed, showing the Tooltip was always delayed by several seconds, while everything else kept working fine.
TL;DR: Probably the best solution would be to create a small pool of WriteableBitmaps and reuse them rather than creating them and throwing them away.
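A minimal sketch of what such a pool could look like (the class name, the sizing, and the fixed Bgra32 format are assumptions, not anything from the framework):

// using System.Collections.Generic;
// using System.Windows.Media;
// using System.Windows.Media.Imaging;

// Note: WriteableBitmap has thread affinity (it is a DispatcherObject), so keep a pool
// like this per thread / per Dispatcher, e.g. in a [ThreadStatic] field.
class WriteableBitmapPool
{
    private readonly Stack<WriteableBitmap> _pool = new Stack<WriteableBitmap>();
    private readonly int _width, _height;

    public WriteableBitmapPool(int width, int height)
    {
        _width = width;
        _height = height;
    }

    public WriteableBitmap Rent()
    {
        return _pool.Count > 0
            ? _pool.Pop()
            : new WriteableBitmap(_width, _height, 96, 96, PixelFormats.Bgra32, null);
    }

    public void Return(WriteableBitmap wb)
    {
        // A frozen bitmap can no longer be written to, so only unfrozen ones are reusable.
        if (!wb.IsFrozen)
            _pool.Push(wb);
    }
}

Note that the test in the question freezes each bitmap; a frozen instance can't be rewritten, so pooling only pays off where freezing isn't required.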
So I started spelunking around with WinDbg to see what was causing the collections to happen. First I added a call to Debugger.Break() to the start of Main to make things easier. I also added my own call to GC.Collect() as a sanity check to make sure my breakpoint worked OK.
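Something like this minimal sketch (the exact harness is an assumption; only the Debugger.Break() and GC.Collect() calls matter):

// using System.Diagnostics;
class Program
{
    static void Main(string[] args)
    {
        Debugger.Break();  // pause here so breakpoints can be set in WinDbg
        GC.Collect();      // sanity check: the managed breakpoint should fire here
        GC.Collect();      // second call, used later to step into without the JIT noise

        // ... WriteableBitmap test from the question ...
    }
}

Then, in WinDbg: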
0:000> .loadby sos clr
0:000> !bpmd mscorlib.dll System.GC.Collect
Found 3 methods in module 000007feee811000...
MethodDesc = 000007feee896cb0
Setting breakpoint: bp 000007FEEF20E0C0 [System.GC.Collect(Int32)]
MethodDesc = 000007feee896cc0
Setting breakpoint: bp 000007FEEF20DDD0 [System.GC.Collect()]
MethodDesc = 000007feee896cd0
Setting breakpoint: bp 000007FEEEB74A80 [System.GC.Collect(Int32, System.GCCollectionMode)]
Adding pending breakpoints...
0:000> g
Breakpoint 1 hit
mscorlib_ni+0x9fddd0:
000007fe`ef20ddd0 4154 push r12
0:000> !clrstack
OS Thread Id: 0x49c (0)
Child SP IP Call Site
000000000014ed58 000007feef20ddd0 System.GC.Collect()
000000000014ed60 000007ff00140388 ConsoleApplication1.Program.Main(System.String[])
So the breakpoint worked OK, but when I let the program continue it was never hit again. It seemed the GC routine was being called from somewhere deeper. Next I stepped into the GC.Collect() function to see what it was calling. To do this more easily I added a second call to GC.Collect() immediately after the first and stepped into the second one. This avoided stepping through all the JIT compilation:
Breakpoint 1 hit
mscorlib_ni+0x9fddd0:
000007fe`ef20ddd0 4154 push r12
0:000> p
mscorlib_ni+0x9fddd2:
000007fe`ef20ddd2 4155 push r13
0:000> p
...
0:000> p
mscorlib_ni+0x9fde00:
000007fe`ef20de00 4c8b1d990b61ff mov r11,qword ptr [mscorlib_ni+0xe9a0 (000007fe`ee81e9a0)] ds:000007fe`ee81e9a0={clr!GCInterface::Collect (000007fe`eb976100)}
After a little stepping I noticed a reference to clr!GCInterface::Collect, which sounded promising. Unfortunately a breakpoint on it never triggered. Digging further into GC.Collect() I found clr!WKS::GCHeap::GarbageCollect, which proved to be the real method. A breakpoint on this revealed the code that was triggering the collection:
0:009> bp clr!WKS::GCHeap::GarbageCollect
0:009> g
Breakpoint 4 hit
clr!WKS::GCHeap::GarbageCollect:
000007fe`eb919490 488bc4 mov rax,rsp
0:006> !clrstack
OS Thread Id: 0x954 (6)
Child SP IP Call Site
0000000000e4e708 000007feeb919490 [NDirectMethodFrameStandalone: 0000000000e4e708] System.GC._AddMemoryPressure(UInt64)
0000000000e4e6d0 000007feeeb9d4f7 System.GC.AddMemoryPressure(Int64)
0000000000e4e7a0 000007fee9259a4e System.Windows.Media.SafeMILHandle.UpdateEstimatedSize(Int64)
0000000000e4e7e0 000007fee9997b97 System.Windows.Media.Imaging.WriteableBitmap..ctor(Int32, Int32, Double, Double, System.Windows.Media.PixelFormat, System.Windows.Media.Imaging.BitmapPalette)
0000000000e4e8e0 000007ff00141f92 ConsoleApplication1.Program.<Main>b__c(Int32)
So WriteableBitmap's constructor indirectly calls GC.AddMemoryPressure, which eventually results in collections (incidentally, GC.AddMemoryPressure is an easier way to simulate memory usage). This doesn't explain the sudden change in behaviour when going from a size of 33 to 32, though.
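(As an aside, a quick way to see pressure-driven collections without any bitmaps at all; an illustrative sketch, not part of the original investigation:)

// Tell the GC that each iteration is backed by unmanaged memory it can't see.
// Repeatedly adding pressure like this can trigger induced collections even though
// almost nothing is actually allocated.
for (int i = 0; i < 100000; i++)
{
    GC.AddMemoryPressure(8192);
    // ... work with the (pretend) unmanaged resource ...
    GC.RemoveMemoryPressure(8192);
}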
ILSpy helps out here. In particular, if you look at the constructor for SafeMILHandleMemoryPressure (invoked by SafeMILHandle.UpdateEstimatedSize), you'll see that it only uses GC.AddMemoryPressure if the pressure to add is <= 8192; otherwise it uses its own custom system for tracking memory pressure and triggering collections. A bitmap size of 32x32 with 32-bit pixels falls under this limit because WriteableBitmap estimates the memory use as 32 * 32 * 4 * 2 = 8192 bytes (I'm not sure why the extra factor of 2 is there), whereas 33x33 gives 8712 bytes and takes the other path.
In summary, it looks like the behaviour you're seeing is the result of a heuristic in the framework that doesn't work well for your case. You might be able to work around it by creating a bitmap with bigger dimensions or a bigger pixel format than you need, so that the estimated memory size of the bitmap is > 8192.
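A sketch of that workaround for the 32x32 case, assuming the heuristic described above (the extra row and column are simply never written):

// Pad to 33x33: the estimated size becomes 33 * 33 * 4 * 2 = 8712 bytes, which is above
// the 8192-byte threshold, so (per the analysis above) GC.AddMemoryPressure is no longer
// used. Only the top-left 32x32 region is actually filled.
const int Needed = 32;
const int Padded = Needed + 1;

var wb = new WriteableBitmap(Padded, Padded, 96, 96, PixelFormats.Bgra32, null);
wb.Lock();
unsafe
{
    var row = (byte*)wb.BackBuffer;
    for (int y = 0; y < Needed; y++, row += wb.BackBufferStride)
    {
        var pixel = (int*)row;
        for (int x = 0; x < Needed; x++, pixel++)
            *pixel = unchecked((int)0xFF336699); // any Bgra32 value
    }
}
wb.AddDirtyRect(new Int32Rect(0, 0, Needed, Needed));
wb.Unlock();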
Afterthought: I guess this also suggests that collections triggered as a result of GC.AddMemoryPressure are counted under "# Induced GC"?
Under all of the SafeMILHandleMemoryPressure and SafeMILHandle nonsense is a call to a method on MS.Internal.MemoryPressure, which uses the static field "_totalMemory" to keep track of how much memory WPF thinks is allocated. When it hits a (rather small) limit, the induced GCs start and never end.
You can stop WPF from behaving this way entirely using a little reflection magic; just set _totalMemory to something appropriately negative so the limit is never reached and the induced GCs never happen:
// Requires: using System.Reflection; BitmapImage pins down the WPF assembly that holds MS.Internal.MemoryPressure.
typeof(BitmapImage).Assembly
    .GetType("MS.Internal.MemoryPressure")
    .GetField("_totalMemory", BindingFlags.NonPublic | BindingFlags.Static)
    .SetValue(null, Int64.MinValue / 2);
Running Markus' code on Win7 x86 (T4300, 2.1GHz, 3GB):
(notice the huge difference between 33 and 32)
Is64BitOperatingSystem: False
Is64BitProcess: False
Version: 4.0.30319.237
Running Test with 40: 3,20 s
Running Test with 34: 1,14 s
Running Test with 33: 1,06 s
Running Test with 32: 64,41 s
Running Test with 30: 53,32 s
Running Test with 24: 29,01 s
Another machine Win7 x64 (Q9550, 2.8GHz, 8GB):
Is64BitOperatingSystem: True
Is64BitProcess: False
Version: 4.0.30319.237
Running Test with 40: 1,41 s
Running Test with 34: 1,24 s
Running Test with 33: 1,19 s
Running Test with 32: 1.554,45 s
Running Test with 30: 1.489,31 s
Running Test with 24: 842,66 s
Once again with 40: 7,21 s
The Q9550 CPU has a lot more raw power than the T4300, but it is running on a 64-bit OS (the test itself is still a 32-bit process), and that seems to slow the whole thing down dramatically.