Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Delphi SampleProfiler: How is this code calling into ntdll.dll?

i profiled a portion of my application using the Delphi Sampling Profiler. Like most people, i see a majority of the time spent inside ntdll.dll.

Note: i turned on the options to ignore Application.Idle time, and calls from System.pas. So it isn't inside ntdll because the application is idle:

alt text

After multiple runs, multiple times, the majority of the time seems to be spent inside ntdll.dll, but the odd thing is who the caller is:

enter image description here

The caller is from the Virtual Treeview's:

PrepareCell(PaintInfo, Window.Left, NodeBitmap.Width);    

Note: The application is not inside ntdll.dll because the application is idle, because the caller isn't Application.Idle.

What confuses me is that it's this line itself (i.e. not something inside PrepareCell) is the caller into ntdll. Even more confusing is that:

  • not only is it not something inside PrepareCell()
  • it's not even the setup of PrepareCell (e.g. popping stack variables, setting up implicit exception frames, etc) that is the caller. Those things would show up in the profiler as a hotspot on the begin inside PrepareCell.

VirtualTrees.pas:

procedure TBaseVirtualTree.PrepareCell(var PaintInfo: TVTPaintInfo; WindowOrgX, MaxWidth: Integer);
begin
   ...
end;

So i'm trying to figure out how this line:

PrepareCell(PaintInfo, Window.Left, NodeBitmap.Width);    

is calling ntdll.dll.


The only other ways in are the three parameters:

  • PaintInfo
  • Window.Left
  • NodeBitmap.Width

Maybe one of those is a function, or a property getter, that would call into ntdll. So i put a breakpoint on the line, and look at the CPU window at runtime:

alt text

There is a line in there that might be the culprit:

call dword ptr [edx+$2c]

But when i follow that jump, it doesn't end up in ntdll.dll, but TBitmap.GetWidth:

alt text

Which, as you can see, doesn't call anywhere; and certainly not into ntdll.dll.


So how is the line:

PrepareCell(PaintInfo, Window.Left, NodeBitmap.Width);    

calling into ntdll.dll?


Note: i know full well it isn't really calling into ntdll.dll. So any valid answer will have to include the words "Sampling Profiler is misleading; that line is not calling into ntdll.dll." The answer will also have to either say that the majority of the time is not spent in ntdll.dll, or that the highlighted line is not the caller. Finally any answer will have to explain why Sampling Profiler is wrong, and how it can be fixed.

Update 2

What is ntdll.dll? Ntdll is Windows NT's native API set. The Win32 API is a wrapper around ntdll.dll that looks like the Windows API that existed in Windows 1/2/3/9x. In order to actually get into ntdll you have to call a function that uses ntdll directly or indirectly.

For example, when my Delphi application goes idle, it waits for a message by calling the user32.dll function:

WaitMessage;

When when you actually look at it is:

USER32.WaitMessage
  mov eax,$00001226
  mov edx,$7ffe0300
  call dword ptr [edx]
  ret

Calling the function specified at $7ffe0300 is the way Windows transitions into Ring0, calling the FunctionID specified in EAX. In this case, the System Function being called is 0x1226. On my operating system, Windows Vista, 0x1226 corresponds to the system function NtUserWaitMessage.

This is how to you get into ntdll.dll: you call it.

i was desperately trying to avoid a hand-waving non-answer when i worded the original question. By being very specific, carefully pointing out the reality of what i'm seeing, i was trying to prevent people from ignoring the facts, and trying to use a hand-waving argument.


Update Three

i converted the two parameters:

PrepareCell(PaintInfo, Window.Left, NodeBitmap.Width);

into stack variables:

_profiler_WindowLeft := Window.Left;
_profiler_NodeBitmapWidth := NodeBitmap.Width;
PrepareCell(PaintInfo, _profiler_WindowLeft, _profiler_NodeBitmapWidth);

To confirm that the bottleneck is not is the call to

  • Windows.Left, or
  • Nodebitmap.Width

Profiler still indicates that the line

PrepareCell(PaintInfo, _profiler_WindowLeft, _profiler_NodeBitmapWidth);

itself is the bottleneck; not anything inside PrepareCell. This must mean that it's something inside the setup of the call to prepare cell, or at the start of PrepareCell:

VirtualTrees.pas.15746: PrepareCell(PaintInfo, _profiler_WindowLeft, _profiler_NodeBitmapWidth);
   mov eax,[ebp-$54]
   push eax
   mov edx,esi
   mov ecx,[ebp-$50]
   mov eax,[ebp-$04]
   call TBasevirtualTree.PrepareCell

Nothing in that calls into ntdll. Now the pre-amble in PrepareCell itself:

VirtualTrees.pas.15746: begin
   push ebp
   mov ebp,esp
   add esp,-$44
   push ebx
   push esi
   push edi
   mov [ebp-$14],ecx
   mov [ebp-$18],edx
   mov [ebp-$1c],eax
   lea esi,[ebp-$1c]
   mov edi,[ebp-$18]

Nothing in there calls into ntdll.dll.


The questions still remain:

  • why is pushing of one variable onto the stack, and two others into registers the bottleneck?
  • why isn't anything inside PrepareCell itself the bottleneck?
like image 393
Ian Boyd Avatar asked Apr 12 '10 15:04

Ian Boyd


2 Answers

Well, this problem was actually my main reason to make my own sampling profiler:
http://code.google.com/p/asmprofiler/wiki/AsmProfilerSamplingMode

Maybe not perfect, but you could give it a try. Let me know what you think about it.

Btw, I think it has to do with the fact that almost all calls ends into calls to the kernel (memory requests, paint events, etc). Only calculations do not need to call the kernel. Most calls ends in waiting for kernel results:

ntdll.dll!KiFastSystemCallRet

You can see this in Process Explorer with thread stack view, or in Delphi, or using StackWalk64 API in my "Live view" of AsmProfiler:
http://code.google.com/p/asmprofiler/wiki/ProcessStackViewer

like image 90
André Avatar answered Oct 14 '22 00:10

André


There are probably two things happening there.

The first is that SamplingProfiler identifies the caller by walking up the stack, until it encounters what looks like a valid call point into Delphi from Delphi code.

The thing is, some procedures may reserve a large amount of stack at once, without reinitializing it. This could result in a false positive. The only clue then would be that your false positive was recently invoked.

The second thing is the ntdll localization, that is known for certain, however, ntdll is your wait point in user-space, and as user197220, ntdll is where you'll end up waiting most of the time you're calling system stuff and waiting for the result.

In your case, unless you reduced the sampling rate, you're looking at 247ms of CPU work time, which could probably pass as idle if those 247 samples were collected over many seconds of real time. Since the false positive points to VirtualTree paint preparations, my bet would be that the ntdll time is actually paint time (driver or OS software). You can try commenting out the code that actually does the painting to be sure.

like image 21
Eric Grange Avatar answered Oct 13 '22 22:10

Eric Grange