I've compiled the following using the Visual Studio C++ 2008 SP1 x64 C++ compiler:
I'm curious: why did the compiler add those nop instructions after those calls?
PS1. I would understand if the 2nd and 3rd nops were there to align the code on a 4-byte boundary, but the 1st nop breaks that assumption.
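The alignment argument in PS1 can be checked mechanically. Below is a minimal sketch, assuming a power-of-two alignment; the helper names `is_aligned` and `padding_needed` and the addresses in the checks are hypothetical and are not taken from the listing in the question.

```cpp
#include <cstdint>

// True when addr is a multiple of alignment.
// Assumption: alignment is a power of two (4, 8, 16, ...).
constexpr bool is_aligned(std::uint64_t addr, std::uint64_t alignment) {
    return (addr & (alignment - 1)) == 0;
}

// Number of one-byte nops needed to move addr up to the next aligned address
// (0 if addr is already aligned).
constexpr std::uint64_t padding_needed(std::uint64_t addr, std::uint64_t alignment) {
    return (alignment - (addr & (alignment - 1))) & (alignment - 1);
}

// Hypothetical addresses, for illustration only.
static_assert(is_aligned(0x1400010A0, 4), "multiple of 4");
static_assert(!is_aligned(0x14000109B, 4), "0x...9B is not a multiple of 4");
static_assert(padding_needed(0x14000109B, 4) == 1, "one nop reaches 0x...9C");
```

If a nop were there purely for alignment, the instruction after it should land on an address for which `is_aligned` is true; a nop that leaves the next instruction misaligned needs a different explanation.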
PS2. The C++ code that was compiled had no loops or special optimization stuff in it:
CTestDlg::CTestDlg(CWnd* pParent /*=NULL*/)
    : CDialog(CTestDlg::IDD, pParent)
{
    m_hIcon = AfxGetApp()->LoadIcon(IDR_MAINFRAME);
    //This makes no sense. I used it to set a debugger breakpoint
    ::GdiFlush();
    srand(::GetTickCount());
}
PS3. Additional Info: First off, thank you everyone for your input.
Here are some additional observations:
1. My first guess was that incremental linking could have had something to do with it. But the Release build settings for the project in Visual Studio have incremental linking off.
2. This seems to affect x64 builds only. The same code built as x86 (or Win32) does not have those nops, even though the instructions used are very similar:
3. x64 code produced by VS 2013 looks somewhat different, but it still adds those nops after some calls:
4. Dynamic vs. static linking to MFC made no difference to the presence of those nops. This one is built with dynamic linking to the MFC DLLs with VS 2013:
5. nops can appear after near and far calls as well, and they have nothing to do with alignment. Here's a part of the code that I got from IDA if I step a little bit further on:
As you see, the nop is inserted after a far call, and it happens to "align" the next lea instruction on the B address! That makes no sense if those were added for alignment only.
6. Since near relative calls (i.e. those that start with E8) are somewhat faster than far calls (the ones that start with FF 15 in this case), the linker may try to go with near calls first, and since those are one byte shorter than far calls, if it succeeds, it may pad the remaining space with nops at the end. But then example (5) above kinda defeats this hypothesis.
So I still don't have a clear answer to this.
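The size argument behind the last hypothesis can be made concrete with a small byte-level sketch. It recognises only the three encodings mentioned here: E8 (call rel32, 5 bytes), FF 15 (6 bytes; strictly speaking, in 64-bit mode this is an indirect near call through a RIP-relative memory operand rather than a far call) and 90 (nop, 1 byte). The function names are hypothetical, and this is not a general x86-64 decoder.

```cpp
#include <cstddef>
#include <cstdint>

// Length in bytes of the instruction at p, for the three encodings discussed
// here only. Returns 0 for anything else.
inline std::size_t known_length(const std::uint8_t* p, std::size_t n) {
    if (n >= 1 && p[0] == 0x90) return 1;                  // nop
    if (n >= 5 && p[0] == 0xE8) return 5;                  // call rel32
    if (n >= 6 && p[0] == 0xFF && p[1] == 0x15) return 6;  // call qword ptr [rip+disp32]
    return 0;
}

// True when the bytes at p start with one of the two call encodings and the
// very next byte is a one-byte nop.
inline bool call_followed_by_nop(const std::uint8_t* p, std::size_t n) {
    const std::size_t len = known_length(p, n);
    return len >= 5 && len < n && p[len] == 0x90;
}
```

Under the padding hypothesis, a 6-byte FF 15 call replaced by a 5-byte E8 call would leave exactly one byte to fill, so a single nop would be expected only after E8 calls. A nop directly after an FF 15 call, as in observation (5), does not fit that pattern.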
Usually nops inside functions are to align branch targets, including function entry points like in the question Brian linked. (Also see -falign-loops in the gcc docs, which is on by default at optimization levels other than -Os).
A NOP is most commonly used for timing purposes, to force memory alignment, to prevent hazards, to occupy a branch delay slot, to render void an existing instruction such as a jump, as a target of an execute instruction, or as a place-holder to be replaced by active instructions later on in program development (or to ...
NOP is a mnemonic that stands for "No Operation". This instruction does nothing during execution. On the 8085 microprocessor it occupies only 1 byte of memory and takes 4 clock cycles (T-states). A NOP instruction can be used to create a small time delay in the execution of the code.
NOPs serve several purposes: They allow the debugger to place a breakpoint on a line even if it is combined with others in the generated code. They allow the loader to patch a jump with a different-sized target offset. They allow a block of code to be aligned at a particular boundary, which can be good for caching.
This is purely a guess, but it might be some kind of a SEH optimization. I say optimization because SEH seems to work fine without the NOPs too. NOP might help speed up unwinding.
In the following example (live demo with VC2017), there is a NOP inserted after a call to basic_string::assign in test1 but not in test2 (identical but declared as non-throwing [1]).
#include <stdio.h>
#include <string>

int test1() {
    std::string s = "a"; // NOP inserted here
    s += getchar();
    return (int)s.length();
}

int test2() throw() {
    std::string s = "a";
    s += getchar();
    return (int)s.length();
}

int main()
{
    return test1() + test2();
}
Assembly:
test1:
. . .
call std::basic_string<char,std::char_traits<char>,std::allocator<char> >::assign
npad 1 ; nop
call getchar
. . .
test2:
. . .
call std::basic_string<char,std::char_traits<char>,std::allocator<char> >::assign
call getchar
Note that MSVS compiles by default with the /EHsc flag (synchronous exception handling). Without that flag the NOPs disappear, and with /EHa (synchronous and asynchronous exception handling), throw() no longer makes a difference because SEH is always on.
[1] For some reason only throw() seems to reduce the code size; using noexcept makes the generated code even bigger and summons even more NOPs. MSVC...
This is special filler that lets the exception handler/unwinding function correctly detect whether an address is in the prologue, epilogue or body of the function.
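One way to picture why a filler byte after a call could matter to an unwinder is the sketch below. It is an illustrative model only: the `Region` struct, the state numbers and the addresses are hypothetical and do not reproduce MSVC's actual unwind or IP-to-state tables. The only fact it relies on is that the return address of a call is the address of the instruction that follows it.

```cpp
#include <cstdint>

// A half-open address range [begin, end) tagged with a state id.
// Hypothetical layout, for illustration only.
struct Region {
    std::uint64_t begin;
    std::uint64_t end;
    int state;
};

// Return the state of the region containing addr, or -1 if there is none.
inline int state_for(const Region* regions, int count, std::uint64_t addr) {
    for (int i = 0; i < count; ++i) {
        if (addr >= regions[i].begin && addr < regions[i].end) {
            return regions[i].state;
        }
    }
    return -1;
}

// The return address pushed by a call is the address of the next instruction.
constexpr std::uint64_t return_address(std::uint64_t call_addr, std::uint64_t call_len) {
    return call_addr + call_len;
}
```

Suppose a 5-byte call at 0x1000 is the last instruction of a region in state 1, and state 2 begins right after it. With regions {0x1000, 0x1005, 1} and {0x1005, 0x1010, 2}, looking up `return_address(0x1000, 5)` yields state 2, the wrong region for a call made in state 1. If a one-byte nop follows the call, the regions become {0x1000, 0x1006, 1} and {0x1006, 0x1010, 2}, and the same return address maps to state 1.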