Consider a function that calls another function and checks its return value for an error. Assume that CheckError() returns 0 on failure and a nonzero value on success.
First version: take the branch on success, or fall through to the error-processing code (which sits in the middle of the function).
CALL CheckError
TEST EAX,EAX ;check if return value is 0
JNZ Normal
ErrorProcessing:
... ;some error processing code here
Normal:
... ;some usual code here
Second version: take the branch on error, or fall through to the normal path. The error-processing code is at the end of the function.
CALL CheckError
TEST EAX,EAX
JZ ErrorProcessing
Normal:
... ;some usual code here
ErrorProcessing:
... ;some error processing code here
Which of these two methods is better, and why?
Personally, I think the first version has the better code structure (more readable and maintainable) because it is compact. However, I also suspect the second version is normally faster (in the no-error case), because a not-taken conditional jump takes 2-3 clock cycles less than a taken one (maybe I'm being too picky here).
Anyway, I found that all the compilers I tested use the first model when compiling an if statement. For instance:
if (GetActiveWindow() == NULL)
{
printf("Error: can't get window's handle.\n");
return -1;
}
printf("Succeed.\n");
return 0;
This should compile to something like the following (ignoring any executable startup code):
CALL [GetActiveWindow] ;if (GetActiveWindow() == NULL)
TEST EAX,EAX
JNZ CodeSucceed
;printf("Error.......\n"); return -1
PUSH OFFSET "Error.........\n"
CALL [Printf]
ADD ESP,4
OR EAX,0FFFFFFFFH
JMP Exit
CodeSucceed: ;printf("Succeed.\n"); return 0
PUSH OFFSET "Succeed.\n"
CALL [Printf]
ADD ESP,4
XOR EAX,EAX
Exit:
RETN
In terms of cycle counting on the conditional jump itself, the way you structure the code makes absolutely no difference. The only thing that matters on modern CPUs is whether the branch is predicted correctly. If it is, the branch costs essentially zero cycles; if it is not, it costs tens or maybe even hundreds of cycles. The hardware's prediction logic doesn't depend on which way the code is structured, and you have basically no control over it (CPU designers have experimented with branch "hints", but they turn out to be a net loss). But see "Why is it faster to process a sorted array than an unsorted array?" for how high-level algorithmic decisions can make a huge difference.
However, there's another factor to consider: "hotness". If the "error processing" code will almost never actually get used, it is better to move it out of line — way out of line, to its own subsection of the executable image — so that it does not waste space in the I-cache. Making accurate decisions about when to do that is one of the most valuable benefits of profile-guided optimization — I'd guess second only to deciding on a per-function or even per-basic-block basis whether to optimize for space or speed.
Readability should be a primary concern when writing assembly by hand only if you are doing it as a learning exercise, or to implement something that can't be implemented in a higher level language (e.g. the guts of a context switch). If you are doing it because you need to squeeze cycles out of a critical inner loop, and it doesn't come out unreadable, you've probably got more cycle-squeezing to do.