
Is the fall-through side of a conditional branch more efficient? Is it a good idea to make that the error-handling side?

Consider a function that calls another function and checks for an error. Assume that CheckError() returns 0 on failure and any nonzero value on success.

First version: a taken branch on success, or fall through to the error-processing code (which sits in the middle of the function).

    CALL   CheckError
    TEST   EAX,EAX    ;check if return value is 0
    JNZ    Normal
ErrorProcessing:
    ...    ;some error processing code here
Normal:
    ...    ;some usual code here

Second version: a taken branch on error, or fall through to the normal path. The error-processing code is at the end of the function.

    CALL   CheckError
    TEST   EAX,EAX
    JZ     ErrorProcessing
Normal:
    ...    ;some usual code here
ErrorProcessing:
    ...    ;some error processing code here

Which one of these two methods is better? Why?

Personally, I think the first version has the better code structure (easier to read and to write), because the code is compact. However, I also think the second version is usually faster (in the no-error case), because a not-taken conditional jump takes 2-3 fewer clock cycles than a taken one (maybe I'm being too picky here).

Anyway, I found that every compiler I tested uses the first model when it compiles an if statement. For instance:

    if (GetActiveWindow() == NULL)
    {
        printf("Error: can't get window's handle.\n");
        return -1;
    }
    printf("Succeed.\n");
    return 0;

This should compile to something like the following (omitting the executable's entry routine):

    CALL [GetActiveWindow]    ;if (GetActiveWindow() == NULL)
    TEST EAX,EAX
    JNZ CodeSucceed
                             ;printf("Error.......\n"); return -1
    PUSH OFFSET "Error.........\n"
    CALL [Printf]
    ADD ESP,4
    OR EAX,0FFFFFFFFH
    JMP Exit

CodeSucceed:                 ;printf("Succeed.\n"); return 0
    PUSH OFFSET "Succeed.\n"
    CALL [Printf]
    ADD ESP,4
    XOR EAX,EAX
Exit:
    RETN
J.Smith asked Sep 05 '25 03:09


1 Answer

In terms of cycle counting on the conditional jump itself, which way you structure the code makes no real difference. The only thing that matters anymore is whether the branch is predicted correctly. If it is, the branch costs effectively zero cycles. If it is not, the branch costs tens or maybe even hundreds of cycles. The prediction logic in the hardware doesn't depend on which way the code is structured, and you have basically no control over it: CPU designers have experimented with "hints", but they turned out to be a net loss. (See "Why is it faster to process a sorted array than an unsorted array?" for how high-level algorithmic decisions can still make a huge difference.)
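To make the predictability point concrete, here is a minimal C sketch in the spirit of the sorted-array question. The names `sum_large`, `fill_pseudorandom`, and `demo` are made up for illustration. The same loop runs over the same bytes twice, once unsorted and once sorted, so the instruction layout is identical and only the predictability of the `if` changes. It makes no timing claim: an optimizing compiler may turn the `if` into a conditional move or vectorize the loop, in which case there is no branch left to mispredict.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Sum every element >= 128. Unless the compiler makes this branchless,
   the `if` becomes a conditional branch whose outcome depends on the data. */
long sum_large(const unsigned char *data, size_t n) {
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (data[i] >= 128)
            sum += data[i];
    }
    return sum;
}

/* Comparison callback for qsort: ascending order of bytes. */
static int cmp_uchar(const void *a, const void *b) {
    return (int)*(const unsigned char *)a - (int)*(const unsigned char *)b;
}

/* Fill the buffer from a fixed-seed linear congruential generator,
   so every run sees the same (hard to predict) byte sequence. */
void fill_pseudorandom(unsigned char *data, size_t n) {
    unsigned int state = 12345u;
    for (size_t i = 0; i < n; i++) {
        state = state * 1103515245u + 12345u;
        data[i] = (unsigned char)(state >> 16);
    }
}

/* Run the same loop on unsorted and then sorted data. The result is
   identical; only the branch pattern seen by the predictor differs. */
long demo(void) {
    enum { N = 4096 };
    static unsigned char data[N];

    fill_pseudorandom(data, N);
    long unsorted = sum_large(data, N);   /* branch outcome looks random */

    qsort(data, N, sizeof data[0], cmp_uchar);
    long sorted = sum_large(data, N);     /* long runs of not-taken, then taken */

    assert(unsorted == sorted);
    return sorted;
}
```

To see whether it matters on a given machine, time the two `sum_large` calls separately over a much larger buffer and check the generated assembly to confirm the branch is still there.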

However, there's another factor to consider: "hotness". If the "error processing" code will almost never actually get used, it is better to move it out of line — way out of line, to its own subsection of the executable image — so that it does not waste space in the I-cache. Making accurate decisions about when to do that is one of the most valuable benefits of profile-guided optimization — I'd guess second only to deciding on a per-function or even per-basic-block basis whether to optimize for space or speed.
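Without a profile, you can give the compiler the same kind of information by hand. The sketch below assumes GCC or Clang, which provide `__builtin_expect` and the `cold` and `noinline` function attributes; on other compilers the macros expand to nothing. The names `check_error`, `report_failure`, and `process` are hypothetical stand-ins for the question's CheckError() scenario. These annotations influence code layout only, not the hardware predictor, and the compiler is free to ignore them.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

#if defined(__GNUC__) || defined(__clang__)
#  define COLD        __attribute__((cold, noinline))
#  define UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#  define COLD                 /* no layout hint available */
#  define UNLIKELY(x) (x)
#endif

/* Hypothetical stand-in for CheckError(): returns 0 on failure,
   nonzero on success. Here "failure" just means a negative input. */
static int check_error(int input) {
    return input >= 0;
}

/* Rarely executed error path. Marking it cold and noinline asks the
   compiler to keep it out of the hot instruction stream. */
COLD static int report_failure(int input) {
    fprintf(stderr, "error: bad input %d\n", input);
    return -1;
}

int process(int input) {
    /* Tell the compiler the failure case is the unusual one, so the
       normal path is likely to be laid out as the fall-through. */
    if (UNLIKELY(check_error(input) == 0))
        return report_failure(input);

    assert(input <= INT_MAX / 2);   /* doubling must not overflow */
    return input * 2;               /* the "usual code" */
}
```

With GCC, cold functions typically end up grouped in a separate text subsection (such as `.text.unlikely`), which is the "way out of line" placement described above. Profile-guided optimization makes the same decision from measured data instead of hand-written hints.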

Readability should be a primary concern when writing assembly by hand only if you are doing it as a learning exercise, or to implement something that can't be implemented in a higher-level language (e.g. the guts of a context switch). If you are doing it because you need to squeeze cycles out of a critical inner loop, and it doesn't come out unreadable, you've probably got more cycle-squeezing to do.

zwol answered Sep 07 '25 19:09