Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Efficiency between pointer and array(less assembly instructions doesn't take less time) [duplicate]

Some people said: "Any operation that can be achieved by array subscripting can also be done with pointers. The pointer version will in general be faster".

I doubt about the outcome of the above, so I do the following test:

In the following article, We do not care compiler optimization. About compiler optimization how to affect the efficiency between pointer and array, please note: Efficiency: arrays vs pointers

(Visual Studio 2010, Debug Mode, no optimizations)

#include <windows.h>
#include <stdio.h>

int main()
{
    int a[] = {10,20,30};
    int* ap = a;

    long counter;

    int start_time, end_time;
    int index;

    start_time = GetTickCount();
    for (counter = 1000000000L; counter>0; counter--)
    {
        *(ap+1) = 100;
    }
    end_time = GetTickCount();
    printf("10 billion times of *ap = %d\n", end_time-start_time);

    start_time = GetTickCount();
    for (counter = 1000000000L; counter>0; counter--)
    {
        a[1] = 101;
    }
    end_time = GetTickCount();
    printf("10 billion times of a[0] = %d\n", end_time-start_time);

    return 0;
}

the result is:

10 billion times of *ap = 3276
10 billion times of a[0] = 3541

The pointer seems to be a little fast. But after I compared the dis-assemble, I fell into a deeper confusion。

(Visual Studio 2010, Debug Mode, no optimizations)

; 17   :         *(ap+1) = 100;
mov eax, DWORD PTR _ap$[ebp]
mov DWORD PTR [eax+4], 100          ; 00000064H

; 25   :         a[1] = 101;
mov DWORD PTR _a$[ebp+4], 101       ; 00000065H

From the assemble output, memory access through a pointer takes 2 instructions and array only takes 1 instruction.

Why array execute less instructions but it doesn't takes less time than pointer?

Does it associated with the cpu cache? How can I modify my test code to prove it?

like image 622
ajaxhe Avatar asked Apr 05 '13 13:04

ajaxhe


1 Answers

Firstly and most importantly, the C language doesn't have speed. That's an attribute introduced by implementations of C. For example, C doesn't have speed, but the GCC compiler produces code that might vary in speed from the code produced by Clang compiler, and they're both likely to produce code that out-performs the behavior produced by the Cint or Ch interpreters. All of these are C implementations. Some of them are slower than others, but the speed can't be attributed to C in anyway!

6.3.2.1 of the C standard says:

Except when it is the operand of the sizeof operator, the _Alignof operator, or the unary & operator, or is a string literal used to initialize an array, an expression that has type ‘‘array of type’’ is converted to an expression with type ‘‘pointer to type’’ that points to the initial element of the array object and is not an lvalue.

This ought to be an indication that both *(ap+1) and a[1] in your code are pointer operations. This translation will happen during the compilation phase in Visual Studio. Hence, this shouldn't have any impact on runtime.

6.5.2.1 in regards to "array subscripting" says:

One of the expressions shall have type ‘‘pointer to complete object type’’, the other expression shall have integer type, and the result has type ‘‘type’’. This seems to indicate that the array subscript operator is actually a pointer operator...

This is confirmation that ap[1] is indeed a pointer operation, as we postulated earlier. At runtime, however, the array has already been translated to a pointer. The performance should be identical.

... so, why aren't they identical?

What are the characteristics of the OS you're using? Isn't it a multitasking, multi-user OS? Suppose the OS were to complete the first loop without interruption, but then interrupt the second loop and switch control to a different process. Wouldn't this interruption invalidate your experiment? How do you measure the frequency and timing of interruptions caused by task switching? Note that this will differ for different OSes, and the OS is part of the implementation.

What are the characteristics of the CPU you're using? Does it have it's own fast, internal cache for machine code? Suppose your entire first loop, and it's encompassing timing mechanism were to fit into the code cache nicely, but the second loop were truncated. Wouldn't this result in a cache miss, and a long wait while your CPU fetches the remainder of the code from RAM? How do you measure the timing of interruptions caused by cache misses? Note that this will differ for different CPUs, and the CPU is part of the implementation.

These questions ought to raise some questions such as "Does this micro-optimisation benchmark solve a meaningful or important problem?". The success of an optimisation will differ depending upon the size and complexity of the problem. Find an important problem, solve it, profile the solution, optimise it and profile it again. That way, you can give meaningful information about how much faster the optimised version is. Your boss will be much happier with you, providing you don't divulge that the optimisations are probably only relevant for your implementation, as I mentioned earlier. I'm certain you'll find that the least of your worries will be array dereference vs pointer dereference.

like image 95
autistic Avatar answered Nov 15 '22 10:11

autistic