
Accessing array values via pointer arithmetic vs. subscripting in C

I keep reading that, in C, using pointer arithmetic is generally faster than subscripting for array access. Is this true even with modern (supposedly-optimizing) compilers?

If so, is this still the case as I begin to move away from learning C into Objective-C and Cocoa on Macs?

Which is the preferred coding style for array access, in both C and Objective-C? Which is considered (by professionals of their respective languages) more legible, more "correct" (for lack of a better term)?

asked Oct 24 '08 by John Rudy


1 Answer

To evaluate this claim, you need to understand the reasoning behind it. Have you ever asked yourself why it would be faster? Let's compare some code:

#include <stdio.h>
#include <string.h>

int i;
int a[20];

// Init all values to zero
memset(a, 0, sizeof(a));
for (i = 0; i < 20; i++) {
    printf("Value of %d is %d\n", i, a[i]);
}

They are all zero, what a surprise :-P The question is: what does a[i] actually mean in low-level machine code? It means:

  1. Take the address of a in memory.

  2. Add i times the size of a single item of a to that address (an int is usually four bytes).

  3. Fetch the value from that address.

So each time you fetch a value from a, the base address of a is added to the result of multiplying i by four. If you just dereference a pointer, steps 1 and 2 don't need to be performed, only step 3.

Consider the code below.

#include <stdio.h>
#include <string.h>

int i;
int a[20];
int *b;

memset(a, 0, sizeof(a));
b = a;
for (i = 0; i < 20; i++) {
    printf("Value of %d is %d\n", i, *b);
    b++;
}

This code might be faster... but even if it is, the difference is tiny. Why might it be faster? "*b" is the same as step 3 above. However, "b++" is not the same as steps 1 and 2: "b++" simply increases the pointer by 4.

(Important for newbies: applying ++ to a pointer does not increase it by one byte in memory! It increases the pointer by as many bytes as the size of the data it points to. Here it points to an int, and an int is four bytes on my machine, so b++ increases b by four!)

Okay, but why might it be faster? Because adding four to a pointer is faster than multiplying i by four and adding the result to a pointer. You have an addition in either case, but the second version has no multiplication (you avoid the CPU time needed for one multiply). Considering the speed of modern CPUs, though, even if the array had a million elements, I wonder if you could really benchmark a difference.

Whether a modern compiler optimizes both versions to be equally fast is something you can check by looking at the assembly output it produces. You do so by passing the "-S" option (capital S) to GCC.

Here's the assembly for the first C program (compiled with optimization level -Os, which means optimize for code size and speed, but avoid speed optimizations that noticeably increase code size, unlike -O2 and much unlike -O3):

_main:
    pushl   %ebp
    movl    %esp, %ebp
    pushl   %edi
    pushl   %esi
    pushl   %ebx
    subl    $108, %esp
    call    ___i686.get_pc_thunk.bx
"L00000000001$pb":
    leal    -104(%ebp), %eax
    movl    $80, 8(%esp)
    movl    $0, 4(%esp)
    movl    %eax, (%esp)
    call    L_memset$stub
    xorl    %esi, %esi
    leal    LC0-"L00000000001$pb"(%ebx), %edi
L2:
    movl    -104(%ebp,%esi,4), %eax
    movl    %eax, 8(%esp)
    movl    %esi, 4(%esp)
    movl    %edi, (%esp)
    call    L_printf$stub
    addl    $1, %esi
    cmpl    $20, %esi
    jne L2
    addl    $108, %esp
    popl    %ebx
    popl    %esi
    popl    %edi
    popl    %ebp
    ret

The same for the second program:

_main:
    pushl   %ebp
    movl    %esp, %ebp
    pushl   %edi
    pushl   %esi
    pushl   %ebx
    subl    $124, %esp
    call    ___i686.get_pc_thunk.bx
"L00000000001$pb":
    leal    -104(%ebp), %eax
    movl    %eax, -108(%ebp)
    movl    $80, 8(%esp)
    movl    $0, 4(%esp)
    movl    %eax, (%esp)
    call    L_memset$stub
    xorl    %esi, %esi
    leal    LC0-"L00000000001$pb"(%ebx), %edi
L2:
    movl    -108(%ebp), %edx
    movl    (%edx,%esi,4), %eax
    movl    %eax, 8(%esp)
    movl    %esi, 4(%esp)
    movl    %edi, (%esp)
    call    L_printf$stub
    addl    $1, %esi
    cmpl    $20, %esi
    jne L2
    addl    $124, %esp
    popl    %ebx
    popl    %esi
    popl    %edi
    popl    %ebp
    ret

Well, it's different, that's for sure. The difference between the 104 and 108 offsets comes from the variable b (in the first program there was one variable less on the stack; now there is one more, which shifts the stack addresses). The real code difference in the for loop is

movl    -104(%ebp,%esi,4), %eax 

compared to

movl    -108(%ebp), %edx
movl    (%edx,%esi,4), %eax

Actually, to me it rather looks like the first approach is faster(!), since it issues a single machine instruction to perform all the work (the CPU does it all for us) instead of two. On the other hand, the two instructions below might have a lower total runtime than the single one above.

As a closing word, I'd say that depending on your compiler and the CPU's capabilities (which instructions it offers for accessing memory in which ways), the result might go either way. Either one might be faster or slower. You cannot say for sure unless you limit yourself to exactly one compiler (meaning also one version of it) and one specific CPU. Since CPUs can do more and more in a single instruction (ages ago, a compiler really had to manually fetch the address, multiply i by four, and add the two together before fetching the value), statements that were absolute truths ages ago are nowadays increasingly questionable. And who knows how CPUs work internally? Above I compared one instruction to two other ones.

I can see that the number of instructions is different, and the time each instruction needs can differ as well. Also, how much memory these instructions need in their machine representation (they have to be transferred from memory to the CPU cache, after all) is different. However, modern CPUs don't execute instructions the way you feed them. They split big instructions (often referred to as CISC) into small sub-instructions (often referred to as RISC), which also lets them better optimize program flow for speed internally. In fact, the single instruction and the two instructions above might decode into the same set of sub-instructions, in which case there is no measurable speed difference whatsoever.

Regarding Objective-C: it is just C with extensions, so everything that holds true for C's pointers and arrays holds true for Objective-C as well. If you use objects, on the other hand (for example, an NSArray or NSMutableArray), that is a completely different beast. In that case you must access the array through methods anyway; there is no pointer/array access to choose between.

answered by Mecki