Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is there a performance penalty for one-based array indexing?

Tags:

julia

Is there a performance penalty (however small) for Julia using one-based array indexing since machine code usually more directly supports zero-based indexing?

like image 323
Georges St. Clair Avatar asked Jan 19 '15 20:01

Georges St. Clair


People also ask

What is a 1 indexed array?

Zero-based array indexing is a way of numbering the items in an array such that the first item of it has an index of 0, whereas a one-based array indexed array has its first item indexed as 1. Zero-based indexing is a very common way to number items in a sequence in today's modern mathematical notation.

Should array indices start at 0 or 1?

In computer science, array indices usually start at 0 in modern programming languages, so computer programmers might use zeroth in situations where others might use first, and so forth.

How does indexing an array work?

Indexing is an operation that pulls out a select set of values from an array. The index of a value in an array is that value's location within the array. There is a difference between the value and where the value is stored in an array.

Are index starts from 1 True or false?

question. The index of an array always starts from zero. The first index does not start with one.


2 Answers

I did some snooping around and here is what I fond (I used Julia 0.6 for all experiments below):

> arr = zeros(5);
> @code_llvm arr[1]

define double @jlsys_getindex_51990(i8** dereferenceable(40), i64) #0 !dbg !5 {
top:
  %2 = add i64 %1, -1
  %3 = bitcast i8** %0 to double**
  %4 = load double*, double** %3, align 8
  %5 = getelementptr double, double* %4, i64 %2
  %6 = load double, double* %5, align 8
  ret double %6
}

In this snippet %1 holds the actual index. Note the %2 = add i64 %1, -1. Julia does indeed uses 0-based arrays under the hood and subtracts 1 off the index. This results in an additional llvm instruction being generated, so the llvm code looks slightly less efficient. However how this additional arithmetic operation trickles down to native code is another question.

On ax86 and amd64

> @code_native arr[1]
        .text
Filename: array.jl
Source line: 520
    leaq    -1(%rsi), %rax
    cmpq    24(%rdi), %rax
    jae L20
    movq    (%rdi), %rax
    movsd   -8(%rax,%rsi,8), %xmm0  # xmm0 = mem[0],zero
    retq
L20:
    pushq   %rbp
    movq    %rsp, %rbp
    movq    %rsp, %rcx
    leaq    -16(%rcx), %rax
    movq    %rax, %rsp
    movq    %rsi, -16(%rcx)
    movl    $1, %edx
    movq    %rax, %rsi
    callq   0xffffffffffcbf392
    nopw    %cs:(%rax,%rax)

The good news on these architectures is that they support arbitrary-number-based indexing. The movsd -8(%rax,%rsi,8), %xmm0 and leaq -1(%rsi), %rax are the two instructions affected by the 1-based indexing in Julia. Look at the movsd instruction, in this one single instruction we do both the actual indexing and the subtracting. The -8 part is the subtracting. If 0-based indexing was used than the instruction would be movsd (%rax,%rsi,8), %xmm0.

The other affected instruction is leaq -1(%rsi), %rax. However due to the fact that cmp instructions use an in-out argument, the value of %rsi has to be copied to another register so under 0-based indexing the same instruction would still be generated but it would probably look like leaq (%rsi), %rax.

So on x86 and amd64 machines the 1-based indexing results in simply using slightly more complicated version of the same instructions but no additional instructions are generated. The code most probably runs exactly as fast as 0-based indexing. If any slowdown is present it is probably due to the specific micro architecture and would be present in one CPU model and not present in another. This difference is down to the silicon and I wouldn't worry about it.

Unfortunately, I don't know enough about arm and other architectures but the situation is probably similar.

Interfacing with another language

When interfacing with another language like C or Python, one always has to remember to subtract or add 1 when passing indices around. The compiler cannot help you because the other code is out of its reach. So there is a performance hit of 1 extract arithmetic operations in this case. But unless this is in a really tight loop, this difference is negligible.

The elephant in the room

Well, the elephant in the room is the bound checking. Returning to the previous assembly snippet, most of the generated code is concerned with that - the first 3 instructions and everything under the L20 label. The actual indexing is just the movq and movsd instructions. So if you care about really fast code then you will get much more of a performance penalty from the bound checking than the 1-based indexing. Fortunately Julia offers ways to alleviate this problems through the use of @inbound and --check-bounds=no.

like image 77
Svetlin Mladenov Avatar answered Oct 09 '22 19:10

Svetlin Mladenov


The most likely possibility is that Julia simply subtracts 1 from the indexes you provide it, and uses zero-based arrays under the hood. So the performance penalty would be the cost of the subtraction (almost certainly immaterial).

It would be easy enough to write two small bits of code to test the performance of each.

like image 34
Robert Harvey Avatar answered Oct 09 '22 19:10

Robert Harvey