Is there a performance penalty (however small) for Julia using one-based array indexing, since machine code usually supports zero-based indexing more directly?


Is there a performance penalty for one-based array indexing?

2 Answers

I did some snooping around and here is what I found (I used Julia 0.6 for all experiments below):

> arr = zeros(5);
> @code_llvm arr[1]

define double @jlsys_getindex_51990(i8** dereferenceable(40), i64) #0 !dbg !5 {
top:
  %2 = add i64 %1, -1
  %3 = bitcast i8** %0 to double**
  %4 = load double*, double** %3, align 8
  %5 = getelementptr double, double* %4, i64 %2
  %6 = load double, double* %5, align 8
  ret double %6
}

In this snippet %1 holds the actual index. Note the %2 = add i64 %1, -1. Julia does indeed use 0-based arrays under the hood and subtracts 1 from the index. This results in an additional LLVM instruction being generated, so the LLVM code looks slightly less efficient. However, how this additional arithmetic operation trickles down to native code is another question.

On x86 and amd64

> @code_native arr[1]
        .text
Filename: array.jl
Source line: 520
    leaq    -1(%rsi), %rax
    cmpq    24(%rdi), %rax
    jae L20
    movq    (%rdi), %rax
    movsd   -8(%rax,%rsi,8), %xmm0  # xmm0 = mem[0],zero
    retq
L20:
    pushq   %rbp
    movq    %rsp, %rbp
    movq    %rsp, %rcx
    leaq    -16(%rcx), %rax
    movq    %rax, %rsp
    movq    %rsi, -16(%rcx)
    movl    $1, %edx
    movq    %rax, %rsi
    callq   0xffffffffffcbf392
    nopw    %cs:(%rax,%rax)

The good news on these architectures is that their addressing modes accept a constant displacement, so an access can start from any base offset at no extra cost. The movsd -8(%rax,%rsi,8), %xmm0 and leaq -1(%rsi), %rax are the two instructions affected by the 1-based indexing in Julia. Look at the movsd instruction: this single instruction does both the actual indexing and the subtracting. The -8 part is the subtraction (one 8-byte Float64 element). If 0-based indexing were used, then the instruction would be movsd (%rax,%rsi,8), %xmm0.
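The -8 displacement is simply (i - 1) * sizeof(Float64) with the constant part split off. Here is a minimal sketch that checks this address arithmetic from Julia itself. It assumes Julia 1.x (GC.@preserve does not exist in 0.6), and check_addresses is a name made up for this illustration:

```julia
# Verify that element i of a Float64 array lives at base + (i - 1) * 8,
# which is the same address as base + 8*i - 8 in the movsd instruction.
function check_addresses(arr::Vector{Float64})
    GC.@preserve arr begin
        base = pointer(arr)                      # address of arr[1]
        for i in eachindex(arr)
            addr = base + (i - 1) * sizeof(Float64)
            addr == pointer(arr, i) || return false
            unsafe_load(addr) == arr[i] || return false
        end
    end
    return true
end

check_addresses(collect(1.0:5.0))
```

The check passing only confirms the memory layout; it says nothing about timing.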

The other affected instruction is leaq -1(%rsi), %rax, which computes the zero-based index for the bounds check. Note that cmp only sets flags and does not modify its operands, so %rsi would not need to be copied: under 0-based indexing the check could most likely compare %rsi against the length directly (something like cmpq 24(%rdi), %rsi) and the leaq would disappear.

So on x86 and amd64 machines the 1-based indexing costs nothing in the load itself, which just uses a slightly more complicated form of the same instruction, and at most one cheap leaq in the bounds check. The code most probably runs as fast as it would with 0-based indexing. If any slowdown is present it is probably due to the specific microarchitecture and would be present in one CPU model and not present in another. This difference is down to the silicon and I wouldn't worry about it.

Unfortunately, I don't know enough about arm and other architectures but the situation is probably similar.

Interfacing with another language

When interfacing with another language like C or Python, one always has to remember to subtract or add 1 when passing indices around. The compiler cannot help you because the other code is out of its reach. So there is a performance hit of one extra arithmetic operation in this case. But unless this is in a really tight loop, the difference is negligible.
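As a rough illustration of that conversion, the sketch below calls C's memchr and turns the zero-based offset it implies into a Julia index. It assumes Julia 1.x and a C library that exports memchr; find_byte is a hypothetical helper name, not part of any library:

```julia
# Return the 1-based position of `byte` in `data`, or 0 if it is absent.
function find_byte(data::Vector{UInt8}, byte::UInt8)
    GC.@preserve data begin
        base = pointer(data)
        hit = ccall(:memchr, Ptr{UInt8}, (Ptr{UInt8}, Cint, Csize_t),
                    base, byte, length(data))
        hit == C_NULL && return 0
        zero_based = Int(hit - base)   # C-style offset from the start
        return zero_based + 1          # the one extra addition for Julia
    end
end

find_byte(UInt8[10, 20, 30, 40], 0x1e)
```

The + 1 on the last line is the whole cost being discussed; the call into C dwarfs it.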

The elephant in the room

Well, the elephant in the room is the bounds checking. Returning to the previous assembly snippet, most of the generated code is concerned with that: the first 3 instructions and everything under the L20 label. The actual indexing is just the movq and movsd instructions. So if you care about really fast code, you will pay a much larger performance penalty for the bounds checking than for the 1-based indexing. Fortunately Julia offers ways to alleviate this problem through the use of @inbounds and --check-bounds=no.
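A minimal sketch of what using @inbounds looks like in practice. The function names are made up for this example, and whether the compiler already elides the checks in such a simple loop depends on the Julia version:

```julia
# Sum with the default bounds check on every access.
function sum_checked(v::Vector{Float64})
    s = 0.0
    for i in 1:length(v)
        s += v[i]
    end
    return s
end

# Same loop, but the programmer promises that every index is valid,
# so the compiler is allowed to drop the checks.
function sum_inbounds(v::Vector{Float64})
    s = 0.0
    @inbounds for i in 1:length(v)
        s += v[i]
    end
    return s
end

sum_checked(collect(1.0:5.0)) == sum_inbounds(collect(1.0:5.0))
```

Comparing @code_native sum_checked(zeros(5)) with @code_native sum_inbounds(zeros(5)) should show whether the compare-and-branch pair is gone. Keep in mind that an out-of-range index under @inbounds is undefined behaviour rather than an error.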

answered Oct 09 '22 19:10

Svetlin Mladenov

The most likely possibility is that Julia simply subtracts 1 from the indexes you provide it, and uses zero-based arrays under the hood. So the performance penalty would be the cost of the subtraction (almost certainly immaterial).

It would be easy enough to write two small bits of code to test the performance of each.
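One possible version of such a test, as a sketch only: it assumes Julia 1.x, the function names are invented here, and the zero-based variant uses raw pointer arithmetic because Julia's own indexing (including unsafe_load(p, i)) is always one-based.

```julia
# One-based access, as Julia does natively (bounds checks disabled).
function sum_one_based(v::Vector{Float64})
    s = 0.0
    @inbounds for i in 1:length(v)
        s += v[i]
    end
    return s
end

# Zero-based access through a raw pointer: no "- 1" anywhere.
function sum_zero_based(v::Vector{Float64})
    s = 0.0
    GC.@preserve v begin
        p = pointer(v)
        for j in 0:length(v)-1
            s += unsafe_load(p + j * sizeof(Float64))
        end
    end
    return s
end

v = rand(10^7)
sum_one_based(v); sum_zero_based(v)     # warm up so compilation is not timed
t_one = @elapsed sum_one_based(v)
t_zero = @elapsed sum_zero_based(v)
println("one-based: ", t_one, " s, zero-based: ", t_zero, " s")
```

A single @elapsed measurement is noisy, so repeat the runs before drawing conclusions; a dedicated package such as BenchmarkTools.jl gives more reliable numbers.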

answered Oct 09 '22 19:10

Robert Harvey


Tags:

julia

Georges St. Clair
