Is there a performance penalty (however small) for Julia using one-based array indexing since machine code usually more directly supports zero-based indexing?
Zero-based array indexing is a way of numbering the items in an array such that the first item of it has an index of 0, whereas a one-based array indexed array has its first item indexed as 1. Zero-based indexing is a very common way to number items in a sequence in today's modern mathematical notation.
In computer science, array indices usually start at 0 in modern programming languages, so computer programmers might use zeroth in situations where others might use first, and so forth.
Indexing is an operation that pulls out a select set of values from an array. The index of a value in an array is that value's location within the array. There is a difference between the value and where the value is stored in an array.
question. The index of an array always starts from zero. The first index does not start with one.
I did some snooping around and here is what I fond (I used Julia 0.6 for all experiments below):
> arr = zeros(5);
> @code_llvm arr[1]
define double @jlsys_getindex_51990(i8** dereferenceable(40), i64) #0 !dbg !5 {
top:
%2 = add i64 %1, -1
%3 = bitcast i8** %0 to double**
%4 = load double*, double** %3, align 8
%5 = getelementptr double, double* %4, i64 %2
%6 = load double, double* %5, align 8
ret double %6
}
In this snippet %1
holds the actual index. Note the %2 = add i64 %1, -1
. Julia does indeed uses 0-based arrays under the hood and subtracts 1 off the index. This results in an additional llvm instruction being generated, so the llvm code looks slightly less efficient. However how this additional arithmetic operation trickles down to native code is another question.
> @code_native arr[1]
.text
Filename: array.jl
Source line: 520
leaq -1(%rsi), %rax
cmpq 24(%rdi), %rax
jae L20
movq (%rdi), %rax
movsd -8(%rax,%rsi,8), %xmm0 # xmm0 = mem[0],zero
retq
L20:
pushq %rbp
movq %rsp, %rbp
movq %rsp, %rcx
leaq -16(%rcx), %rax
movq %rax, %rsp
movq %rsi, -16(%rcx)
movl $1, %edx
movq %rax, %rsi
callq 0xffffffffffcbf392
nopw %cs:(%rax,%rax)
The good news on these architectures is that they support arbitrary-number-based indexing. The movsd -8(%rax,%rsi,8), %xmm0
and leaq -1(%rsi), %rax
are the two instructions affected by the 1-based indexing in Julia. Look at the movsd
instruction, in this one single instruction we do both the actual indexing and the subtracting. The -8
part is the subtracting. If 0-based indexing was used than the instruction would be movsd (%rax,%rsi,8), %xmm0
.
The other affected instruction is leaq -1(%rsi), %rax
. However due to the fact that cmp
instructions use an in-out argument, the value of %rsi
has to be copied to another register so under 0-based indexing the same instruction would still be generated but it would probably look like leaq (%rsi), %rax
.
So on x86 and amd64 machines the 1-based indexing results in simply using slightly more complicated version of the same instructions but no additional instructions are generated. The code most probably runs exactly as fast as 0-based indexing. If any slowdown is present it is probably due to the specific micro architecture and would be present in one CPU model and not present in another. This difference is down to the silicon and I wouldn't worry about it.
Unfortunately, I don't know enough about arm
and other architectures but the situation is probably similar.
When interfacing with another language like C or Python, one always has to remember to subtract or add 1 when passing indices around. The compiler cannot help you because the other code is out of its reach. So there is a performance hit of 1 extract arithmetic operations in this case. But unless this is in a really tight loop, this difference is negligible.
Well, the elephant in the room is the bound checking. Returning to the previous assembly snippet, most of the generated code is concerned with that - the first 3 instructions and everything under the L20
label. The actual indexing is just the movq
and movsd
instructions. So if you care about really fast code then you will get much more of a performance penalty from the bound checking than the 1-based indexing. Fortunately Julia offers ways to alleviate this problems through the use of @inbound and --check-bounds=no
.
The most likely possibility is that Julia simply subtracts 1 from the indexes you provide it, and uses zero-based arrays under the hood. So the performance penalty would be the cost of the subtraction (almost certainly immaterial).
It would be easy enough to write two small bits of code to test the performance of each.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With