Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Speed of indexing Perl arrays by offset

According to this question and this answer, lists are implemented as arrays:

Perl implements lists with an array and first/last element offsets. The array is allocated larger than needed with the offsets originally pointing in the middle of the array so there is room to grow in both directions (unshifts and pushes/inserts) before a re-allocation of the underlying array is necessary. The consequence of this implementation is that all of perl's primitive list operators (insertion, fetching, determining array size, push, pop, shift, unshift, etc.) perform in O(1) time.

So you would expect accessing an element by a numeric offset would be just as fast because they're arrays in the implementation, which provide very fast constant-time indexing. However, in a footnote in Learning Perl, the author says

Indexing into arrays is not using Perl’s strengths. If you use the pop, push, and similar operators that avoid using indexing, your code will generally be faster than if you use many indices, as well as avoiding “off-by-one” errors, often called “fencepost” errors. Occasionally, a beginning Perl programmer (wanting to see how Perl’s speed compares to C’s) will take, say, a sorting algorithm optimized for C (with many array index operations), rewrite it straightforwardly in Perl (again, with many index operations) and wonder why it’s so slow. The answer is that using a Stradivarius violin to pound nails should not be considered a sound construction technique.

How can this be true when a list is really an array under the hood? I know it's simply ignorant to try to compare the speed of Perl to C, but wouldn't indexing a list by offset be just as fast as pop or push or whatever? These seem to contradict each other.

like image 783
Seth Carnegie Avatar asked Jun 09 '11 03:06

Seth Carnegie


1 Answers

It's to do with the implementation of Perl as a series of opcodes. push, pop, shift and unshift are all opcodes themselves, so they can index into the array they're manipulating from C, where the accesses are very fast. If you do this from Perl with indices you'll make Perl perform extra opcodes to get the index from the scalar, get the slot from the array, then put something into it.

You can see this by using the -MO=Terse switch to see what Perl is really (in some sense) doing:

$foo[$i] = 1

    BINOP (0x18beae0) sassign
        SVOP (0x18bd850) const  IV (0x18b60b0) 1
        BINOP (0x18beb60) aelem
            UNOP (0x18bedb0) rv2av
                SVOP (0x18bef30) gv  GV (0x18b60c8) *foo
            UNOP (0x18beba0) null [15]
                SVOP (0x18bec70) gvsv  GV (0x18b60f8) *i

push @foo, 1

    LISTOP (0x18bd7b0) push [2]
        OP (0x18aff70) pushmark
        UNOP (0x18beb20) rv2av [1]
            SVOP (0x18bd8f0) gv  GV (0x18b60c8) *foo
        SVOP (0x18bed10) const  IV (0x18b61b8) 1

You see that Perl has to perform fewer steps, so can be expected to be faster.

The trick with any interpreted language is to let it do all the work.

like image 111
Alex Avatar answered Oct 16 '22 17:10

Alex