Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Perl: Performance of array-insert using 'splice()' VS linked-list

I have a script in which I use Perl arrays. Each array contains hundreds of thousands of items.

I frequently need to dynamically add items in the middle of an array, or to delete items from it.

I want to understand whether I should use linked-lists instead of the Perl arrays, as I make frequent insertions and deletions

So my questions are:

  • How is splice() implemented?
  • What is the complexity of splice(), when is used for inserting item x into index i in a Perl array
  • Can you recommend on a Perl linked-list module that you've worked with?

Thanks!

like image 733
SomethingSomething Avatar asked May 28 '26 21:05

SomethingSomething


2 Answers

Perl arrays are stored as an array of pointers, a beginning offset, a length, and an allocated length.

So inserting or deleting from the middle will require moving 4 or 8 bytes times the number of later elements in the array. Deleting from either end won't require moving anything, just adjusting the beginning offset or length. Inserting at the end will usually just require adjusting the length, but occasionally require reallocating the entire array of pointers. Inserting at the beginning, perl will do its best to arrange so that just the beginning offset will need to be adjusted, but sometimes the entire array will need to be moved or even reallocated.

In practice, the overhead of creating and managing a linked list using perl operations is going to be much greater in almost every case than just using an array.

To benchmark it, we would need to know a lot more about your particular case; what actual size of array, what kind and size of elements (not relevant to the cost of splice, but perhaps relevant to a linked list), relative frequency of inserts/deletes, etc.

like image 112
ysth Avatar answered Jun 02 '26 20:06

ysth


Did a quick splicing benchmark and it seems to behave as O(N) for both removals and insertions.

Script:

my $length = shift;
my $toSplice = 100;

my @list = (1 .. $length);

my $t0 = Time::HiRes::time();
for(1 .. $toSplice) {
    my $removeIdx = int(rand() * @list);
    splice @list, $removeIdx, 1;
}

my $t1 = Time::HiRes::time();
for(1 .. $toSplice) {
    my $insertIdx = int(rand() * @list);
    splice @list, $insertIdx, 0, 0;
}

printf("Took %.4fs to remove\n", $t1 - $t0);
printf("Took %.4fs to insert\n", Time::HiRes::time() - $t0);

Results:

$ perl test.pl 100000
Took 0.0026s to remove
Took 0.0092s to insert
$ perl test.pl 1000000
Took 0.0296s to remove
Took 0.0617s to insert
$ perl test.pl 10000000
Took 0.2876s to remove
Took 0.6252s to insert

So increasing the number of iterations by 10x increased the run time by roughly 10x.

like image 27
AKHolland Avatar answered Jun 02 '26 20:06

AKHolland



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!