
perl - array of integers using way too much memory?

Tags: types, perl

When I run the following script:

my @arr = [1..5000000];

for($i=0; $i<5000000; $i++) {
        $arr[$i] = $i;
        if($i % 1000000 == 0) {
                print "$i\n";
        }
}

It consumes about 500 MB of memory. Being used to higher-level compiled languages, I would expect roughly 5M * 4 B = 20 MB (4 bytes per number).

I guess that is because each value is a full Perl scalar, not a plain binary number. Is it possible to reduce the memory footprint by storing those values as raw numbers, or is 500 MB unavoidable for this task?

Konrad Garus asked Jun 22 '11 13:06


4 Answers

If you are dealing with such large arrays, you might want to use a toolkit like PDL (the Perl Data Language).

(Oh, and yes, you are correct: It takes so much memory because it is an array of Perl scalars.)

Nemo answered Oct 22 '22 16:10


All Perl values are represented internally as Perl scalars, which consume far more memory than a simple int, even if the scalar holds only an int, and even if the scalar is undef!

As others have suggested, PDL may be something to look at if you really want to work with huge arrays of this sort.
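
For readers who want to stay in core Perl, one option is to pack the integers into a single string with the built-in vec function, so each value takes 4 bytes instead of a whole scalar. The following is only a minimal sketch under stated assumptions: the values are non-negative and fit in 32 bits, and the names $count and $buf are made up for illustration. The element count is reduced from the question's 5,000,000 so it runs quickly.

```perl
use strict;
use warnings;

# Hypothetical size for a quick run; the question used 5_000_000.
my $count = 1_000_000;

# One string buffer acting as an array of unsigned 32-bit slots.
my $buf = '';
vec($buf, $count - 1, 32) = 0;    # writing the last slot extends the buffer to full size

for my $i (0 .. $count - 1) {
    vec($buf, $i, 32) = $i;       # store $i in slot $i (4 bytes per slot)
}

printf "elements: %d, bytes used by data: %d\n", $count, length $buf;
print "slot 123456 holds ", vec($buf, 123_456, 32), "\n";
```

The trade-off is that every read and write goes through vec rather than ordinary array indexing, and vec only handles unsigned values, so signed or floating-point data would need pack/unpack or a toolkit such as PDL instead.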

JSBձոգչ answered Oct 22 '22 14:10


You can always call C or C++ from Perl. This will probably give you a smaller footprint for heavy jobs like this one. Just an idea, using Inline::C:

#!/usr/bin/perl
use Inline C;
use strict;

for(my $i=0; $i<5000000; $i++) {
        set_array_index($i,$i);
        if($i % 1000000 == 0) {
                #print "$i\n";
                print get_array_index($i)."\n";
        }
}

__END__
__C__

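#define ARRAY_SIZE 5000000

/* Static storage is zero-initialized; with a typical 4-byte int this is about 20 MB. */
static int array[ARRAY_SIZE];

void set_array_index(int index, int value) {
    /* Ignore out-of-range writes instead of corrupting memory. */
    if (index >= 0 && index < ARRAY_SIZE)
        array[index] = value;
}

int get_array_index(int index) {
    /* Out-of-range reads return 0; unset slots are already 0. */
    if (index < 0 || index >= ARRAY_SIZE)
        return 0;
    return array[index];
}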

cirne100 answered Oct 22 '22 16:10


Complete revision of my answer. Looking at what you have in your code, I see some strange things.

my @arr = [1..5000000];

Here, you assign an anonymous array reference to $arr[0]. @arr itself holds only one value, the array reference; the hidden anonymous array holds the 5 million numbers.
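
The difference between square brackets and parentheses can be checked with a small sketch like the one below. The range 1..5 stands in for 1..5000000 so it runs instantly, and the names @wrapped and @flat are made up for illustration.

```perl
use strict;
use warnings;

# Small range (1..5) stands in for 1..5000000.
my @wrapped = [1 .. 5];    # square brackets: one element, an array reference
my @flat    = (1 .. 5);    # parentheses: five separate elements

print "wrapped has ", scalar(@wrapped), " element of type ", ref($wrapped[0]), "\n";
print "inner array has ", scalar(@{ $wrapped[0] }), " elements\n";
print "flat has ", scalar(@flat), " elements\n";
```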

for($i=0; $i<5000000; $i++) {
        $arr[$i] = $i;
        if($i % 1000000 == 0) {
                print "$i\n";
        }
}

Here, you fill the array with 5 million sequential numbers (0 through 4,999,999), overwriting the array reference that the declaration stored in $arr[0].

A much shorter way to do it would be:

my @arr = (1 .. 5_000_000);

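Perhaps that will save you some memory, since the extra 5-million-element anonymous array is never built. Each element is still a full Perl scalar, though.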

TLP answered Oct 22 '22 15:10