 

perl memory bloat in hash even when doing delete?

I've got a long perl script that caches some information from a file in a hash, and every once in a while (here, every 100000 positions), it prints the values of the hash for that window, then attempts to delete most of the contents from the hash, except for a small buffer to be used in the next iteration.

I say it attempts to delete the contents because my script blows up in memory usage until it exhausts all memory and crashes. Even though the delete statement seems to reduce the number of keys in the hash to only a small number of elements (see the print STDERR output below), the script's memory consumption skyrockets as if nothing were being deleted. If I comment out the delete statement, it uses the same amount of memory; the only difference is that each iteration takes longer. It seems like the number of keys is reduced after the delete command, but not the number of values.

I made sure there is no weird buffering in the reading and outputting of results. In fact, the script doesn't run out of memory if I comment out the places where %hash is used, so I've narrowed it down to the filling and deleting of entries in %hash.

I also tried using a hashref instead of %hash, and the same thing still happens.

How come it's blowing up in memory? Am I missing anything obvious here?

use strict;
use warnings;
use Devel::Size qw(total_size);    # total_size() comes from Devel::Size

my %hash;
# while ( read from input ) {
#     fill the hash here and there with: $hash{$this_p}{$this_c}++
#     ...
#     then every 100000 entries:
    if ( not $pos % 100000 ) {
        print STDERR "pre ", scalar %hash, "\n";
        warn total_size(\%hash);
        for my $p ( sort { $a <=> $b } keys %hash ) {
            last if $p > $max_possible{$subset};
            if ( $p + $buffer < $pos ) {
                print $out "$p\t";
                for my $c ( keys %{ $hash{$p} } ) {
                    print $out "$c " . $hash{$p}{$c} . ";";
                }
                print $out "\n";
                delete $hash{$p};
            }
        }
        print STDERR "post ", scalar %hash, "\n";
        warn total_size(\%hash);
    }
# }

Output is something like this:

pre 322484/524288
134297952 at /home/
post 681/524288
4368924 at /home/av
pre 681/524288
4368924 at /home/av
post 681/524288
4368924 at /home/av
pre 681/524288
4368924 at /home/av
post 681/524288
4368924 at /home/av
pre 629257/1048576
260016542 at /home/
post 344/1048576
8477509 at /home/av
pre 1903885/4194304
689633878 at /home/
post 900/4194304
33790436 at /home/a
[...]

This is using perl v5.14.2 on a 64bit Linux box.
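For what it's worth, the delete behaviour itself can be checked with a tiny standalone script (simplified, hypothetical data, no file input) that mimics the fill-then-flush pattern:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Fill a hash of hashes like the script does, then flush
# everything outside a trailing buffer window.
my %hash;
my $buffer = 10;
my $pos    = 100_000;

$hash{$_}{c}++ for 1 .. $pos;

my $pre = 0 + keys %hash;

for my $p ( sort { $a <=> $b } keys %hash ) {
    delete $hash{$p} if $p + $buffer < $pos;
}

my $post = 0 + keys %hash;
print "pre=$pre post=$post\n";    # pre=100000 post=11
```

Only the positions inside the buffer window (99990..100000, 11 keys) survive the flush, so delete is doing its job.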

asked Aug 13 '13 by 719016



1 Answer

The number of elements you place in the hash in each pass grows as your program runs. 0+keys(%hash) would tell you the exact number, but the numerator in the following is similar (though slightly lower):

                      322484 added
pre  322484/524288
                      321803 cleared (99.8% of added)
post    681/524288
                           0 added
pre     681/524288
                           0 cleared (100.0% of added)
post    681/524288
                           0 added
pre     681/524288
                           0 cleared (100.0% of added)
post    681/524288
                      628576 added
pre  629257/1048576
                      628913 cleared (100.0% of added)
post    344/1048576
                     1903541 added
pre 1903885/4194304
                     1902641 cleared (100.0% of added)
post    900/4194304

The denominator (the number of allocated hash buckets) only grows because the numerator grows; it isn't relevant, and it isn't cumulative growth. It would get that big even if you started with a fresh hash every time.

The numerator is only growing because the number of elements you add to the hash grows. As you can see, the clearing code works extremely well.
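An aside on the output format, as an assumption about the perl in use: on 5.14, scalar %hash returns a "used/total" bucket string, which is why the numbers above look like fractions, whereas 0 + keys %hash gives the actual element count. (Since perl 5.26, scalar %hash simply returns the key count.) A small illustration:

```perl
use strict;
use warnings;

my %hash = map { $_ => 1 } 1 .. 1000;

# Exact number of elements, on any perl:
my $count = 0 + keys %hash;
print "$count\n";    # 1000

# Pre-5.26 perls print something like "571/1024" (used/total
# buckets) here; 5.26 and later print the key count instead.
print scalar(%hash), "\n";
```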

This doesn't look like a memory leak at all; it looks like you're actually using the memory. Maybe you should clear it more often?

Instead of

if (not $pos % 100000) {
    ...
}

use

if (keys(%hash) >= 1_000_000) {
    ...
}

Or if you want regular feedback,

if (++$since_last >= 100_000 || keys(%hash) >= 1_000_000) {
    $since_last = 0;
    ...
}

Adjust the limits as required.
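Putting it together, here is a sketch of the hybrid trigger wrapped around the question's flush loop; the input loop is simulated, and $max_possible and the real output handle are simplified away:

```perl
use strict;
use warnings;

my %hash;
my $buffer     = 1_000;
my $since_last = 0;
my $pos        = 0;

sub flush_window {
    # Print and drop everything outside the trailing buffer window.
    for my $p ( sort { $a <=> $b } keys %hash ) {
        next unless $p + $buffer < $pos;
        # ... print the results for $p here ...
        delete $hash{$p};
    }
}

while ( $pos < 500_000 ) {    # stand-in for the read loop
    $pos++;
    $hash{$pos}{x}++;
    if ( ++$since_last >= 100_000 || keys(%hash) >= 1_000_000 ) {
        $since_last = 0;
        flush_window();
    }
}
print 0 + keys %hash, "\n";    # 1001: only the buffer window remains
```

The key-count condition caps the hash's peak size even if the positions arrive unevenly, while the counter keeps the feedback regular.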

answered Oct 09 '22 by ikegami
