I've got a long Perl script that caches some information from a file in a hash, and every once in a while (here, every 100000 positions) it prints the values of the hash for that window, then attempts to delete most of its contents, keeping only a small buffer to be used in the next iteration.
I say it *attempts* to delete the contents because the script's memory usage keeps growing until it exhausts all memory and crashes. Even though the delete statement seems to reduce the number of keys in the hash (see the print STDERR output below) to only a small number of elements, the memory consumption of the script skyrockets as if nothing were being deleted. If I comment out the delete statement, it uses the same amount of memory; the only difference is that it takes longer to iterate. It seems like the number of keys is reduced after the delete command, but not the number of values.
I made sure there is no weird buffering with the reading and outputting of results. In fact, the script doesn't run out of memory if I just comment out the places where %hash is used, so I've narrowed it down to the filling and deleting of entries in %hash.
I also tried using a hashref instead of %hash, and the same thing still happens.
How come it's blowing up in memory? Am I missing anything obvious here?
use Devel::Size qw(total_size);   # for total_size() below

my %hash;
# while ( Read from input ) {
#     Fill hash here and there with: $hash{$this_p}{$this_c}++
#     ...
#     Then every 100000 entries:
    if ( not $pos % 100000 ) {
        print STDERR "pre ", scalar %hash, "\n";
        warn total_size(\%hash);
        for my $p ( sort { $a <=> $b } keys %hash ) {
            last if $p > $max_possible{$subset};
            if ( $p + $buffer < $pos ) {
                print $out "$p\t";
                for my $c ( keys %{ $hash{$p} } ) {
                    print $out "$c " . $hash{$p}{$c} . ";";
                }
                print $out "\n";
                delete $hash{$p};
            }
        }
        print STDERR "post ", scalar %hash, "\n";
        warn total_size(\%hash);
    }
# }
Output is something like this:
pre 322484/524288
134297952 at /home/
post 681/524288
4368924 at /home/av
pre 681/524288
4368924 at /home/av
post 681/524288
4368924 at /home/av
pre 681/524288
4368924 at /home/av
post 681/524288
4368924 at /home/av
pre 629257/1048576
260016542 at /home/
post 344/1048576
8477509 at /home/av
pre 1903885/4194304
689633878 at /home/
post 900/4194304
33790436 at /home/a
[...]
This is using perl v5.14.2 on a 64bit Linux box.
The number of elements you place in the hash in each pass grows as your program runs. 0+keys(%hash) would tell you the exact number, but the numerators in the following will be similar (though slightly lower):
322484 added
pre 322484/524288
321803 cleared (99.8% of added)
post 681/524288
0 added
pre 681/524288
0 cleared (100.0% of added)
post 681/524288
0 added
pre 681/524288
0 cleared (100.0% of added)
post 681/524288
628576 added
pre 629257/1048576
628913 cleared (100.0% of added)
post 344/1048576
1903541 added
pre 1903885/4194304
1902641 cleared (100.0% of added)
post 900/4194304
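If you want exact element counts rather than the used/total bucket figures that scalar(%hash) prints, you could log 0+keys(%hash) around the purge. A minimal sketch to drop into the existing if block ($last_post is an illustrative name for a variable declared outside the loop and carried between windows):

my $pre = 0 + keys %hash;                          # exact element count before purging
warn "pre $pre (added ", $pre - ($last_post // 0), ")\n";
# ... print and delete entries as before ...
my $post = 0 + keys %hash;                         # exact element count after purging
warn "post $post (cleared ", $pre - $post, ")\n";
$last_post = $post;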
The denominator is only growing because the numerator is growing; it's not relevant, and it's not cumulative growth. It would get that big even if you started with a fresh hash every time.
The numerator is only growing because the number of elements you add to the hash grows. As you can see, the clearing code works extremely well.
This doesn't look like a memory leak at all; it looks like you're actually using the memory. Note that perl returns freed memory to its own allocator for reuse rather than to the operating system, so the process footprint reflects the hash's peak size, not its current size. Maybe you should clear it more often?
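Here is a minimal Linux-only sketch of that effect (illustrative, not from the original script): delete empties the hash, but the process's resident set size stays near its peak.

#!/usr/bin/env perl
# Sketch (Linux-only, illustrative): deleting entries shrinks the hash,
# but perl keeps the freed memory for reuse, so the process RSS stays
# near its peak rather than dropping back down.
use strict;
use warnings;

sub rss_kb {                       # current resident set size, in kB
    open my $fh, '<', '/proc/self/status' or return -1;
    while (<$fh>) { return $1 if /^VmRSS:\s+(\d+)/ }
    return -1;
}

my %hash;
$hash{$_}{x} = 1 for 1 .. 1_000_000;
printf "filled:  %7d keys, RSS %d kB\n", scalar keys %hash, rss_kb();

delete $hash{$_} for keys %hash;   # keys() returns its list up front, so this is safe
printf "deleted: %7d keys, RSS %d kB\n", scalar keys %hash, rss_kb();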
Instead of
if (not $pos % 100000) {
...
}
use
if (keys(%hash) >= 1_000_000) {
...
}
Or if you want regular feedback,
if (++$since_last >= 100_000 || keys(%hash) >= 1_000_000) {
$since_last = 0;
...
}
Adjust the limits as required.
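Putting it together, here is a sketch of how the size-capped flush could look in context. flush_window() and the input parsing are illustrative assumptions, not the original script's code:

#!/usr/bin/env perl
# Sketch: flush at least every 100_000 records, and whenever the hash
# exceeds a size cap, so peak memory stays bounded.
use strict;
use warnings;

my (%hash, $since_last);
my $buffer = 1_000;                 # illustrative trailing-window size

sub flush_window {
    my ($pos) = @_;
    for my $p ( sort { $a <=> $b } keys %hash ) {
        last if $p + $buffer >= $pos;   # keep a buffer for the next window
        print "$p\t", join( ';', map { "$_ $hash{$p}{$_}" } keys %{ $hash{$p} } ), ";\n";
        delete $hash{$p};
    }
}

while ( my $line = <STDIN> ) {
    my ($pos, $p, $c) = split ' ', $line;   # assumed input: "pos p c" per line
    $hash{$p}{$c}++;
    if ( ++$since_last >= 100_000 || keys(%hash) >= 1_000_000 ) {
        $since_last = 0;
        flush_window($pos);
    }
}
flush_window(9e99);   # final drain at EOF: treat the position as far past everything

Because keys are flushed in ascending order, the loop can stop at the first position still inside the buffer, just as the original last-statement does.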