Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Does a hash shrink in Perl as you delete elements?

Does a hash shrink in Perl as you delete elements.

More specifically I had a perl program that I inherited that would parse a huge file ( 1 GB ) and load up a hash of hashes. it would do that same for another file and then do a comparison of different elements. The memory consumption was huge during this process and even though I added deleting hash elements has they were used the memory consumption seemed to be unaffected.

The script was extremely slow and such a memory hog. I know it was not well designed but any ideas about the hash memory usage?

like image 978
Brian G Avatar asked Sep 24 '08 23:09

Brian G


4 Answers

You might want to check out something like DBM::Deep. It does that tie-ing stuff that Michael mentioned so you don't have to think about it. Everything is stored on disk instead of in memory. It's just short of needing a fancier database server.

Also, if you want to track down the performance bottleneck, check out Devel::NYTProf, the new hotness in Perl profiling that came out of the New York Times.

like image 89
brian d foy Avatar answered Sep 18 '22 12:09

brian d foy


In general, Perl cannot return memory to the operating system. It may be able to reuse memory internally, though, which could reduce the amount of memory needed by a program.

See perlfaq3: How can I free an array or hash so my program shrinks?

If the memory used by the hashes is excessive (i.e. > physical memory) you could tie them to a file on disk. This would greatly reduce your memory usage but be warned that accessing a structure on disk is much slower than accessing one in memory. (So is disk thrashing.)

like image 34
Michael Carman Avatar answered Sep 17 '22 12:09

Michael Carman


If your hash is truly gigantic, a better strategy is to probably use an on-disk hash and let the OS worry about getting things into and out of memory. I'm particularly fond of Berkeley DB for storing big hashes on disk, and the Perl BerkeleyDB module provides a full-featured interface, including a tied API.

DBM::Deep can also be used as a drop-in hash replacement, but relies on its own format. This can be a pain if your structure needs to be read by other (non-Perl) systems.

like image 34
friedo Avatar answered Sep 19 '22 12:09

friedo


Regarding the specific question: No, deleting hash keys does not reduce your program's memory consumption.

Regarding the more general case: The substantial majority of programs and languages will continue to hold on to memory which they have previously used, but are not currently using. This is because requesting the allocation of memory by the operating system is a relatively slow operation, so they keep it in case it's needed again later.

So, if you want to improve on this situation, you'll need to reduce the peak amount of memory required by your program, whether by revising your algorithms to not need access to as much data at once, by using on-disk storage (such as the aforementioned DBM::Deep), or by releasing space from unneeded variables back to perl (let them go out of scope or set them to undef) so that it can be reused.

like image 45
Dave Sherohman Avatar answered Sep 17 '22 12:09

Dave Sherohman