
Why is my Perl regex using so much memory?

I'm running a regular expression against a large scalar. Although this match doesn't capture anything, my process grows by about 30M after it runs:

# A
if (${$c} =~ m/\G<<\s*/cgs)
{
    #B
    ...
}

$c is a reference to a large scalar (around 21M). I've verified that pos(${$c}) is in the right place before the match, that the expression matches at that first character, and that pos(${$c}) is advanced to the correct place afterwards. Even so, the process grows by about 30M between #A and #B, although nothing is captured by this match. Where is my memory going?
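The match above uses Perl's \G ... /cg idiom for scanning a string piece by piece. Here is a minimal sketch of that idiom, assuming a tiny hypothetical input in place of the 21M scalar and made-up token names (DICT_START, NAME:..., DICT_END). It only shows how pos() moves; it says nothing about memory use.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the large scalar that $c refers to.
my $data = "<<  /Type /Page >>";
my $c    = \$data;

my @tokens;
pos(${$c}) = 0;

# \G anchors each attempt at pos(); /g advances pos() on success,
# and /c leaves pos() unchanged on failure so the next branch can try.
while (pos(${$c}) < length(${$c})) {
    if (${$c} =~ m/\G<<\s*/cgs) {
        push @tokens, 'DICT_START';
    }
    elsif (${$c} =~ m/\G>>\s*/cgs) {
        push @tokens, 'DICT_END';
    }
    elsif (${$c} =~ m{\G/(\w+)\s*}cgs) {
        push @tokens, "NAME:$1";
    }
    else {
        die "unexpected input at offset " . pos(${$c}) . "\n";
    }
}

print join(' ', @tokens), "\n";
```

Running it prints DICT_START NAME:Type NAME:Page DICT_END. Because /c keeps pos() in place after a failed attempt, every branch gets to try its own pattern at the same offset.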

Edit: Yes, use of $& was to blame. We are using Perl 5.8.8, and my script uses Getopt::Declare, which relies on the core module Text::Balanced. Version 1.95 of Text::Balanced referenced $&; version 2.0.0, which ships with Perl 5.10, removes that reference and fixes the problem.
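A sketch for checking which Text::Balanced a given perl actually loads, assuming the version module (core since Perl 5.10) is available. The 2.0.0 threshold is taken from the edit above, not from the module's own changelog.

```perl
use strict;
use warnings;
use version;

# Load the module without importing anything; we only want its version.
use Text::Balanced ();

my $installed = version->parse($Text::Balanced::VERSION);
my $fixed     = version->parse('2.0.0');   # threshold taken from the edit above

if ($installed < $fixed) {
    warn "Text::Balanced $installed predates 2.0.0 and may mention \$&\n";
}
else {
    print "Text::Balanced $installed is 2.0.0 or later\n";
}
```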

asked Oct 03 '08 04:10 by Ryan Olson



1 Answer

Just a quick sanity check: are you mentioning $&, $` or $' (sometimes called $MATCH, $PREMATCH and $POSTMATCH) anywhere in your code? If so, Perl will copy your entire string on every regular expression match, just in case you want to inspect those variables. With a 21M string, each of those copies is itself about 21M.
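If code genuinely needs the text before, inside or after a match, the offsets in @- and @+ give the same information without mentioning $&, $` or $' anywhere. A minimal sketch with a made-up input string:

```perl
use strict;
use warnings;

my $s = "hello big world";
my ($prematch, $match, $postmatch);

if ($s =~ /big/) {
    # $-[0] and $+[0] are the start and end offsets of the whole match.
    $prematch  = substr($s, 0, $-[0]);              # text before the match
    $match     = substr($s, $-[0], $+[0] - $-[0]);  # the matched text
    $postmatch = substr($s, $+[0]);                 # text after the match
}

print "[$prematch][$match][$postmatch]\n";   # prints "[hello ][big][ world]"
```

On Perl 5.10 and later, the /p modifier makes ${^PREMATCH}, ${^MATCH} and ${^POSTMATCH} available for just that one match, and perlvar notes that the copy-on-write strings enabled in Perl 5.20 remove the global penalty of $& and friends; neither helps on 5.8.8.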

"In your code" here includes indirect mentions: any module you load that references these variables counts, as does writing use English rather than use English qw( -no_match_vars ).
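A sketch of the safe import form for English. The two long names used here are only examples of what stays available; $PREMATCH, $MATCH and $POSTMATCH are the names that are skipped.

```perl
use strict;
use warnings;

# -no_match_vars imports the long variable names but not the aliases
# for the match variables, so $& and friends are never mentioned.
use English qw( -no_match_vars );

my $line_sep = $INPUT_RECORD_SEPARATOR;   # same variable as $/
my $pid      = $PROCESS_ID;               # same variable as $$

print "pid=$pid\n";
```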

If you're not sure, you can use the Devel::SawAmpersand module to determine whether they have been used, and Devel::FindAmpersand to find where they are used.

There may be other reasons for the increase in memory (which version of Perl are you using?), but if the match variables are used anywhere, they will certainly inflate memory use on a string this size, so they are a likely culprit.

Cheerio,

Paul

answered Nov 15 '22 05:11 by pjf
