perlre length limit

Tags:

perl

From man perlre:

The "*" quantifier is equivalent to "{0,}", the "+" quantifier to "{1,}", and the "?" quantifier to "{0,1}". n and m are limited to integral values less than a preset limit defined when perl is built. This is usually 32766 on the most common platforms. The actual limit can be seen in the error message generated by code such as this:
       $_ **= $_ , / {$_} / for 2 .. 42;

Ay that's ugly - Isn't there some constant I can get instead?

Edit: As daxim pointed out (and perlretut hints towards) it might be that 32767 is a magical hardcoded number. A little searching in the Perl code goes a long way, but I'm not sure how to get to the next step and actually find out where the default reg_infty or REG_INFTY is actually set:

~/dev/perl-5.12.2
$ grep -ri 'reg_infty.*=' *
regexec.c:      if (max != REG_INFTY && ST.count == max)
t/re/pat.t:        $::reg_infty   = $Config {reg_infty} // 32767;
t/re/pat.t:        $::reg_infty_m = $::reg_infty - 1;
t/re/pat.t:        $::reg_infty_p = $::reg_infty + 1;
t/re/pat.t:        $::reg_infty_m = $::reg_infty_m;   # Surpress warning.

Edit 2: DVK is of course right: It's defined at compile time, and can probably be overridden only with REG_INFTY.

762

asked Jan 04 '11 10:01

l0b0

1 Answers

Summary: there are 3 ways I can think of to find the limit: empirical, "matching Perl tests" and "theoretical".

Empirical:

eval {$_ **= $_ , / {$_} / for 2 .. 129};
# To be truly portable, the above should ideally loop forever till $@ is true.
$@ =~ /bigger than (-?\d+) /; 
print "LIMIT: $1\n"'

This seems obvious enough that it doesn't require explanation.

Matches Perl tests:

Perl has a series of tests for regex, some of which (in pat.t) deal with testing this max value. So, you can approximate that the max value computed in those tests is "good enough" and follow the test's logic:
```
use Config;
$reg_infty = $Config {reg_infty} // 2 ** 15 - 1; # 32767
print "Test-based reg_infinity limit: $reg_infty\n";
```
The explanation of where in the tests this is based off of is in below details.
Theoretical: This is attempting to replicate the EXACT logic used by C code to generate this value.

This is harder that it sounds, because it's affected by 2 things: Perl build configuration and a bunch of C #define statements with branching logic. I was able to delve fairly deeply into that logic, but was stalled on two problems: the #ifdefs reference a bunch of tokens that are NOT actually defined anywhere in Perl code that I can find - and I don't know how to find out from within Perl what those defines values were, and the ultimate default value (assuming I'm right and those #ifdefs always end up with the default) of #define PERL_USHORT_MAX ((unsigned short)~(unsigned)0) (The actual limit is gotten by removing 1 bit off that resulting all-ones number - details below).

I'm also not sure how to access the amount of bytes in short from Perl for whichever implementation was used to build perl executable.

So, even if the answer to both those questions can be found (which I'm not sure of), the resulting logic would most certainly be "uglier" and more complex than the straightforward "empirical eval-based" one I offered as the first option.

Below I will provide the details of where various bits and pieces of logic related to to this limit live in Perl code, as well as my attempts to arrive at "Theoretically correct" solution matching C logic.

OK, here is some investigation part way, you can complete it yourself as I have ti run or I will complete later:

From regcomp.c: vFAIL2("Quantifier in {,} bigger than %d", REG_INFTY - 1);

So, the limit is obviously taken from REG_INFTY define. Which is declared in:

rehcomp.h:

 /* XXX fix this description.
    Impose a limit of REG_INFTY on various pattern matching operations
    to limit stack growth and to avoid "infinite" recursions.
 */
 /* The default size for REG_INFTY is I16_MAX, which is the same as
    SHORT_MAX (see perl.h).  Unfortunately I16 isn't necessarily 16 bits
    (see handy.h).  On the Cray C90, sizeof(short)==4 and hence I16_MAX is
    ((1<<31)-1), while on the Cray T90, sizeof(short)==8 and I16_MAX is
    ((1<<63)-1).  To limit stack growth to reasonable sizes, supply a
    smaller default.
         --Andy Dougherty  11 June 1998
 */
 #if SHORTSIZE > 2
 #  ifndef REG_INFTY
 #    define REG_INFTY ((1<<15)-1)
 #  endif
 #endif
 #ifndef REG_INFTY
 #  define REG_INFTY I16_MAX
 #endif

Please note that SHORTSIZE is overridable via Config - I will leave details of that out but the logic will need to include $Config{shortsize} :)

From handy.h (this doesn't seem to be part of Perl source at first glance so it looks like an iffy step):

 #if defined(UINT8_MAX) && defined(INT16_MAX) && defined(INT32_MAX)
 #define I16_MAX INT16_MAX
 #else
 #define I16_MAX PERL_SHORT_MAX

I could not find ANY place which defined INT16_MAX at all :(

Someone help please!!!

PERL_SHORT_MAX is defined in perl.h:

 #ifdef SHORT_MAX
 #  define PERL_SHORT_MAX ((short)SHORT_MAX)
 #else
 #  ifdef MAXSHORT    /* Often used in <values.h> */
 #    define PERL_SHORT_MAX ((short)MAXSHORT)
 #  else
 #    ifdef SHRT_MAX
 #      define PERL_SHORT_MAX ((short)SHRT_MAX)
 #    else
 #      define PERL_SHORT_MAX      ((short) (PERL_USHORT_MAX >> 1))
 #    endif
 #  endif
 #endif

I wasn't able to find any place which defined SHORT_MAX, MAXSHORT or SHRT_MAX so far. So the default of ((short) (PERL_USHORT_MAX >> 1)) it is assumed to be for now :)

PERL_USHORT_MAX is defined very similarly in perl.h, and again I couldn't find a trace of definition of USHORT_MAX/MAXUSHORT/USHRT_MAX.

Which seems to imply that it's set by default to: #define PERL_USHORT_MAX ((unsigned short)~(unsigned)0). How to extract that value from Perl side, I have no clue - it's basically a number you get by bitwise negating a short 0, so if unsigned short is 16 bytes, then PERL_USHORT_MAX will be 16 ones, and PERL_SHORT_MAX will be 15 ones, e.g. 2^15-1, e.g. 32767.
Also, from t/re/pat.t (regex tests): $::reg_infty = $Config {reg_infty} // 32767; (to illustrate where the non-default compiled in value is stored).

So, to get your constant, you do:

use Config;
my $shortsize = $Config{shortsize} // 2;
$c_reg_infty = (defined $Config {reg_infty}) ? $Config {reg_infty}
                                             : ($shortsize > 2) ? 2**16-1
                                             : get_PERL_SHORT_MAX();
# Where get_PERL_SHORT_MAX() depends on logic for PERL_SHORT_MAX in perl.h
# which I'm not sure how to extract into Perl with any precision
# due to a bunch of never-seen "#define"s and unknown size of "short".
# You can probably do fairly well by simply returning 2**8-1 if shortsize==1 
# and 2^^16-1 otherwise.
say "REAL reg_infinity based on C headers: $c_reg_infty";

116

answered Oct 19 '22 04:10

DVK

Related questions
                            
                                Replace MULTIPLE first keys of multi-dimensional Hash in perl
                            
                                Perl Globbing a Variable stops on first match
                            
                                Is there a way to have dependencies defined by OS in a CPAN distribution?
                            
                                Monitoring Windows directory size
                            
                                How do I track down a mod_perl memory leak?
                            
                                Get Link Speed - Win32_PerfRawData_Tcpip_NetworkInterface
                            
                                How do I fix "perl is not recognized" on Windows?
                            
                                Does using undef as hash values save any memory in Perl?
                            
                                running BLAST (bl2seq) without creating sequence files
                            
                                Java equivalent of Perl's s/// operator?
                            
                                How do I group my package imports into a single custom package?
                            
                                Getting list of fields back from 'use fields' pragma?
                            
                                How can I generate all possible permutations from a Perl regular expression?
                            
                                Which tool should I use for finding out my memory allocation in Perl?
                            
                                Lexically importing useful functions in a big script
                            
                                What is the Java equivalent of Perl's qq operator?
                            
                                How can I control the Perl version used when submitting grid jobs?
                            
                                How can I use a Perl module from Python?
                            
                                lighttpd + perl + mojolicious =?
                            
                                Perl LWP::UserAgent mishandling UTF-8 response

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

perlre length limit

Tags:

regex

perl

l0b0

People also ask

1 Answers

DVK

Recent Activity

Donate For Us