From man perlre
:
The "*" quantifier is equivalent to "{0,}", the "+" quantifier to "{1,}", and the "?" quantifier to "{0,1}". n and m are limited to integral values less than a preset limit defined when perl is built. This is usually 32766 on the most common platforms. The actual limit can be seen in the error message generated by code such as this:
$_ **= $_ , / {$_} / for 2 .. 42;
Ay that's ugly - Isn't there some constant I can get instead?
Edit: As daxim pointed out (and perlretut hints towards) it might be that 32767 is a magical hardcoded number. A little searching in the Perl code goes a long way, but I'm not sure how to get to the next step and actually find out where the default reg_infty or REG_INFTY is actually set:
~/dev/perl-5.12.2
$ grep -ri 'reg_infty.*=' *
regexec.c: if (max != REG_INFTY && ST.count == max)
t/re/pat.t: $::reg_infty = $Config {reg_infty} // 32767;
t/re/pat.t: $::reg_infty_m = $::reg_infty - 1;
t/re/pat.t: $::reg_infty_p = $::reg_infty + 1;
t/re/pat.t: $::reg_infty_m = $::reg_infty_m; # Surpress warning.
Edit 2: DVK is of course right: It's define
d at compile time, and can probably be overridden only with REG_INFTY
.
$1 equals the text " brown ".
A \w matches a single alphanumeric character (an alphabetic character, or a decimal digit) or _ , not a whole word. Use \w+ to match a string of Perl-identifier characters (which isn't the same as matching an English word).
In addition, Perl defines the following: \w Match a "word" character (alphanumeric plus "_") \W Match a non-word character \s Match a whitespace character \S Match a non-whitespace character \d Match a digit character \D Match a non-digit character.
Summary: there are 3 ways I can think of to find the limit: empirical, "matching Perl tests" and "theoretical".
Empirical:
eval {$_ **= $_ , / {$_} / for 2 .. 129};
# To be truly portable, the above should ideally loop forever till $@ is true.
$@ =~ /bigger than (-?\d+) /;
print "LIMIT: $1\n"'
This seems obvious enough that it doesn't require explanation.
Matches Perl tests:
Perl has a series of tests for regex, some of which (in pat.t
) deal with testing this max value. So, you can approximate that the max value computed in those tests is "good enough" and follow the test's logic:
use Config;
$reg_infty = $Config {reg_infty} // 2 ** 15 - 1; # 32767
print "Test-based reg_infinity limit: $reg_infty\n";
The explanation of where in the tests this is based off of is in below details.
Theoretical: This is attempting to replicate the EXACT logic used by C code to generate this value.
This is harder that it sounds, because it's affected by 2 things: Perl build configuration and a bunch of C #define
statements with branching logic. I was able to delve fairly deeply into that logic, but was stalled on two problems: the #ifdefs
reference a bunch of tokens that are NOT actually defined anywhere in Perl code that I can find - and I don't know how to find out from within Perl what those define
s values were, and the ultimate default value (assuming I'm right and those #ifdef
s always end up with the default) of #define PERL_USHORT_MAX ((unsigned short)~(unsigned)0)
(The actual limit is gotten by removing 1 bit off that resulting all-ones number - details below).
I'm also not sure how to access the amount of bytes in short
from Perl for whichever implementation was used to build perl
executable.
So, even if the answer to both those questions can be found (which I'm not sure of), the resulting logic would most certainly be "uglier" and more complex than the straightforward "empirical eval-based" one I offered as the first option.
Below I will provide the details of where various bits and pieces of logic related to to this limit live in Perl code, as well as my attempts to arrive at "Theoretically correct" solution matching C logic.
OK, here is some investigation part way, you can complete it yourself as I have ti run or I will complete later:
From regcomp.c
: vFAIL2("Quantifier in {,} bigger than %d", REG_INFTY - 1);
So, the limit is obviously taken from REG_INFTY
define. Which is declared in:
rehcomp.h
:
/* XXX fix this description.
Impose a limit of REG_INFTY on various pattern matching operations
to limit stack growth and to avoid "infinite" recursions.
*/
/* The default size for REG_INFTY is I16_MAX, which is the same as
SHORT_MAX (see perl.h). Unfortunately I16 isn't necessarily 16 bits
(see handy.h). On the Cray C90, sizeof(short)==4 and hence I16_MAX is
((1<<31)-1), while on the Cray T90, sizeof(short)==8 and I16_MAX is
((1<<63)-1). To limit stack growth to reasonable sizes, supply a
smaller default.
--Andy Dougherty 11 June 1998
*/
#if SHORTSIZE > 2
# ifndef REG_INFTY
# define REG_INFTY ((1<<15)-1)
# endif
#endif
#ifndef REG_INFTY
# define REG_INFTY I16_MAX
#endif
Please note that SHORTSIZE is overridable via Config
- I will leave details of that out but the logic will need to include $Config{shortsize}
:)
From handy.h (this doesn't seem to be part of Perl source at first glance so it looks like an iffy step):
#if defined(UINT8_MAX) && defined(INT16_MAX) && defined(INT32_MAX)
#define I16_MAX INT16_MAX
#else
#define I16_MAX PERL_SHORT_MAX
I could not find ANY place which defined INT16_MAX
at all :(
Someone help please!!!
PERL_SHORT_MAX is defined in perl.h:
#ifdef SHORT_MAX
# define PERL_SHORT_MAX ((short)SHORT_MAX)
#else
# ifdef MAXSHORT /* Often used in <values.h> */
# define PERL_SHORT_MAX ((short)MAXSHORT)
# else
# ifdef SHRT_MAX
# define PERL_SHORT_MAX ((short)SHRT_MAX)
# else
# define PERL_SHORT_MAX ((short) (PERL_USHORT_MAX >> 1))
# endif
# endif
#endif
I wasn't able to find any place which defined SHORT_MAX, MAXSHORT or SHRT_MAX so far. So the default of ((short) (PERL_USHORT_MAX >> 1))
it is assumed to be for now :)
PERL_USHORT_MAX is defined very similarly in perl.h
, and again I couldn't find a trace of definition of USHORT_MAX
/MAXUSHORT
/USHRT_MAX
.
Which seems to imply that it's set by default to: #define PERL_USHORT_MAX ((unsigned short)~(unsigned)0)
. How to extract that value from Perl side, I have no clue - it's basically a number you get by bitwise negating a short 0, so if unsigned short is 16 bytes, then PERL_USHORT_MAX
will be 16 ones, and PERL_SHORT_MAX
will be 15 ones, e.g. 2^15-1, e.g. 32767.
Also, from t/re/pat.t
(regex tests): $::reg_infty = $Config {reg_infty} // 32767;
(to illustrate where the non-default compiled in value is stored).
So, to get your constant, you do:
use Config;
my $shortsize = $Config{shortsize} // 2;
$c_reg_infty = (defined $Config {reg_infty}) ? $Config {reg_infty}
: ($shortsize > 2) ? 2**16-1
: get_PERL_SHORT_MAX();
# Where get_PERL_SHORT_MAX() depends on logic for PERL_SHORT_MAX in perl.h
# which I'm not sure how to extract into Perl with any precision
# due to a bunch of never-seen "#define"s and unknown size of "short".
# You can probably do fairly well by simply returning 2**8-1 if shortsize==1
# and 2^^16-1 otherwise.
say "REAL reg_infinity based on C headers: $c_reg_infty";
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With