From this article,
/^1?$|^(11+?)\1+$/
checks whether a number(its value in unary) is prime or not.
Using this, perl -l -e '(1 x $_) !~ /^1?$|^(11+?)\1+$/ && print while ++$_;'
returns a list of prime numbers.
I do not have enough experience with Perl, but what I understand is that the regular expression will be true for a number that is not prime. So, if we print all numbers that do not produce a true with this expression, we have a list of prime numbers. Thats what the perl query is trying to do.
About the regex part,
^1?$
part is for counting 1 as not prime
^(11+?)\1+$
is for matching not prime numbers starting from 4.
What I do not understand is why is the ?
in the regex needed at all.
According to me /^1$|^(11+)\1+$/
should be just fine and actually
perl -l -e '(1 x $_) !~ /^1$|^(11+)\1+$/ && print while ++$_;'
gives me the same set of prime numbers.
Is there any flaw in my understanding of the regular expression? Why are the ?
s needed?
Isn't ?
supposed to match zero or one occurrence of the expression preceding it?
A regular expression (shortened as regex or regexp; sometimes referred to as rational expression) is a sequence of characters that specifies a search pattern in text. Usually such patterns are used by string-searching algorithms for "find" or "find and replace" operations on strings, or for input validation.
?= is a positive lookahead, a type of zero-width assertion. What it's saying is that the captured match must be followed by whatever is within the parentheses but that part isn't captured. Your example means the match needs to be followed by zero or more characters and then a digit (but again that part isn't captured).
A regular expression is a method used in programming for pattern matching. Regular expressions provide a flexible and concise means to match strings of text. For example, a regular expression could be used to search through large volumes of text and change all occurrences of "cat" to "dog".
[] denotes a character class. () denotes a capturing group. [a-z0-9] -- One character that is in the range of a-z OR 0-9.
The first ?
is for matching the empty string (i.e. 0) as non-prime. If you don't care whether the regexp matches 0, then it's not necessary.
The second ?
is only for efficiency. +
is normally "greedy", which means it matches as many characters as are available, and then backtracks if the rest of the regexp fails to match. The +?
makes it non-greedy, so it matches only 1 character, and then tries matching more if the rest of the regexp fails to match. (See the Quantifiers section of perlre for more about greedy vs. non-greedy matching.)
In this particular regexp, the (11+?)
means it tests divisibility by 2 ('11'
), then 3 ('111'
), then 4, etc. If you used (11+)
, it would test divisibility by N (the number itself), then N-1, then N-2, etc. Since a divisor must be no greater than N/2, without the ?
it would waste time testing a lot of "potential" divisors that can't possibly work. It would still match non-prime numbers, just more slowly. (Also, $1
would be the largest divisor instead of the smallest one.)
The first ?
will make "" (the empty string, unary zero) not be a prime number. Zero is defined as not prime.
The second is different; it stops the regular expression from greedy-matching. It should improve the performance of the match greatly, since the first part of that section ((11+)
) won't consume almost the entire string before having to backtrack. If you omit the question mark, you're effectively testing whether odd n
is divisible by n-1
and so one down; if you include it, you're testing divisibility by two first and so on upwards. Obviously, numbers tend to be divisible by smaller factors more often, so your match will be faster.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With