How does this regular expression work?

Tags:

From this article,

/^1?$|^(11+?)\1+$/ checks whether a number(its value in unary) is prime or not.

Using this, perl -l -e '(1 x $_) !~ /^1?$|^(11+?)\1+$/ && print while ++$_;' returns a list of prime numbers.

I do not have enough experience with Perl, but what I understand is that the regular expression will be true for a number that is not prime. So, if we print all numbers that do not produce a true with this expression, we have a list of prime numbers. Thats what the perl query is trying to do.

About the regex part,

^1?$ part is for counting 1 as not prime

^(11+?)\1+$ is for matching not prime numbers starting from 4.

What I do not understand is why is the ? in the regex needed at all. According to me /^1$|^(11+)\1+$/ should be just fine and actually

perl -l -e '(1 x $_) !~ /^1$|^(11+)\1+$/ && print while ++$_;' gives me the same set of prime numbers.

Is there any flaw in my understanding of the regular expression? Why are the ?s needed?

Isn't ? supposed to match zero or one occurrence of the expression preceding it?

848

asked Jul 25 '10 15:07

Lazer

2 Answers

The first ? is for matching the empty string (i.e. 0) as non-prime. If you don't care whether the regexp matches 0, then it's not necessary.

The second ? is only for efficiency. + is normally "greedy", which means it matches as many characters as are available, and then backtracks if the rest of the regexp fails to match. The +? makes it non-greedy, so it matches only 1 character, and then tries matching more if the rest of the regexp fails to match. (See the Quantifiers section of perlre for more about greedy vs. non-greedy matching.)

In this particular regexp, the (11+?) means it tests divisibility by 2 ('11'), then 3 ('111'), then 4, etc. If you used (11+), it would test divisibility by N (the number itself), then N-1, then N-2, etc. Since a divisor must be no greater than N/2, without the ? it would waste time testing a lot of "potential" divisors that can't possibly work. It would still match non-prime numbers, just more slowly. (Also, $1 would be the largest divisor instead of the smallest one.)

147

answered Oct 23 '22 14:10

cjm

The first ? will make "" (the empty string, unary zero) not be a prime number. Zero is defined as not prime.

The second is different; it stops the regular expression from greedy-matching. It should improve the performance of the match greatly, since the first part of that section ((11+)) won't consume almost the entire string before having to backtrack. If you omit the question mark, you're effectively testing whether odd n is divisible by n-1 and so one down; if you include it, you're testing divisibility by two first and so on upwards. Obviously, numbers tend to be divisible by smaller factors more often, so your match will be faster.

answered Oct 23 '22 14:10

Borealid

Related questions
                            
                                Replace a regular expression submatch using a function
                            
                                Stack Overflow in java regex
                            
                                Is there a way to negate a regular expression?
                            
                                Replace double quotes by quotation marks
                            
                                IIS Rewrite rule - ignore for localhost
                            
                                Inconsistency between sed and python regular expressions
                            
                                Should I use ^ and $ in html5 input regex pattern validation?
                            
                                String generation with regex like criteria
                            
                                Python library to parse regex into AST?
                            
                                input pattern='[a-zA-Z]' is not working in React application
                            
                                How to portably parse the (Unicode) degree symbol with regular expressions?
                            
                                /.?e.?/ matches entire string, rather than expected substring
                            
                                Finding longest regex match in Java?
                            
                                Haskell: Regular Expressions and Data.Text
                            
                                Identify type of regex pattern
                            
                                C# library for human readable pattern matching?
                            
                                Disable all checkstyle checks for a specific java package
                            
                                Regular expression to match unescaped special characters only
                            
                                Regex - match anything except specific string
                            
                                Positive/Negative lookahead with grep and perl

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

How does this regular expression work?

Tags:

regex

perl

primes

Lazer

People also ask

2 Answers

cjm

Borealid

Recent Activity

Donate For Us