Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How does this regular expression work?

Tags:

regex

perl

primes

From this article,

/^1?$|^(11+?)\1+$/ checks whether a number(its value in unary) is prime or not.

Using this, perl -l -e '(1 x $_) !~ /^1?$|^(11+?)\1+$/ && print while ++$_;' returns a list of prime numbers.

I do not have enough experience with Perl, but what I understand is that the regular expression will be true for a number that is not prime. So, if we print all numbers that do not produce a true with this expression, we have a list of prime numbers. Thats what the perl query is trying to do.

About the regex part,

^1?$ part is for counting 1 as not prime

^(11+?)\1+$ is for matching not prime numbers starting from 4.


What I do not understand is why is the ? in the regex needed at all. According to me /^1$|^(11+)\1+$/ should be just fine and actually

perl -l -e '(1 x $_) !~ /^1$|^(11+)\1+$/ && print while ++$_;' gives me the same set of prime numbers.

Is there any flaw in my understanding of the regular expression? Why are the ?s needed?

Isn't ? supposed to match zero or one occurrence of the expression preceding it?

like image 848
Lazer Avatar asked Jul 25 '10 15:07

Lazer


People also ask

How does a regular expression work?

A regular expression (shortened as regex or regexp; sometimes referred to as rational expression) is a sequence of characters that specifies a search pattern in text. Usually such patterns are used by string-searching algorithms for "find" or "find and replace" operations on strings, or for input validation.

What does ?= Mean in regex?

?= is a positive lookahead, a type of zero-width assertion. What it's saying is that the captured match must be followed by whatever is within the parentheses but that part isn't captured. Your example means the match needs to be followed by zero or more characters and then a digit (but again that part isn't captured).

What is regular expression explain using example?

A regular expression is a method used in programming for pattern matching. Regular expressions provide a flexible and concise means to match strings of text. For example, a regular expression could be used to search through large volumes of text and change all occurrences of "cat" to "dog".

What is difference [] and () in regex?

[] denotes a character class. () denotes a capturing group. [a-z0-9] -- One character that is in the range of a-z OR 0-9.


2 Answers

The first ? is for matching the empty string (i.e. 0) as non-prime. If you don't care whether the regexp matches 0, then it's not necessary.

The second ? is only for efficiency. + is normally "greedy", which means it matches as many characters as are available, and then backtracks if the rest of the regexp fails to match. The +? makes it non-greedy, so it matches only 1 character, and then tries matching more if the rest of the regexp fails to match. (See the Quantifiers section of perlre for more about greedy vs. non-greedy matching.)

In this particular regexp, the (11+?) means it tests divisibility by 2 ('11'), then 3 ('111'), then 4, etc. If you used (11+), it would test divisibility by N (the number itself), then N-1, then N-2, etc. Since a divisor must be no greater than N/2, without the ? it would waste time testing a lot of "potential" divisors that can't possibly work. It would still match non-prime numbers, just more slowly. (Also, $1 would be the largest divisor instead of the smallest one.)

like image 147
cjm Avatar answered Oct 23 '22 14:10

cjm


The first ? will make "" (the empty string, unary zero) not be a prime number. Zero is defined as not prime.

The second is different; it stops the regular expression from greedy-matching. It should improve the performance of the match greatly, since the first part of that section ((11+)) won't consume almost the entire string before having to backtrack. If you omit the question mark, you're effectively testing whether odd n is divisible by n-1 and so one down; if you include it, you're testing divisibility by two first and so on upwards. Obviously, numbers tend to be divisible by smaller factors more often, so your match will be faster.

like image 37
Borealid Avatar answered Oct 23 '22 14:10

Borealid