
How to neatly match "x" and "[x]" with a regex without repeating?

I'm writing a Perl regex to match both of the strings "x bla" and "[x] bla". One option is /(?:x|\[x\]) bla/, but that isn't desirable: in the real world, x is more complicated, so I want to avoid repeating it.

The best solution so far is putting x in a variable and pre-compiling the regex:

my $x = 'x';
my $re = qr/(?:$x|\[$x\]) bla/o;

Is there a neater solution? In this case, readability is more important than performance.

Tim asked Jun 26 '11 16:06


3 Answers

It's possible, but not all that clean. Conditional subpatterns of the form (?(N)yes|no) let you test whether the Nth capturing group took part in the match. So you can use an expression such as /(\[)?X(?(1)\])/ to match '[X]' or 'X' while writing X only once: the closing bracket is required only if the opening bracket was captured.
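A minimal sketch of the conditional approach, using the trailing " bla" from the question. The $x placeholder and the ^ and $ anchors are assumptions added for illustration; without anchors the pattern can still find a bare x inside a string like "[x bla".

```perl
use strict;
use warnings;

# Placeholder for the "more complicated" pattern from the question.
my $x = qr/x/;

# (\[)? optionally captures an opening bracket as group 1.
# (?(1)\]) requires a closing bracket only if group 1 took part in the match.
# The ^ and $ anchors are an assumption, added to reject unbalanced input.
my $re = qr/^(\[)?$x(?(1)\]) bla$/;

for my $s ('x bla', '[x] bla', '[x bla', 'x] bla') {
    printf "%-8s => %s\n", $s, ($s =~ $re ? 'match' : 'no match');
}
```

Because $x is interpolated as a qr// object, it is wrapped in a non-capturing group, so the bracket stays group 1 as long as it comes before $x in the pattern.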

jaytea answered Nov 11 '22 03:11


You can pre-compile $x as well. This also makes errors a little more obvious if $x is really ?(+[*{ or something else that a regex compiler will completely freak out on.

my $x = qr/x/;
my $re = qr/(?:$x|\[$x\]) bla/o;
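
As a rough illustration of the error-reporting point, here is a sketch that compiles a pattern held in a string and traps the failure with eval. The variable names are made up for the example, and the broken pattern is the one mentioned above.

```perl
use strict;
use warnings;

# The deliberately invalid pattern text mentioned above.
my $bad = '?(+[*{';

# Compiling $x on its own means a broken pattern fails here,
# not somewhere inside the larger combined regex.
my $x   = eval { qr/$bad/ };
my $err = $@;

if (defined $x) {
    print "compiled fine\n";
}
else {
    print "could not compile \$x: $err";
}
```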

robert answered Nov 11 '22 03:11


There isn't a neater solution really, because this is where we leave the domain of regular languages and start requiring a more complex automaton with some kind of memory. (Backrefs would do it, except that the backref expands to a literal match against a preceding part of the string, not to “this, but only if that was matched”.)

Sometimes, it's possible to instead use a two step process, replacing a complex X with a single character known to not be present in the source text (control characters can be suitable for that) so allowing a simpler second-stage match. Not always possible though; depends on what you're matching.
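
A sketch of that two-step idea, assuming the control character \x01 never appears in the input and using x as a stand-in for the complex pattern. The sub name matches_two_step is hypothetical, and the ^ and $ anchors are an assumption too.

```perl
use strict;
use warnings;

# Placeholder for the complex pattern; a real one would go here.
my $x = qr/x/;

# Assumption: this control character never occurs in the source text.
my $mark = "\x01";

sub matches_two_step {
    my ($text) = @_;

    # Refuse input that already contains the marker; the trick is unsafe then.
    return 0 if index($text, $mark) >= 0;

    # Step 1: collapse every occurrence of the complex pattern to the marker.
    (my $tmp = $text) =~ s/$x/$mark/g;

    # Step 2: the alternation now repeats a single character, not the pattern.
    return ($tmp =~ /^(?:${mark}|\[${mark}\]) bla$/) ? 1 : 0;
}

print matches_two_step('x bla'),   "\n";
print matches_two_step('[x] bla'), "\n";
print matches_two_step('[x bla'),  "\n";
```

Note that step 1 rewrites every occurrence of the pattern, including any inside the surrounding text, which is one reason this is not always possible.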

Donal Fellows answered Nov 11 '22 03:11