
Using regex to match any character until a substring is reached?

Tags: c#, regex, vb.net

I'd like to match a specific sequence of characters that starts with one particular substring and ends with another. My regex with positive lookaheads works when there is only one instance to match on a line, but not when there should be multiple matches on a line. I understand this is because (.*) captures everything up to the last place where the lookahead expression matches. I'd like it to capture everything only up to the first place it matches.

Here is my regex attempt:

@@FOO\[(.*)(?=~~)~~(.*)(?=\]@@)\]@@

Sample input:

@@FOO[abc~~hi]@@    @@FOO[def~~hey]@@

Desired output: 2 matches, with 2 matching groups each (abc, hi) and (def, hey).

Actual output: 1 match with 2 groups (abc~~hi]@@ @@FOO[def, hey)

Is there a way to get the desired output?

Thanks in advance!

asked Jul 18 '11 at 19:07 by joshm1




2 Answers

Add a question mark after each quantifier; it will then match as few times as possible.

@@FOO\[(.*?)(?=~~)~~(.*?)(?=\]@@)\]@@

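This one also works and is easier to read. It is not actually less strict: each lookahead is immediately followed by the same literal text, so it adds no extra constraint.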

@@FOO\[(.*?)~~(.*?)\]@@
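
As a quick check, both patterns can be run against the sample input from the question. This is only a sketch: it uses Python's re module as a stand-in for the .NET engine the question is tagged with (an assumption; lazy quantifiers and lookaheads behave the same way for these patterns), and the variable names are made up for illustration.

```python
import re

# Sample input copied from the question.
sample = "@@FOO[abc~~hi]@@    @@FOO[def~~hey]@@"

# Lazy version that keeps the original lookaheads.
with_lookaheads = re.compile(r"@@FOO\[(.*?)(?=~~)~~(.*?)(?=\]@@)\]@@")
# Shorter lazy version without the lookaheads.
without_lookaheads = re.compile(r"@@FOO\[(.*?)~~(.*?)\]@@")

# With two capture groups, findall returns one (group1, group2) tuple per match.
pairs_a = with_lookaheads.findall(sample)
pairs_b = without_lookaheads.findall(sample)

print(pairs_a)  # [('abc', 'hi'), ('def', 'hey')]
print(pairs_b)  # [('abc', 'hi'), ('def', 'hey')]
```

In C# or VB.NET the equivalent call would be Regex.Matches with the same pattern strings, reading Groups[1] and Groups[2] from each match.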
answered Oct 28 '22 at 22:10 by Chris Haas


The * operator is greedy by default, meaning it consumes as much of the string as possible while still leaving enough to match the rest of the regex. You can make it non-greedy (lazy) by appending a ? to it. Make sure to read about the differences at the link.
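
The difference is easiest to see side by side on the question's sample input. The sketch below again assumes Python's re module as a stand-in for the .NET engine; the names greedy and lazy are illustrative only.

```python
import re

# Sample input copied from the question (four spaces between the two items).
sample = "@@FOO[abc~~hi]@@    @@FOO[def~~hey]@@"

# The question's original pattern: greedy .* runs to the end of the line and
# backtracks only as far as needed, so it stops at the LAST "~~" and "]@@".
greedy = re.findall(r"@@FOO\[(.*)(?=~~)~~(.*)(?=\]@@)\]@@", sample)

# Same pattern with lazy .*?: each group grows one character at a time and
# stops at the FIRST "~~" and "]@@" that let the rest of the pattern match.
lazy = re.findall(r"@@FOO\[(.*?)(?=~~)~~(.*?)(?=\]@@)\]@@", sample)

print(len(greedy), greedy)  # one match spanning both items
print(len(lazy), lazy)      # two matches, one per item
```

The greedy run reproduces the "actual output" from the question (one match whose first group swallows the first item and the start of the second), while the lazy run yields the desired two matches.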

answered Oct 28 '22 at 23:10 by Ryan Stewart