Regex capture every occurrence of a word within two delimiters

Tags:

regex

Say I have a long string of text, and I want to capture every occurrence of the word 'this' inside round brackets. How could I do that? The following pattern only matches the first 'this' and ignores every occurrence after it:

/\(.*(this).*\)/g

For example, using the pattern above on the following text:

Etiam scelerisque, nunc ac egestas consequat, (odio this nibh euismod nulla, eget auctor orci nibh vel this nisi. Aliquam this erat volutpat).

Will only return the first 'this' after the word 'odio'.

What am I doing wrong?

Globalz asked Jul 25 '10 02:07


2 Answers

First off, don't be greedy.

/\(.*?(this).*?\)/g

Secondly, if you're aiming to count the number of occurrences of 'this', a regex is probably not the right tool here. The problem is that you need to match the closing delimiter to determine that the first 'this' is enclosed, which means that continuing to apply the regex will not match anything inside that already-consumed set of delimiters.

The regex I have above will catch things like:

foo (baz this bar) (foo this)

But not (it will only match twice, once for each set of delimiters):

foo (this this bar) baz (this this this)
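
To see the difference between the two inputs, here is a small sketch in JavaScript (an assumption, since the question doesn't name a language). The helper name `fullMatches` is made up for illustration; it just collects the full text of every match of the non-greedy pattern.

```javascript
// The non-greedy pattern from above.
const lazyPattern = /\(.*?(this).*?\)/g;

// Hypothetical helper: return the full text of each match.
function fullMatches(text) {
  return Array.from(text.matchAll(lazyPattern), (m) => m[0]);
}

const onePerGroup = fullMatches("foo (baz this bar) (foo this)");
const manyPerGroup = fullMatches("foo (this this bar) baz (this this this)");

console.log(onePerGroup);  // [ '(baz this bar)', '(foo this)' ]
console.log(manyPerGroup); // [ '(this this bar)', '(this this this)' ]
```

Both inputs give two matches. In the second one 'this' appears five times, but each match swallows its whole set of brackets, so only one 'this' per set is captured.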

Try using a simple single-pass scanner instead of a regex. Another alternative is to use two regular expressions, one to separate the string into enclosed and non-enclosed sections, and another to search within the enclosed regions.
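
A minimal sketch of what such a single-pass scanner might look like, again in JavaScript. The function name `countInsideParens` is hypothetical, and the sketch assumes the brackets are not nested and that a plain substring hit counts (so 'thistle' would count as well). Hits are only added to the total once the closing bracket is seen, so text after an unclosed '(' is ignored.

```javascript
// Count occurrences of `word` that sit inside a closed pair of round brackets.
// Assumptions: no nested brackets, plain substring matching.
function countInsideParens(text, word) {
  if (word.length === 0) return 0; // nothing sensible to count
  let inside = false;
  let pending = 0; // hits since the last "(", not yet confirmed by a ")"
  let total = 0;
  for (let i = 0; i < text.length; i++) {
    if (text[i] === "(") {
      inside = true;
      pending = 0;
    } else if (text[i] === ")") {
      if (inside) total += pending;
      inside = false;
      pending = 0;
    } else if (inside && text.startsWith(word, i)) {
      pending++;
      i += word.length - 1; // skip the rest of the word just counted
    }
  }
  return total;
}

console.log(countInsideParens("foo (this this bar) baz (this this this)", "this")); // 5
```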

Borealid answered Oct 14 '22 12:10


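The .* in your pattern is greedy: it matches as many characters as it can (any character except a line break). So what you're actually doing here is matching everything from the opening parenthesis to the last closing parenthesis, with a single 'this' captured somewhere in between. Because the leading .* is greedy, the 'this' that gets captured is the last one inside the parentheses rather than the first, although the captured text looks the same either way. Your current match results probably look a little bit like the following: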

["(odio this nibh euismod nulla, eget auctor orci nibh vel this nisi. Aliquam this erat volutpat)", "this"]

The first item in the array is the entire substring matched by the expression, and the items that follow are the values captured by your regex's groups.
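
If you want to check this yourself, here is a quick sketch, assuming JavaScript (the question doesn't say which language is used). The second pattern with extra groups is not from the question; it is added only to show where the captured 'this' sits in the text.

```javascript
const sampleText =
  "Etiam scelerisque, nunc ac egestas consequat, (odio this nibh euismod nulla, " +
  "eget auctor orci nibh vel this nisi. Aliquam this erat volutpat).";

// The pattern from the question, without the g flag, so match() returns
// the whole match followed by the captured groups.
const greedyResult = sampleText.match(/\(.*(this).*\)/);
console.log(greedyResult.length); // 2
console.log(greedyResult[1]);     // this

// With the g flag, match() returns only whole matches, and there is just one.
const globalResult = sampleText.match(/\(.*(this).*\)/g);
console.log(globalResult.length); // 1

// Illustrative variant: extra groups around the capture.
const probe = sampleText.match(/\((.*)(this)(.*)\)/);
console.log(JSON.stringify(probe[3])); // " erat volutpat"
```

The last line shows that only ' erat volutpat' follows the captured 'this', so the greedy .* in front has swallowed the other two occurrences.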

If you want to match every occurrence of this inside the parentheses, one solution would be to first get a substring of everything inside the parentheses, then search for this in that substring:

# Match everything inside the parentheses
/\([^\)]*\)/

# Match all occurrences of the word 'this' inside a substring
/this/g
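
Put together, the two steps could look something like this in JavaScript. This is only a sketch: the function name `findInParens` is made up, and it assumes the parentheses are not nested. It returns the position of every hit in the original text.

```javascript
// Step 1: find each parenthesised section. Step 2: search inside it.
// Assumption: parentheses are not nested.
function findInParens(text, word) {
  // Escape regex metacharacters so `word` is matched literally.
  const literal = word.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const hits = [];
  for (const section of text.matchAll(/\([^)]*\)/g)) {
    for (const hit of section[0].matchAll(new RegExp(literal, "g"))) {
      hits.push(section.index + hit.index); // position in the original text
    }
  }
  return hits;
}

const positions = findInParens("foo (this this bar) baz (this this this)", "this");
console.log(positions); // [ 5, 10, 25, 30, 35 ]
```

If only whole words should count, wrapping the inner pattern in \b on both sides is one option.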

Chris answered Oct 14 '22 10:10