
Lazy (ungreedy) matching multiple groups using regex

I would like to grab the text between each pair of <tag></tag> tags.

<tag>
This is one block of text
</tag>

<tag>
This is another one
</tag>

The regex I have come up with is

/<tag>(.*)<\/tag>/m

However, it appears to be greedy and captures everything inside the parentheses up to the very last </tag>. I would like it to be as lazy as possible, so that every time it sees a closing tag, it treats what it has matched so far as one match group and starts over.

How can I write the regex so that I will be able to get multiple matches in the given scenario?

I have included a sample of what I am describing in the following link

http://rubular.com/r/JW5M3rnqIE

Note: This is not XML, nor is it really based on any existing standard format. I won't need anything sophisticated like a full-fledged library that comes with a nice parser.

MxLDevs asked Oct 14 '12 18:10




1 Answer

Go with regex pattern:

/<tag>(.*?)<\/tag>/im

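The lazy (non-greedy) quantifier is .*?, not .*. It stops at the first closing </tag> it can reach instead of running on to the last one.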
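To see the difference, here is a small sketch that runs both quantifiers against the sample from the question. The variable names are only illustrative, and it assumes the input looks exactly like the two blocks shown above.

```ruby
# Sample input copied from the question.
text = <<~TEXT
  <tag>
  This is one block of text
  </tag>

  <tag>
  This is another one
  </tag>
TEXT

# In Ruby, the m flag lets . match newlines as well.
# Greedy: .* runs on to the LAST </tag>, swallowing the tags in between.
greedy_capture = text[/<tag>(.*)<\/tag>/m, 1]

# Lazy: .*? stops at the FIRST </tag> that allows a match.
lazy_capture = text[/<tag>(.*?)<\/tag>/m, 1]

puts greedy_capture.include?("</tag>")  # true: the capture spans both blocks
puts lazy_capture.inspect               # "\nThis is one block of text\n"
```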

To find multiple occurrences, use:

string.scan(/<tag>(.*?)<\/tag>/im) 
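As a rough usage sketch, assuming the sample input from the question: scan returns one array per match, each holding that match's capture groups, so flatten and strip (added here purely for convenience) turn the result into a plain list of strings.

```ruby
# Sample input copied from the question.
text = <<~TEXT
  <tag>
  This is one block of text
  </tag>

  <tag>
  This is another one
  </tag>
TEXT

# i: case-insensitive, so <TAG> would match too.
# m: in Ruby, lets . match newlines.
matches = text.scan(/<tag>(.*?)<\/tag>/im)

# Each element of matches is an array of capture groups for one match.
# Flatten to a list of strings and trim the surrounding newlines.
blocks = matches.flatten.map(&:strip)

p blocks
# => ["This is one block of text", "This is another one"]
```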
Ωmega answered Oct 05 '22 01:10