I would like to grab the contents of any value between pairs of <tag></tag> tags.
<tag>
This is one block of text
</tag>
<tag>
This is another one
</tag>
The regex I have come up with is
/<tag>(.*)</tag>/m
However, it appears to be greedy and captures everything inside the parentheses up until the very last </tag>. I would like it to be as lazy as possible, so that every time it sees a closing tag, it treats that as the end of a match group and starts over.
How can I write the regex so that I will be able to get multiple matches in the given scenario?
I have included a sample of what I am describing in the following link
http://rubular.com/r/JW5M3rnqIE
Note: This is not XML, nor is it really based on any existing standard format. I won't need anything sophisticated like a full-fledged library that comes with a nice parser.
'Lazy' means match the shortest possible string. For example, the greedy h.+l matches 'hell' in 'hello' but the lazy h.+?l matches 'hel'.
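A greedy quantifier first consumes as much of the string as it can and then backs up until the rest of the pattern (for example, a following 'ab') can match; this is called backtracking. To make the quantifier non-greedy you simply follow it with a '?'. It then consumes as few characters as possible before the following 'ab' is matched.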
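The greedy versus lazy behaviour can be checked directly in Ruby. This is only a small sketch on the string 'hello'; the variable names are illustrative and not part of the original posts.

```ruby
word = "hello"

# Greedy: .+ grabs as much as possible, then backtracks to the last "l".
greedy = word[/h.+l/]

# Lazy: .+? grabs as little as possible, so it stops at the first "l".
lazy = word[/h.+?l/]

puts greedy  # hell
puts lazy    # hel
```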
tl;dr: non-capturing groups, as the name suggests, are parts of the regex that take part in matching but whose text you do not want captured as a separate group; placing ?: right after the opening parenthesis is the way to define a group as non-capturing.
2.1 Matching a Single Character
The fundamental building blocks of a regex are patterns that match a single character. Most characters, including all letters (a-z and A-Z) and digits (0-9), match themselves. For example, the regex x matches the substring "x"; z matches "z"; and 9 matches "9".
Go with regex pattern:
/<tag>(.*?)<\/tag>/im
Lazy (non-greedy) is .*?, not .*.
To find multiple occurrences, use:
string.scan(/<tag>(.*?)<\/tag>/im)
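As a rough sketch of how this plays out on the sample input from the question, the snippet below runs scan with the lazy pattern and, for comparison, with the greedy one. The names text, blocks and greedy_count are made up for illustration, and the flatten and strip calls are an optional clean-up step, not something the question requires.

```ruby
# Sample input modelled on the question.
text = <<~TEXT
  <tag>
  This is one block of text
  </tag>
  <tag>
  This is another one
  </tag>
TEXT

# /m lets . match newlines; .*? stops at the first closing tag it finds.
# scan returns one array per match (one element per capture group),
# so flatten it and strip the surrounding newlines.
blocks = text.scan(/<tag>(.*?)<\/tag>/im).flatten.map(&:strip)

# The greedy version runs from the first <tag> to the very last </tag>,
# so it yields a single match.
greedy_count = text.scan(/<tag>(.*)<\/tag>/im).length

p blocks        # ["This is one block of text", "This is another one"]
p greedy_count  # 1
```

Since scan returns an array of arrays whenever the pattern contains a capture group, the flatten call is what turns the result into a plain list of strings.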