
How do I make match stop at the first occurrence?

I need to digest some bbcode with a Ruby regular expression.

I have to delimit the elements with the match method, using a regexp of the form /pattern/m so that . also matches newlines.

For example, my bbcode in a string is:

s="[b]Title[/b] \n Article text \n [b]references[/b]"

Then I use match to delimit the parts of the text, especially the Title and the References parts which are enclosed between [b] and [/b]:

t=s.match(/\[b\](.*)\[\/b\]/m)

I use the (...) syntax to capture a string in the regexp, and \ to escape the special [ and ] characters. The /m flag lets . match the newlines in the string.

Then t[1] contains:

"Title[/b] \n Article text \n [b]references"

instead of "Title", because the match doesn't stop at the first occurrence of [/b]. And t[2] is nil instead of "references" for the same reason.

How can I delimit the text parts enclosed between the usual bbcode tags?

Konstantin asked Dec 26 '22 02:12


2 Answers

Use the non-greedy (lazy) modifier ? after the quantifier, and keep the brackets escaped, like this:

t=s.match(/\[b\](.*?)\[\/b\]/m)
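
As a minimal sketch of the difference (bold_parts is a hypothetical helper name, not something from the question): match only ever returns the first match, so to get both "Title" and "references" you would probably want scan with the same lazy pattern.

```ruby
# bold_parts is a hypothetical helper, introduced only for this example.
def bold_parts(text)
  # .*? is lazy, so each match ends at the nearest [/b].
  # scan collects every match; flatten unwraps the single capture group.
  text.scan(/\[b\](.*?)\[\/b\]/m).flatten
end

s = "[b]Title[/b] \n Article text \n [b]references[/b]"

first = s.match(/\[b\](.*?)\[\/b\]/m)
puts first[1]       # Title
p bold_parts(s)     # ["Title", "references"]
```

With scan there is no t[2] to look for: each bold part is one element of the returned array.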
Sergey Bolgov answered Dec 29 '22 10:12


If you are sure you will not encounter opening square brackets between your bbcode tags, you can use a character class that excludes them:

t=s.match(/\[b\]([^\[]*)\[\/b\]/)
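
A small sketch of how this behaves, assuming the sample string from the question (BOLD_NO_NESTING is just an illustrative constant name). A negated class such as [^\[] also matches newlines, so the /m flag is not needed here:

```ruby
# Illustrative constant name; the pattern is the character-class one above.
BOLD_NO_NESTING = /\[b\]([^\[]*)\[\/b\]/

plain  = "[b]Title[/b] \n Article text \n [b]references[/b]"
nested = "[b]see [i]this[/i][/b]"

# Every [b]...[/b] pair whose content has no opening bracket.
p plain.scan(BOLD_NO_NESTING).flatten   # ["Title", "references"]

# A nested tag puts a [ inside the content, so nothing matches.
p nested.match(BOLD_NO_NESTING)         # nil
```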

But if your [b] tags can contain other tags, you need to use a recursive pattern:

t=s.match(/(?x)
    # definitions
    (?<tag> \[ (?<name> \w++ ) [^\]]* \]
            (?> [^\[]+ | \g<tag> )*
            \[\/\g<name>\]
    ){0}

    # main pattern
    \[b\] (?<content> (?> [^\[]+ | \g<tag> )* ) \[\/b\]
          /)
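
To see what the recursive pattern captures, here is a sketch that stores the same regexp in a constant (NESTED_BOLD is a made-up name), uses the /x flag instead of the inline (?x), and reads the named group content. The sample string is an assumption for illustration:

```ruby
# NESTED_BOLD is a hypothetical name; the pattern body is the one shown above.
NESTED_BOLD = /
    # definitions: a balanced tag with any nested balanced tags inside
    (?<tag> \[ (?<name> \w++ ) [^\]]* \]
            (?> [^\[]+ | \g<tag> )*
            \[\/\g<name>\]
    ){0}

    # main pattern
    \[b\] (?<content> (?> [^\[]+ | \g<tag> )* ) \[\/b\]
/x

s = "[b]bold [i]italic[/i] text[/b] tail"
t = s.match(NESTED_BOLD)
puts t[:content]   # bold [i]italic[/i] text
```

One thing to be aware of: \g<name> re-runs the \w++ subpattern rather than requiring the same text again, so a mismatched pair such as [i]x[/u] is still accepted as a nested tag.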

And if you have to deal with self-closing tags:

t=s.match(/(?x)
    # definitions
    (?<self> \[ (?:img|hr)\b [^\]]* \] ){0}
    (?<tag> \[ (?<name> \w++ ) [^\]]* \]
            (?> [^\[]+ | \g<self> | \g<tag> )*
            \[\/\g<name>\]
    ){0}

    # main pattern
    \[b\] (?<content> (?> [^\[]+ | \g<self> | \g<tag> )* ) \[\/b\]
          /)

Note: the {0} quantifier lets you define named subpatterns that match nothing where they are written but can be called later with \g<name>.
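
A sketch of the self-closing variant in use, again with a made-up constant name (BOLD_WITH_SELF_CLOSING) and a sample string that is only an assumption. Without the self subpattern, [hr] would be treated as an opening tag that needs a closing one:

```ruby
# BOLD_WITH_SELF_CLOSING is a hypothetical name; img and hr are the
# self-closing tags assumed by the pattern above.
BOLD_WITH_SELF_CLOSING = /
    # definitions
    (?<self> \[ (?:img|hr)\b [^\]]* \] ){0}
    (?<tag> \[ (?<name> \w++ ) [^\]]* \]
            (?> [^\[]+ | \g<self> | \g<tag> )*
            \[\/\g<name>\]
    ){0}

    # main pattern
    \[b\] (?<content> (?> [^\[]+ | \g<self> | \g<tag> )* ) \[\/b\]
/x

s = "[b]above [hr] below [i]nested[/i][/b]"
puts s.match(BOLD_WITH_SELF_CLOSING)[:content]   # above [hr] below [i]nested[/i]
```

The alternation tries \g<self> before \g<tag>, so a self-closing tag is consumed on its own instead of starting a nested pair.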

Casimir et Hippolyte answered Dec 29 '22 11:12