I need to parse some BBCode with a Ruby regular expression.
I have to delimit the elements with the match method and use a /pattern/m regexp so that . also matches newlines.
For example, my bbcode in a string is:
s="[b]Title[/b] \n Article text \n [b]references[/b]"
Then I use match to delimit the parts of the text, especially the Title and the References parts, which are enclosed between [b] and [/b]:
t=s.match(/\[b\](.*)\[\/b\]/m)
I use the (...) syntax to capture a string in the regexp, and \ to escape the special [ and ] characters. The /m modifier is there so that . matches across the newlines in the string.
Then t[1] contains:
"Title[/b] \n Article text \n [b]references"
instead of "Title", because the match doesn't stop at the first occurrence of [/b]. And t[2] is nil instead of "references" for the same reason.
How can I delimit the text parts enclosed between the usual bbcode tags?
Use the non-greedy quantifier *? like this:
t=s.match(/\[b\](.*?)\[\/b\]/m)
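Note that match only returns the first hit, so t[2] stays nil even with the lazy quantifier (the pattern has a single capture group). A minimal sketch, assuming you want every [b]...[/b] section, is to run the same lazy pattern through String#scan; the variable names here are only illustrative:

```ruby
s = "[b]Title[/b] \n Article text \n [b]references[/b]"

# match stops at the first [b]...[/b] pair thanks to the lazy .*?
first = s.match(/\[b\](.*?)\[\/b\]/m)
title = first && first[1]

# scan returns one array per match (one element per capture group),
# so flatten gives a plain list of the bold sections
bold_parts = s.scan(/\[b\](.*?)\[\/b\]/m).flatten

puts title
puts bold_parts.inspect
```

Here title should be "Title" and bold_parts should hold both "Title" and "references".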
If you are sure you will not encounter opening square brackets between your bbcode tags, you can use a character class that excludes them:
t=s.match(/\[b\]([^\[]*)\[\/b\]/)
But if your [b] tags can contain other tags, you need to use a recursive pattern:
t=s.match(/(?x)
# definitions
(?<tag> \[ (?<name> \w++ ) [^\]]* \]
(?> [^\[]+ | \g<tag> )*
\[\/\g<name>\]
){0}
# main pattern
\[b\] (?<content> (?> [^\[]+ | \g<tag> )* ) \[\/b\]
/)
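As a rough usage sketch of the recursive pattern above, here it is written with the /x flag instead of the inline (?x) and applied to a made-up string with a nested [i] tag. The constant name NESTED_B and the sample input are assumptions for illustration only:

```ruby
NESTED_B = /
  # definition: any opening tag, its (possibly nested) content, a closing tag
  (?<tag> \[ (?<name> \w++ ) [^\]]* \]
          (?> [^\[]+ | \g<tag> )*
          \[\/ \g<name> \]
  ){0}
  # main pattern: a [b] tag whose content may hold nested tags
  \[b\] (?<content> (?> [^\[]+ | \g<tag> )* ) \[\/b\]
/x

nested = "[b]bold [i]italic[/i] text[/b] trailing"
m = NESTED_B.match(nested)
content = m && m[:content]

puts content
```

One caveat, as far as I understand Ruby's regex engine: \g<name> calls the name subpattern again rather than matching the same text as before, so the closing tag is only checked to be some word, not necessarily the same word as the opening tag.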
And if you have to deal with self-closing tags:
t=s.match(/(?x)
# definitions
(?<self> \[ (?:img|hr)\b [^\]]* \] ){0}
(?<tag> \[ (?<name> \w++ ) [^\]]* \]
(?> [^\[]+ | \g<self> | \g<tag> )*
\[\/\g<name>\]
){0}
# main pattern
\[b\] (?<content> (?> [^\[]+ | \g<self> | \g<tag> )* ) \[\/b\]
/)
Note: the {0} quantifier lets you define named subpatterns that can be called later with \g<name>, without matching anything at the point where they are defined.
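To see the {0} idiom in isolation, here is a small sketch unrelated to BBCode: two named subpatterns are defined without consuming any input, then called from the main pattern. The names two, four and DATE, and the sample string, are hypothetical:

```ruby
DATE = /
  # definitions: matched zero times here, only callable via \g<...>
  (?<two>  \d\d ){0}
  (?<four> \g<two>\g<two> ){0}
  # main pattern: YYYY-MM-DD built from the definitions
  \g<four> - \g<two> - \g<two>
/x

m = DATE.match("released 2024-05-17")
matched = m && m[0]

puts matched
```

The definitions contribute nothing to the match themselves; only the \g<four> and \g<two> calls in the main pattern consume characters.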