
Best way to parse bbcode

I'd like to work on a BBCode filter for a PHP website. (I'm using CakePHP, so it would be a BBCode helper.) I have some requirements.

BBCodes can be nested, so something like this is valid:

[block]  
    [block]  
    [/block]  
    [block]  
        [block]  
        [/block]  
    [/block]  
[/block]  

BBCodes can have zero or more parameters.

Example:

[video: url="url", width="500", height="500"]Title[/video]
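A minimal sketch of how a tag in this syntax could be split into a name and its parameters. It is written in Python purely for illustration (the same two patterns should port to PHP's preg_* functions), and `parse_open_tag`, `OPEN_TAG` and `PARAM` are hypothetical names, not part of any library. It assumes parameter values are double-quoted and never contain `]` or `"`.

```python
import re

# Opening tag: [name] or [name: key="value", key="value"]
OPEN_TAG = re.compile(r'\[(\w+)(?::\s*([^\]]*))?\]')
# One key="value" pair inside the parameter list
PARAM = re.compile(r'(\w+)\s*=\s*"([^"]*)"')

def parse_open_tag(text):
    """Return (name, params) for an opening tag, or None if text is not one."""
    match = OPEN_TAG.fullmatch(text)
    if match is None:
        return None
    name, raw_params = match.group(1), match.group(2) or ""
    # findall returns (key, value) tuples, which dict() accepts directly
    return name, dict(PARAM.findall(raw_params))

print(parse_open_tag('[video: url="url", width="500", height="500"]'))
print(parse_open_tag('[block]'))
```

Keeping the parameter pattern separate from the tag pattern means the number of parameters does not have to be encoded in a single regex.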

BBCodes might have multiple behaviours.

Let's say [url]text[/url] would be transformed to [url:url="text"]text[/url], or the video BBCode would be able to choose between YouTube, Dailymotion, and so on.
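As a rough illustration of the first behaviour, the short form could be normalised into the parameterised form before any further parsing. This is a Python sketch under assumptions: `normalize_url_tags` is a hypothetical helper, and it assumes the text between the tags contains no double quote.

```python
import re

# Only the parameterless form [url]...[/url] is matched; a tag that
# already carries parameters ([url:url="..."]) is left untouched.
URL_SHORT = re.compile(r'\[url\](.*?)\[/url\]', re.DOTALL)

def normalize_url_tags(text):
    """Rewrite [url]X[/url] as [url:url="X"]X[/url]."""
    def expand(match):
        target = match.group(1)
        return f'[url:url="{target}"]{target}[/url]'
    return URL_SHORT.sub(expand, text)

print(normalize_url_tags('See [url]http://sablecc.org/[/url]'))
```

After this pass, the rest of the pipeline only has to handle one shape of url tag.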

I think that covers most of my needs. I have already done something with regex, but my biggest problem was matching parameters. I got nested BBCode and BBCode with zero parameters to work, but when I added a regex match for parameters, it no longer matched nested BBCode correctly.

"\[($tag)(=.*)\"\](.*)\[\/\1\]" // It wasn't .* but the non-greedy matcher

I don't have the complete regex with me right now, but I had something that looked like the one above.
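To make the nesting problem concrete, here is a small Python sketch using a simplified stand-in for the pattern (a single tag name, no parameters; it is not the original regex, which isn't available). It shows that neither the non-greedy nor the greedy form pairs tags correctly in every case, because a plain regex cannot count nesting depth.

```python
import re

# Simplified stand-ins: one fixed tag name, no parameters.
non_greedy = re.compile(r'\[(block)\](.*?)\[/\1\]', re.DOTALL)
greedy = re.compile(r'\[(block)\](.*)\[/\1\]', re.DOTALL)

nested = '[block]a[block]b[/block]c[/block]'
siblings = '[block]a[/block][block]b[/block]'

# Non-greedy stops at the FIRST closing tag, so the outer opening tag
# is paired with the inner closing tag.
print(non_greedy.search(nested).group(2))   # a[block]b

# Greedy runs to the LAST closing tag, so two siblings are merged.
print(greedy.search(siblings).group(2))     # a[/block][block]b
```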

So, is there a way to match BBCode efficiently with regex or something else? The only thing I can think of is to use the visitor pattern and split my text on each possible tag. That way I would have a bit more control over the parsing, and I could probably validate the document: if the input text doesn't contain valid BBCode, I could notify the user with an error before saving anything.
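One way that idea could look, sketched in Python under assumptions: the text is split on tags with a single regex, and a stack tracks which tag is currently open, so nesting is handled and invalid input raises an error before anything is saved. `parse`, `TAG` and `BBCodeError` are hypothetical names, parameters are kept as a raw string, and the resulting tree is what a visitor or renderer would later walk.

```python
import re

# Opening tag ([name] or [name: params]) or closing tag ([/name]).
TAG = re.compile(r'\[(/?)(\w+)(?::\s*([^\]]*))?\]')

class BBCodeError(ValueError):
    """Raised when tags are unbalanced or closed in the wrong order."""

def parse(text):
    """Build a nested tree of dicts; plain text is kept as str children."""
    root = {"tag": None, "params": "", "children": []}
    stack = [root]          # stack[-1] is the innermost open tag
    pos = 0
    for match in TAG.finditer(text):
        if match.start() > pos:
            # Plain text between the previous tag and this one
            stack[-1]["children"].append(text[pos:match.start()])
        closing, name, params = match.groups()
        if closing:
            if len(stack) == 1 or stack[-1]["tag"] != name:
                raise BBCodeError(f"unexpected [/{name}] at offset {match.start()}")
            stack.pop()
        else:
            node = {"tag": name, "params": params or "", "children": []}
            stack[-1]["children"].append(node)
            stack.append(node)
        pos = match.end()
    if pos < len(text):
        stack[-1]["children"].append(text[pos:])
    if len(stack) > 1:
        raise BBCodeError(f"missing [/{stack[-1]['tag']}]")
    return root

tree = parse('[block]a[block]b[/block]c[/block]')
print(tree["children"][0]["children"])
```

Because the regex only finds individual tags and the stack does the pairing, adding parameters does not interfere with nesting.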

I would use SableCC to create my text parser: http://sablecc.org/

Any better ideas? Or anything that could lead to an efficient, flexible BBCode parser?

Thank you, and sorry for my bad English...

Loïc Faure-Lacroix asked Jan 28 '09 19:01



3 Answers

There are several existing libraries for parsing BBCode; it may be easier to look into those than to roll your own.

Here are a couple; I'm sure there are more if you look around:
PECL bbcode
PEAR HTML_BBCodeParser

Chad Birch answered Nov 12 '22 16:11


I've been looking into BBCode parsers myself. Most of them use regex and target PHP 4, and they either produce errors on PHP 5.2+ or don't work at all. PECL bbcode and PEAR HTML_BBCodeParser don't appear to be maintained any more (as of late 2012) and aren't easily installed on the shared hosting setup I have to work with. StringParser_BBCode works on 5.2+ with some minor tweaks, but the method for adding new tags is clumsy, and it was last updated in 2008.

Buried on the 4th page of a Bing search (I was getting desperate), I found jBBCode, which appears to be new and requires PHP 5.3. It is MIT licensed. I have yet to try building custom tags, but so far it is the only one I've tried that works out of the box on a shared hosting account with PHP 5.3.

Chris Currie answered Nov 12 '22 14:11


There's both a PECL and a PEAR BBCode parsing library. Software's hard enough without reinventing years of work on your own.

If neither of those is an option, I'd concentrate on turning the BBCode into a valid XML string and then using your favorite XML parsing routine on that. Very, very rough idea here, but:

  1. Run the code through htmlspecialchars to escape any entities that need escaping

  2. Transform all [ and ] characters into < and > respectively

  3. Don't forget to account for the colon in cases like [tagname:

If the BBCode was nested properly, you should be all set to pass this string into an XML parsing object (SimpleXML, DOMDocument, etc.)
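The steps above can be sketched roughly as follows. This is a Python illustration under assumptions, not a tested PHP recipe: `html.escape` stands in for htmlspecialchars, `xml.etree.ElementTree` stands in for SimpleXML or DOMDocument, and `bbcode_to_xml` is a hypothetical helper. It also departs from step 2 slightly by converting only brackets that look like tags, and it wraps the result in a made-up `<bbcode>` root because XML needs a single root element.

```python
import html
import re
import xml.etree.ElementTree as ET

# Opening tag ([name] or [name: params]) or closing tag ([/name]).
TAG = re.compile(r'\[(/?)(\w+)(?::\s*([^\]]*))?\]')

def bbcode_to_xml(text):
    # Step 1: escape &, < and >. Quotes are left alone so that
    # parameter values stay quoted and can become XML attributes.
    escaped = html.escape(text, quote=False)

    # Steps 2 and 3: turn [tag: a="1", b="2"] into <tag a="1" b="2">.
    def to_xml_tag(match):
        slash, name, params = match.groups()
        if slash:
            return f"</{name}>"
        # Keep only key="value" pairs; this drops the separating commas.
        attrs = " ".join(re.findall(r'\w+\s*=\s*"[^"]*"', params or ""))
        return f"<{name} {attrs}>" if attrs else f"<{name}>"

    return "<bbcode>" + TAG.sub(to_xml_tag, escaped) + "</bbcode>"

# Placeholder input; the URL is only an example value.
source = '[video: url="http://example.com/?a=1&b=2", width="500"]Title[/video]'
root = ET.fromstring(bbcode_to_xml(source))
video = root.find("video")
print(video.attrib, video.text)
```

A side benefit is validation for free: badly nested input such as `[b]x[/i]` makes the XML parser raise a parse error instead of silently producing odd output.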

Alan Storm answered Nov 12 '22 15:11