Regular expression to convert Markdown to HTML

How would you write a regular expression to convert Markdown into HTML? For example, you would type in the following:

This would be *italicized* text and this would be **bold** text

This would then need to be converted to:

This would be <em>italicized</em> text and this would be <strong>bold</strong> text

This is very similar to the Markdown editor used by Stack Overflow.

Clarification

For what it is worth, I am using C#. Also, these are the only tags/Markdown that I want to allow. The text being converted would be fewer than 300 characters or so.

asked Sep 21 '08 by mattruma


3 Answers

The best way is to find a version of the Markdown library ported to whatever language you are using (you did not specify in your question).


Now that you have clarified that you only want STRONG and EM to be processed, and that you are using C#, I recommend taking a look at Markdown.NET to see how those tags are implemented. It actually uses two expressions, one for bold and one for italics. Here is the code:

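// Requires: using System.Text.RegularExpressions;
private string DoItalicsAndBold (string text)
{
    // <strong> must go first, otherwise "**" would be read as two single "*" delimiters.
    // IgnorePatternWhitespace means the spaces inside the patterns are only for readability.
    // (?=\S) and (?<=\S) require the wrapped text to start and end with a non-space character.
    text = Regex.Replace (text, @"(\*\*|__) (?=\S) (.+?[*_]*) (?<=\S) \1", 
                          new MatchEvaluator (BoldEvaluator),
                          RegexOptions.IgnorePatternWhitespace | RegexOptions.Singleline);

    // Then <em>: same idea with a single "*" or "_"; \1 requires the same closing delimiter.
    text = Regex.Replace (text, @"(\*|_) (?=\S) (.+?) (?<=\S) \1",
                          new MatchEvaluator (ItalicsEvaluator),
                          RegexOptions.IgnorePatternWhitespace | RegexOptions.Singleline);
    return text;
}

// Group 1 is the delimiter; group 2 is the text between the delimiters.
private string ItalicsEvaluator (Match match)
{
    return string.Format ("<em>{0}</em>", match.Groups[2].Value);
}

private string BoldEvaluator (Match match)
{
    return string.Format ("<strong>{0}</strong>", match.Groups[2].Value);
}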
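To try the Markdown.NET logic above on the example from the question, here is a minimal, self-contained sketch. It copies the two patterns into a static class (the class name `MarkdownLite` is hypothetical, not part of Markdown.NET) and uses lambdas in place of the named evaluator methods; the `Regex.Replace(string, string, MatchEvaluator, RegexOptions)` overload accepts either.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical wrapper class; the two patterns are copied from the Markdown.NET snippet.
public static class MarkdownLite
{
    private const RegexOptions Opts =
        RegexOptions.IgnorePatternWhitespace | RegexOptions.Singleline;

    public static string DoItalicsAndBold(string text)
    {
        // Bold first: ** or __, content must start and end with non-whitespace.
        text = Regex.Replace(text, @"(\*\*|__) (?=\S) (.+?[*_]*) (?<=\S) \1",
                             m => "<strong>" + m.Groups[2].Value + "</strong>", Opts);

        // Then italics: a single * or _, closed by the same character (\1).
        text = Regex.Replace(text, @"(\*|_) (?=\S) (.+?) (?<=\S) \1",
                             m => "<em>" + m.Groups[2].Value + "</em>", Opts);
        return text;
    }

    public static void Main()
    {
        string input = "This would be *italicized* text and this would be **bold** text";
        Console.WriteLine(DoItalicsAndBold(input));
        // prints: This would be <em>italicized</em> text and this would be <strong>bold</strong> text
    }
}
```

Because of the `(?=\S)` and `(?<=\S)` checks, a lone `*` surrounded by spaces, as in `2 * 3 * 4`, is left unchanged, and `__b__ and _i_` becomes `<strong>b</strong> and <em>i</em>`.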
answered Oct 14 '22 by Tim Booker


A single regex won't do. Each kind of markup needs its own translation to HTML. You would be better off looking at how the existing converters are implemented to get an idea of how it works.

http://en.wikipedia.org/wiki/Markdown#See_also

answered Oct 14 '22 by jop


I don't know about C# specifically, but in Perl it would be:

# Bold must be replaced first, otherwise "**" is consumed by the italic rule.
s/\*\*(.*?)\*\*/<strong>$1<\/strong>/g;
s/\*(.*?)\*/<em>$1<\/em>/g;
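Since the asker is using C#, the two Perl substitutions translate almost directly to `Regex.Replace`, whose replacement strings also understand `$1` and which replaces every match (like Perl's `/g`). This is a minimal sketch; the class name `PerlStylePort` is hypothetical. Unlike the Markdown.NET patterns, these have no whitespace checks and `.` does not match newlines, so `2 * 3 * 4` becomes `2 <em> 3 </em> 4`. The order still matters: running the italic rule first turns `**bold**` into `<em></em>bold<em></em>`.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical class name; a direct port of the two Perl substitutions.
public static class PerlStylePort
{
    public static string Convert(string text)
    {
        // s/\*\*(.*?)\*\*/<strong>$1<\/strong>/g  (bold must run first)
        text = Regex.Replace(text, @"\*\*(.*?)\*\*", "<strong>$1</strong>");

        // s/\*(.*?)\*/<em>$1<\/em>/g
        text = Regex.Replace(text, @"\*(.*?)\*", "<em>$1</em>");
        return text;
    }

    public static void Main()
    {
        Console.WriteLine(Convert("This would be *italicized* text and this would be **bold** text"));
        // prints: This would be <em>italicized</em> text and this would be <strong>bold</strong> text
    }
}
```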
answered Oct 14 '22 by tloach