
How do I fix this multiline regular expression in Ruby?

I have a regular expression in Ruby that isn't working properly in multiline mode.

I'm trying to convert Markdown text into the Textile-esque markup used in Redmine. The problem is in my regular expression for converting code blocks. It should find any lines that start with 4 spaces or a tab, then wrap them in pre tags.

markdownText = '# header

some text that precedes code

    var foo = 9;
    var fn = function() {}

    fn();

some post text'

puts markdownText.gsub!(/(^(?:\s{4}|\t).*?$)+/m,"<pre>\n\\1\n</pre>")

Intended result:

# header

some text that precedes code

<pre>
    var foo = 9;
    var fn = function() {}

    fn();
</pre>

some post text

The problem is that the closing pre tag is printed at the end of the document instead of after "fn();". I tried some variations of the following expression but it doesn't match:

gsub!(/(^(?:\s{4}|\t).*?$)+^(\S)/m, "<pre>\n\\1\n</pre>\\2")

How do I get the regular expression to match just the indented code block? You can test this regular expression on Rubular here.

DonovanChan asked Apr 19 '11 16:04



1 Answer

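First, note that the 'm' (multi-line) mode in Ruby is equivalent to the 's' (single-line) mode of other languages. In other words, 'm' mode in Ruby means "dot matches all". Ruby's ^ and $ already match at the start and end of every line, with or without that flag.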
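A small sketch of that difference, using a made-up two-line string (the variable names are only for illustration):

```ruby
text = "first line\nsecond line"

# Without /m, the dot stops at a newline, so .* covers only the first line.
without_m = text[/first.*/]

# With /m, the dot also matches newlines, so .* runs to the end of the string.
with_m = text[/first.*/m]

# ^ matches at the start of every line in Ruby, with or without /m.
line_starts = text.scan(/^\w+/)

puts without_m.inspect    # "first line"
puts with_m.inspect       # "first line\nsecond line"
puts line_starts.inspect  # ["first", "second"]
```

So adding /m to a pattern that contains .* or .*? lets the dot run across line breaks, which is usually not what you want when matching line by line.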

This regex will do a pretty good job of matching a markdown-like code section:

re = / # Match a MARKDOWN CODE section.
    (\r?\n)              # $1: CODE must be preceded by blank line
    (                    # $2: CODE contents
      (?:                # Group for multiple lines of code.
        (?:\r?\n)+       # Each line preceded by a newline,
        (?:[ ]{4}|\t).*  # and begins with four spaces or tab.
      )+                 # One or more CODE lines
      \r?\n              # Newline that ends the last CODE line.
    )                    # End $2: CODE contents
    (?=\r?\n)            # CODE followed by blank line.
    /x
result = subject.gsub(re, '\1<pre>\2</pre>')

This requires a blank line before and after the code section and allows blank lines within the code section itself. It allows for either \r\n or \n line endings. Note that this does not strip the leading 4 spaces (or tab) from each line. Doing that will require more complex code. (I am not a Ruby guy, so I can't help out with that.)
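For what it's worth, one way the stripping might be done is with the block form of gsub, which lets you post-process the captured code before wrapping it. This is only a sketch under assumptions: the constant CODE_BLOCK_RE and the helper wrap_code_blocks are hypothetical names, the pattern is the one above, and it has only been checked against the sample text from the question.

```ruby
# The pattern from above, stored in a constant (hypothetical name).
CODE_BLOCK_RE = /
    (\r?\n)              # $1: CODE must be preceded by blank line
    (                    # $2: CODE contents
      (?:                # Group for multiple lines of code.
        (?:\r?\n)+       # Each line preceded by a newline,
        (?:[ ]{4}|\t).*  # and begins with four spaces or tab.
      )+                 # One or more CODE lines
      \r?\n              # Newline that ends the last CODE line.
    )                    # End $2: CODE contents
    (?=\r?\n)            # CODE followed by blank line.
    /x

# Hypothetical helper: wraps each code section in pre tags and removes
# one level of indentation (four spaces or one tab) from every code line.
def wrap_code_blocks(text)
  text.gsub(CODE_BLOCK_RE) do
    # Read both captures before the inner gsub overwrites the match data.
    lead = Regexp.last_match(1)
    code = Regexp.last_match(2).gsub(/^(?: {4}|\t)/, "")
    "#{lead}<pre>#{code}</pre>"
  end
end

markdown = "# header\n\nsome text that precedes code\n\n" \
           "    var foo = 9;\n    var fn = function() {}\n\n    fn();\n\n" \
           "some post text"

puts wrap_code_blocks(markdown)
```

Note that the pattern consumes the blank line before the code and the newline after it, so in the output the pre tags sit on the lines directly next to the surrounding paragraphs. Blank lines inside the code section are kept.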

I would recommend looking at the Markdown source itself to see how it's really being done.

ridgerunner answered Nov 04 '22 01:11