
Java regex match markdown syntax for headings

I have a string with markdown syntax in it, and I want to be able to find markdown syntax for headings, i.e. h1 = #, h2 = ##, etc.

I know that whenever I find a heading, it is at the start of the line. I also know there can only be one heading per line. So for example, "###This is a heading" would match true for my h3 pattern, but not for my h2 or h1 patterns. This is my code so far:

h1 = Pattern.compile("(?<!\\#)^\\#(\\b)*");
h2 = Pattern.compile("(?<!\\#)^\\#{2}(\\b)*");
h3 = Pattern.compile("(?<!\\#)^\\#{3}(\\b)*");
h4 = Pattern.compile("(?<!\\#)^\\#{4}(\\b)*");
h5 = Pattern.compile("(?<!\\#)^\\#{5}(\\b)*");
h6 = Pattern.compile("(?<!\\#)^\\#{6}(\\b)*");

Whenever I use \\#, IntelliJ warns me: "Redundant character escape". As far as I know, # should not be a special character in regex, so escaping it with two backslashes should still allow me to use it.

When I find a match, I want to surround the entire match with bold HTML tags, like this: "<b>###Heading</b>", but for some reason it's not working:

//check for heading 6
Matcher match = h6.matcher(tmp);
StringBuffer sb = new StringBuffer();
while (match.find()) {
    match.appendReplacement(sb, "<b>" + match.group(0) + "</b>");
}
match.appendTail(sb);
tmp = sb.toString();

EDIT

So I have to look at each heading separately; I can't match headings 1-6 with the same pattern (this has to do with other parts of my program that use the same pattern). What I know so far:

  • If there is a heading in the string, it is at the start.
  • If it starts with a heading, the entire string that follows is considered a heading, until the user presses Enter.
  • If I have "## This is a heading", then it must match true for h2, but false for h1.
  • When I find my match, "## This is a heading" becomes "<b>## This is a heading</b>".
Kaffemakarn asked May 22 '17 at 08:05



2 Answers

There is no need to escape # since it is not a special regex metacharacter. Also, the ^ is the string start anchor, so all the lookbehinds in your patterns are redundant as they always return true (since there is no character before the beginning of a string).
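A small check of both points, as a sketch: the class `EscapeDemo` and the helper `firstMatch` are hypothetical names for illustration. It compares `#` with `\\#`, shows that `(?<!#)` in front of `^` changes nothing, and shows the one case where escaping `#` does matter in Java: `Pattern.COMMENTS` mode, where an unescaped `#` starts a comment (presumably why IntelliJ only calls the escape redundant, since the question's patterns use the default flags).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EscapeDemo {
    // Hypothetical helper: returns the text of the first match, or null if none.
    static String firstMatch(String regex, int flags, String input) {
        Matcher m = Pattern.compile(regex, flags).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String line = "### Heading";

        // Default flags: "\\#" and "#" both mean a literal '#'.
        System.out.println(firstMatch("^\\#{3}", 0, line)); // ###
        System.out.println(firstMatch("^#{3}", 0, line));   // ###

        // Without MULTILINE, ^ only matches at the start of the input, where
        // there is no preceding character, so (?<!#) can never fail.
        System.out.println(firstMatch("(?<!#)^#{3}", 0, line)); // ###

        // COMMENTS mode: an unescaped # starts a comment, so "^#{3}" shrinks
        // to "^" and matches the empty string; here the escape is needed.
        System.out.println("[" + firstMatch("^#{3}", Pattern.COMMENTS, line) + "]"); // []
        System.out.println(firstMatch("^\\#{3}", Pattern.COMMENTS, line)); // ###
    }
}
```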

You seem to want to match a specific number of # at the start of a line, not followed by another #, plus the rest of that line. Use

import java.util.regex.Matcher;
import java.util.regex.Pattern;

String s = "###### Heading6 Something here\r\n" +
           "###### More text \r\n" +
           "###Heading 3 text";
Matcher m = Pattern.compile("(?m)^#{6}(?!#)(.*)").matcher(s);
String result = m.replaceAll("<b>$1</b>");
System.out.println(result);

Result:

<b> Heading6 Something here</b>
<b> More text </b>
###Heading 3 text

Details:

  • (?m) - turns on MULTILINE mode, so ^ matches at the start of every line
  • ^ - start of a line
  • #{6}(?!#) - exactly 6 # symbols, not followed by another #
  • (.*) - Group 1: zero or more characters other than line break characters, up to the end of the line
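To see how `(?!#)` keeps the levels apart, here is a minimal sketch that runs the same pattern shape against the question's `###This is a heading` example. The `ExactLevelDemo` class and the `hasHeading` helper are hypothetical names.

```java
import java.util.regex.Pattern;

public class ExactLevelDemo {
    // Hypothetical helper: true if some line starts with exactly `level` '#'
    // characters, using the (?m)^#{n}(?!#)(.*) pattern shape from this answer.
    static boolean hasHeading(int level, String text) {
        return Pattern.compile("(?m)^#{" + level + "}(?!#)(.*)").matcher(text).find();
    }

    public static void main(String[] args) {
        String h3 = "###This is a heading";
        System.out.println(hasHeading(3, h3)); // true
        System.out.println(hasHeading(2, h3)); // false: a third # follows the first two
        System.out.println(hasHeading(1, h3)); // false
        System.out.println(hasHeading(4, h3)); // false: only three # are present

        // (?m) lets ^ match right after a line break, not only at the start of the input.
        System.out.println(hasHeading(2, "intro text\n## Second line")); // true
    }
}
```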

Thus, your regex definitions will look like

h1 = Pattern.compile("(?m)^#(?!#)(.*)");
h2 = Pattern.compile("(?m)^#{2}(?!#)(.*)");
h3 = Pattern.compile("(?m)^#{3}(?!#)(.*)");
h4 = Pattern.compile("(?m)^#{4}(?!#)(.*)");
h5 = Pattern.compile("(?m)^#{5}(?!#)(.*)");
h6 = Pattern.compile("(?m)^#{6}(?!#)(.*)");
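The question asks to surround the entire match with the bold tags, while `<b>$1</b>` drops the `#` characters. If you want to keep them, one option is a sketch like the following, which keeps the question's `appendReplacement` loop but wraps `group(0)`, the whole match. `BoldHeadings` and `boldLevel` are hypothetical names, and `Matcher.quoteReplacement` is added so a `$` or `\` in the heading text is copied literally instead of being read as a group reference.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BoldHeadings {
    // Hypothetical helper: wraps every line that starts with exactly `level`
    // '#' characters in <b>...</b>, keeping the '#' characters themselves.
    static String boldLevel(String text, int level) {
        Matcher m = Pattern.compile("(?m)^#{" + level + "}(?!#)(.*)").matcher(text);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            // Without quoteReplacement, "$5" in the text would be treated as group 5.
            m.appendReplacement(sb, Matcher.quoteReplacement("<b>" + m.group(0) + "</b>"));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        String tmp = "## This is a heading\nplain text\n### Costs $5";
        // Each level uses its own pattern; the order does not matter because
        // a line can only start with one exact number of '#'.
        for (int level = 6; level >= 1; level--) {
            tmp = boldLevel(tmp, level);
        }
        System.out.println(tmp);
        // <b>## This is a heading</b>
        // plain text
        // <b>### Costs $5</b>
    }
}
```

Handling each level with its own pattern also fits the constraint in the question's edit that h1 to h6 must stay separate.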
Wiktor Stribiżew answered Oct 08 '22 at 13:10



You can try this:

^(#{1,6}\s*[\S]+)

Since you mentioned that a heading only appears at the start of a line, you don't need a lookbehind.

UPDATE: If you want to bold the full line that starts with a heading, you can try this:

^(#{1,6}.*)

And replace by:

<b>$1</b>
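One caveat, shown in a small sketch (`WholeLineDemo` and `bold` are hypothetical names): `#{1,6}` does not stop a longer run, so a line starting with seven `#` is still wrapped, even though CommonMark does not treat seven or more `#` as a heading. Adding `(?!#)` after the run, as in the other answer, excludes such lines.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WholeLineDemo {
    // Hypothetical helper: applies the regex line by line (MULTILINE) and
    // wraps group 1 of every match in <b>...</b>.
    static String bold(String regex, String text) {
        Matcher m = Pattern.compile(regex, Pattern.MULTILINE).matcher(text);
        return m.replaceAll("<b>$1</b>");
    }

    public static void main(String[] args) {
        String text = "### heading 3 djdjdj\nbla bla bla\n####### seven";

        // #{1,6} takes the first six '#' and .* takes the rest,
        // so the seven-# line is wrapped too.
        System.out.println(bold("^(#{1,6}.*)", text));

        // (?!#) rejects every way of splitting a run longer than six '#'.
        System.out.println(bold("^(#{1,6}(?!#).*)", text));
    }
}
```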

Sample Java source:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

final String regex = "^(#{1,6}\\s*[\\S]+)";
final String string = "#heading 1 \n"
     + "bla bla bla\n"
     + "### heading 3 djdjdj\n"
     + "bla bla bla\n"
     + "## heading 2 bal;kasddfas\n"
     + "fbla bla bla";
final String subst = "<b>$1</b>";
final Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
final Matcher matcher = pattern.matcher(string);
final String result = matcher.replaceAll(subst);
System.out.println(result);

Rizwan M.Tuman answered Oct 08 '22 at 12:10
