
regexp to match simple markdown

Tags:

java

regex

I'm trying to figure out a regexp to match all occurrences of *this kind of string*. Two additional rules unfortunately make this more complicated than I thought:

  1. the tagged string should start with * followed by a non-whitespace character (so * this one* should not be matched)
  2. the tagged string should end with a non-whitespace character followed by * followed by whitespace (so *this one * and *this o*ne should not be matched)

I started with the simplest regexp, \*\S([^\*]+)?\*, which for my test string:

*foo 1 * 2 bar* foo *b* azz *qu **ux*

matches places in square brackets:

[*foo 1 *] 2 bar* foo [*b*] azz [*qu *][*ux*]

and this is what I'd like to achieve:

[*foo 1 * 2 bar*] foo [*b*] azz [*qu **ux*]

so two problems appear:

  • how to express rule 2 in a regexp, i.e. "search until the first non-whitespace character followed by * followed by whitespace appears"? With a positive lookahead?
  • how to require the whitespace from rule 2 without including it in the match, as \*\S([^\*]+)?\*\s would?
Michal asked Oct 15 '22 14:10


1 Answer

If you want a match to start at the rightmost * of a run of consecutive asterisks, you may use

\*(?=[^\s*]).*?(?<=[^\s*])\*(?!\S)

To start a match from a left-most * (as in ``), remove the * from the first lookaround (or replace its pattern with \S):

\*(?=\S).*?(?<=[^\s*])\*(?!\S)

See the regex demo #1 and regex demo #2. Add (?s) at the start of the pattern or compile it with Pattern.DOTALL so that . also matches line breaks and a match can span several lines.
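The difference between the two patterns, and the effect of Pattern.DOTALL, can be checked with a small sketch like the one below. The class name, the constant names and the sample inputs are made up for illustration; only the two regexes come from this answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StartVariants {
    // Pattern 1: the char after the opening * may be neither whitespace nor *.
    static final String RIGHTMOST = "\\*(?=[^\\s*]).*?(?<=[^\\s*])\\*(?!\\S)";
    // Pattern 2: the char after the opening * only has to be non-whitespace.
    static final String LEFTMOST = "\\*(?=\\S).*?(?<=[^\\s*])\\*(?!\\S)";

    // Collects every match of regex in input, compiled with the given flags.
    static List<String> findAll(String regex, int flags, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex, flags).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String doubled = "a **ux* b"; // hypothetical input with a doubled opening *
        System.out.println(findAll(RIGHTMOST, 0, doubled)); // prints [*ux*]
        System.out.println(findAll(LEFTMOST, 0, doubled));  // prints [**ux*]

        String multiline = "*foo\nbar* baz"; // hypothetical input spanning two lines
        System.out.println(findAll(RIGHTMOST, 0, multiline).size());              // prints 0
        System.out.println(findAll(RIGHTMOST, Pattern.DOTALL, multiline).size()); // prints 1
    }
}
```

On the test string from the question both patterns return the same three matches, because every match there begins at a * that is directly followed by a non-asterisk character.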

Details

  • \* - a * char
  • (?=[^\s*]) - the next char must be a non-whitespace and not a *
  • .*? - any zero or more chars other than line break chars, as few as possible
  • (?<=[^\s*]) - the preceding char should be a non-whitespace and not a *
  • \* - a * char
  • (?!\S) - a whitespace boundary pattern: the next char must be whitespace, or the end of the string must be at this position. Being a lookahead, it does not add the whitespace to the match.
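To illustrate why the pattern ends with (?!\S) rather than \s, here is a minimal sketch that runs both endings on the same input. The class name, constants and the sample text are assumptions made for the example; the \s variant is only there for contrast.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BoundaryDemo {
    // Ending with \s: the whitespace is consumed and becomes part of the match.
    static final Pattern CONSUMING =
            Pattern.compile("\\*(?=[^\\s*]).*?(?<=[^\\s*])\\*\\s");
    // Ending with (?!\S): zero-width check, also satisfied at the end of the string.
    static final Pattern BOUNDARY =
            Pattern.compile("\\*(?=[^\\s*]).*?(?<=[^\\s*])\\*(?!\\S)");

    // Collects every match of the pattern in input.
    static List<String> findAll(Pattern p, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = p.matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "*foo* bar *baz*"; // hypothetical sample text
        // The trailing space is included, and the tag at the end of the string is missed.
        System.out.println(findAll(CONSUMING, input)); // prints [*foo* ]
        // No whitespace in the results, and the final tag is found.
        System.out.println(findAll(BOUNDARY, input));  // prints [*foo*, *baz*]
    }
}
```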

In Java:

String regex = "\\*(?=[^\\s*]).*?(?<=[^\\s*])\\*(?!\\S)";
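A fuller usage sketch, assuming you want the matches collected into a list. The class and method names are hypothetical; the regex is the one above, and the inputs are the test string and the counter-examples from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SimpleMarkdownMatcher {
    // Compiled once and reused; the pattern is the first regex from this answer.
    private static final Pattern TAGGED =
            Pattern.compile("\\*(?=[^\\s*]).*?(?<=[^\\s*])\\*(?!\\S)");

    // Returns every *tagged* substring found in input, in order of appearance.
    static List<String> findTagged(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = TAGGED.matcher(input);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        // Test string from the question.
        System.out.println(findTagged("*foo 1 * 2 bar* foo *b* azz *qu **ux*"));
        // prints [*foo 1 * 2 bar*, *b*, *qu **ux*]

        // Counter-examples from rules 1 and 2: none of them should match.
        System.out.println(findTagged("* this one*")); // prints []
        System.out.println(findTagged("*this one *")); // prints []
        System.out.println(findTagged("*this o*ne"));  // prints []
    }
}
```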
Wiktor Stribiżew answered Oct 18 '22 14:10