So I am trying to write a regular expression for JavaScript that will allow me to replace ** with tags as a sort of self rolled Markdown to HTML converter.
e.g.
**bold** -> <strong>bold</strong>
but
\**not** -> **not** because * was escaped.
I have the following regular expression which seems to work well:
/(?<!\\)(?:\\\\)*(\*\*)([^\\\*]+)(\*\*)/g
However, JS does not support lookbehinds! I rewrote it using lookaheads:
/(\*\*)([^\\\*]+)*(\*\*)(?!\\)(?:\\\\)*/g
but this would require me to reverse the string which is undesirable because I need to support multibyte characters (see here). I am not completely opposed to using the library mentioned in that answer, but I would prefer a solution that does not require me to add one if possible.
Is there a way to rewrite my regular expression without using look behinds?
EDIT:
After thinking about this a little more, I'm starting to question whether regular expressions are even the best way to approach this problem, but I will leave the question up out of interest.
One way to work around missing lookbehinds is to match the undesired patterns first and then, using alternation, match the desired pattern. Then do a conditional replace, substituting the undesired matches with themselves and the desired ones with what you actually want.
In your particular case, this means matching \* first and **<something>** only after that. Then use
input.replace(/\\\*|\*\*(.*?)\*\*/g, function(m, p1) {
    return m === '\\*' ? m : '<strong>' + p1 + '</strong>';
})
to do the conditional replace.
The real regex is more complex, though. First, you need to guard against an escaped backslash (e.g. \\**bold** should become \\<strong>bold</strong>). So you need to match \\ separately, the same way as you do for \*.
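As a minimal sketch of this intermediate step, the simple regex can be extended with a \\ alternative. The helper name `boldWithEscapes` is made up for illustration, and the body is still the naive `(.*?)`, so escapes inside the bold text are not handled yet.

```javascript
// Hypothetical helper: escape-aware at the start of a match only.
// Alternatives are tried left to right: \\ first, then \*, then **...**.
function boldWithEscapes(input) {
    var re = /\\\\|\\\*|\*\*(.*?)\*\*/g;
    return input.replace(re, function (m, p1) {
        // Escape sequences start with a backslash and are kept as they are.
        return m.charAt(0) === '\\' ? m : '<strong>' + p1 + '</strong>';
    });
}

console.log(boldWithEscapes('\\\\**bold**')); // \\<strong>bold</strong>
console.log(boldWithEscapes('\\**bold**'));   // \**bold**
```

Because \\ is consumed as a unit, the asterisks that follow it are no longer mistaken for escaped ones.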
Second, the text between ** and ** may itself contain escaped asterisks and backslashes. To cope with this, match \\ and \** explicitly first, then a single asterisk that is not followed by another one, and only after that (via alternation) any other character, non-greedily. This can be written as (?:\\\\|\\\*\*|\*(?!\*)|[\S\s])*?.
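To see why the body needs this treatment, here is a small comparison of the naive `(.*?)` body with the escape-aware one. The variable names and the `innerCapture` helper are assumptions made for this sketch.

```javascript
// Naive body: any characters, lazily, up to the first "**".
var naiveBody = /\*\*(.*?)\*\*/;
// Escape-aware body: \\ and \** are consumed as units before anything else.
var escapeAwareBody = /\*\*((?:\\\\|\\\*\*|\*(?!\*)|[\S\s])*?)\*\*/;

// Hypothetical helper: returns the captured bold text, or null if no match.
function innerCapture(re, text) {
    var m = re.exec(text);
    return m ? m[1] : null;
}

var sample = '**foo \\** bar**'; // the text: **foo \** bar**

console.log(innerCapture(naiveBody, sample));       // foo \
console.log(innerCapture(escapeAwareBody, sample)); // foo \** bar
```

The naive body stops at the escaped \** because it sees the backslash as an ordinary character, while the escape-aware body steps over it.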
Therefore the final regex becomes
\\\\|\\\*|\*\*((?:\\\\|\\\*\*|\*(?!\*)|[\S\s])*?)\*\*
Demo: https://regex101.com/r/Da35r5/1
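Outside the browser, the same replacement can be sketched as a plain function that is easy to run in Node.js. The name `markdownBoldToHtml` is hypothetical; the regex is the final one from above.

```javascript
// Hypothetical wrapper around the final regex, with no DOM involved.
function markdownBoldToHtml(md) {
    var re = /\\\\|\\\*|\*\*((?:\\\\|\\\*\*|\*(?!\*)|[\S\s])*?)\*\*/g;
    return md.replace(re, function (match, p1) {
        // Matches that start with a backslash are escapes: keep them unchanged.
        return match.charAt(0) === '\\' ? match : '<strong>' + p1 + '</strong>';
    });
}

console.log(markdownBoldToHtml('**bold**'));        // <strong>bold</strong>
console.log(markdownBoldToHtml('\\**not**'));       // \**not**
console.log(markdownBoldToHtml('\\\\**bold**'));    // \\<strong>bold</strong>
console.log(markdownBoldToHtml('**foo \\** bar**')); // <strong>foo \** bar</strong>
```

Note that escape sequences are left in the output as they are. Stripping the backslash from \* would need an extra branch in the callback.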
JavaScript replace demo:
function convert() {
    var md = document.getElementById("md").value;
    var re = /\\\\|\\\*|\*\*((?:\\\\|\\\*\*|\*(?!\*)|[\S\s])*?)\*\*/g;
    var html = md.replace(re, function(match, p1) {
        return match.startsWith('\\') ? match : '<strong>' + p1 + '</strong>';
    });
    document.getElementById("html").value = html;
}
<span style="display:inline-block">
MD
<textarea id="md" cols="20" rows="10" style="display:block">
**bold**
**foo * bar **
**foo \** bar**
**fo\\\\** bar** **
\**bold** **
\\**bold**
** multi
line**
</textarea>
</span>
<span style="display:inline-block">
HTML
<textarea id="html" cols="50" rows="10" style="display:block">
</textarea>
</span>
<button onclick="convert()" style="display:block">Convert</button>