
Testing with multiple regexps at the same time (for use in syntactic analysis)

I am writing a simple syntax highlighter in JavaScript, and I need to find a way to test with multiple regular expressions at the same time.

The idea is to find out which comes first, so I can determine the new set of expressions to look for.

The expressions could be something like:

/<%@/, /<%--/, /<!--/ and /<[a-z:-]/

First I tried a strategy where I combined the expressions in groups like:

/(<%@)|(<%--)|(<!--)|(<[a-z:-])/

That way I could find out which matched group was not undefined. But this becomes a problem when some of the subexpressions themselves contain groups or backreferences.
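To illustrate the problem, here is a minimal sketch of the combined expression. The helper name firstMatchedGroup and the sample string are made up for this example; the second pattern adds an inner group to one subexpression purely to show how the group numbers shift.

```javascript
// Hypothetical helper: returns the 1-based number of the first capture
// group that took part in the match, or -1 if nothing matched.
function firstMatchedGroup(regex, text) {
  const match = regex.exec(text);
  if (match === null) return -1;
  for (let i = 1; i < match.length; i++) {
    if (match[i] !== undefined) return i;
  }
  return -1;
}

// One group per subexpression: the group number identifies the subexpression.
const combined = /(<%@)|(<%--)|(<!--)|(<[a-z:-])/;
const plainResult = firstMatchedGroup(combined, "text <!-- comment -->");

// Here the second subexpression contains a group of its own, so the
// numbers of all later subexpressions are shifted by one.
const nested = /(<%@)|(<%(--))|(<!--)|(<[a-z:-])/;
const nestedResult = firstMatchedGroup(nested, "text <!-- comment -->");

console.log(plainResult, nestedResult); // prints "3 4"
```

The same input is reported as group 3 by the first pattern and group 4 by the second, so the group number alone no longer identifies the subexpression.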

So my question is this:

Does anyone know a good and reasonable way to look for matches with multiple regular expressions in a string?

Michael Andersen asked Dec 08 '25 10:12


1 Answer

Is there any particular reason why you can't tokenize the input and then test the beginning of each token to see what type it is for the purposes of highlighting? I think you're overthinking this one. A simple cascade of if-elseifs will cover this just fine:

if (token.startsWith("<%@")) {
  // paint it red
}
else if (token.startsWith("<%--")) {
  // paint it green
}
else if (token.startsWith("<!--")) {
  // paint it blue
}
else if (/^<[a-z:-]/.test(token)) {
  // paint it black
}

The above assumes that token is a string holding a single token from the already tokenized input. The comments mark where each highlighting action would go.
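If tokenizing up front is not practical, a different approach from the cascade above is to run each expression separately and keep whichever match starts earliest. The following is only a sketch under that assumption; the rule names, the findFirst function, and the sample string are hypothetical. Because each expression is executed on its own, groups or backreferences inside one expression do not affect the others.

```javascript
// Hypothetical rule table: the names are illustrative only.
const rules = [
  { name: "directive", regex: /<%@/ },
  { name: "jspComment", regex: /<%--/ },
  { name: "htmlComment", regex: /<!--/ },
  { name: "tag", regex: /<[a-z:-]/ },
];

// Returns the rule match that starts earliest at or after `from`,
// or null if no rule matches. Ties go to the rule listed first.
function findFirst(text, from) {
  let best = null;
  for (const rule of rules) {
    // Copy the pattern with the global flag so exec() honors lastIndex
    // (any other flags on the original are dropped in this sketch).
    const re = new RegExp(rule.regex.source, "g");
    re.lastIndex = from;
    const m = re.exec(text);
    if (m !== null && (best === null || m.index < best.index)) {
      best = { name: rule.name, index: m.index, text: m[0] };
    }
  }
  return best;
}

const sample = "<p>hello <!-- note --> <%@ page %>";
const first = findFirst(sample, 0);
const second = findFirst(sample, first.index + first.text.length);
console.log(first.name, first.index, second.name, second.index);
// prints "tag 0 htmlComment 9"
```

The name of the winning rule can then be used to pick the next set of expressions to look for.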

Welbog answered Dec 09 '25 23:12