Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

When parsing Javascript, what determines the meaning of a slash?

Javascript has a tricky grammar to parse. Forward-slashes can mean a number of different things: division operator, regular expression literal, comment introducer, or line-comment introducer. The last two are easy to distinguish: if the slash is followed by a star, it starts a multiline comment. If the slash is followed by another slash, it is a line-comment.

But the rules for disambiguating division and regex literal are escaping me. I can't find it in the ECMAScript standard. There the lexical grammar is explicitly divided into two parts, InputElementDiv and InputElementRegExp, depending on what a slash will mean. But there's nothing explaining when to use which.

And of course the dreaded semicolon insertion rules complicate everything.

Does anyone have an example of clear code for lexing Javascript that has the answer?

like image 252
Ned Batchelder Avatar asked Apr 01 '11 22:04

Ned Batchelder


People also ask

What is slash in JavaScript?

A slash symbol '/' is not a special character, but in JavaScript, it is used to open and close the RegEx: /... pattern.../ , so we should escape it too.

How do I enable slash in RegEx?

You can escape a / character, by using \/ .


1 Answers

It's actually fairly easy, but it requires making your lexer a little smarter than usual.

The division operator must follow an expression, and a regular expression literal can't follow an expression, so in all other cases you can safely assume you're looking at a regular expression literal.

You already have to identify Punctuators as multiple-character strings, if you're doing it right. So look at the previous token, and see if it's any of these:

. ( , { } [ ; , < > <= >= == != === !== + - * % ++ -- << >> >>> & | ^ ! ~ && || ? : = += -= *= %= <<= >>= >>>= &= |= ^= / /= 

For most of these, you now know you're in a context where you can find a regular expression literal. Now, in the case of ++ --, you'll need to do some extra work. If the ++ or -- is a pre-increment/decrement, then the / following it starts a regular expression literal; if it is a post-increment/decrement, then the / following it starts a DivPunctuator.

Fortunately, you can determine whether it is a "pre-" operator by checking its previous token. First, post-increment/decrement is a restricted production, so if ++ or -- is preceded by a linebreak, then you know it is "pre-". Otherwise, if the previous token is any of the things that can precede a regular expression literal (yay recursion!), then you know it is "pre-". In all other cases, it is "post-".

Of course, the ) punctuator doesn't always indicate the end of an expression - for example if (something) /regex/.exec(x). This is tricky because it does require some semantic understanding to disentangle.

Sadly, that's not quite all. There are some operators that are not Punctuators, and other notable keywords to boot. Regular expression literals can also follow these. They are:

new delete void typeof instanceof in do return case throw else 

If the IdentifierName you just consumed is one of these, then you're looking at a regular expression literal; otherwise, it's a DivPunctuator.

The above is based on the ECMAScript 5.1 specification (as found here) and does not include any browser-specific extensions to the language. But if you need to support those, then this should provide easy guidelines for determining which sort of context you're in.

Of course, most of the above represent very silly cases for including a regular expression literal. For example, you can't actually pre-increment a regular expression, even though it is syntactically allowed. So most tools can get away with simplifying the regular expression context checking for real-world applications. JSLint's method of checking the preceding character for (,=:[!&|?{}; is probably sufficient. But if you take such a shortcut when developing what's supposed to be a tool for lexing JS, then you should make sure to note that.

like image 168
Tamzin Blake Avatar answered Oct 05 '22 14:10

Tamzin Blake