Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is JavaScript a Context Free Language?

This article on how browsers work explains how CSS is context free, while HTML is not. But what about JavaScript, is JavaScript context free?

I am learning about CFG and formal proofs, but am a long way away from understanding how to figure this out. Does anyone know if JavaScript is context free or not?

like image 929
Lance Avatar asked Jun 07 '15 18:06

Lance


People also ask

Which programming language is context free?

Since regular languages are a subset of context-free languages, all programs of bounded size are context-free.

Is HTML context-free?

For HTML, the answer about its context-freedom is yes. SGML is a well defined Context Free Language, and HTML defined on top of it is also a CFL. Parsers and grammars for both languages abound on the Web.

Is Python a context-free language?

Python is not a context free language.

What does it mean for language to be context-free?

Definition of context-free : of, relating to, or being a grammar or language based on rules that describe a change in a string without reference to elements not in the string also : being such a rule.


2 Answers

No, JavaScript is not a context-free language.

It is very close to one, and the ECMAScript 5 specification does indeed use a context-free grammar1 to describe the language's syntax (you can find all productions in Annex A).

Of course, it does make some extensions to pure context-free grammatical productions, and describes extra behaviour of the parser. One particular thing is the usage of lookahead which still makes a context-free languages, but would complicate the grammar a lot if it couldn't be used for some rules. Not allowing certain things to appear in strict mode code is similar - it could be done by adjusting the grammar (with far more productions), but the rule is much easier expressed by leaving the BNF.

However, there are also some2 rules that do make the language not context-free. You'll find an overview in the description of early errors, which can make a program code invalid. That object literals must not contain duplicate property names and that function parameter lists must not contain duplicate identifiers are two rules that cannot be expressed using (finite) context-free grammars.
My gut tells me that the automatic semicolon insertion falls in the same box, but I think its rules are too complicated to even attempt a proof here.

1: Actually it uses two grammars, a lexical and a syntactical one, where the first disambiguates between division expressions and regular expressions, and does produce the tokens that are the input to the second grammar.
2: Rather few actually, compared to other programming languages

like image 169
Bergi Avatar answered Sep 22 '22 09:09

Bergi


No programming language is (completely) context-free (i would say including CSS). Even though context-free grammars (CFGs) may be used to define/generate compilers/parsers for the language.

The simple fact (for example) that variables need to be defined first, before used, or that declarations involving identifiers should be unique, makes the language "context-sensitive".

A grammar for a (programming) language is supposed to describe (and generate) strings which are only the valid programs in that language (syntacticaly, but also semanticaly). Yet a CFG can describe and generate strings which are not valid programs (given the language semantics and specification). Conditions which describe valid programs (like for example: 1. a class needs to be defined before using new class(), 2. ids must match etc..) require context-sensitivity.

No CFG (with any finite number of productions) can correctly represent only the valid strings of this language: { anbncn : n >= 1 }, where n should be the same for a, b, c (it should match). Note one can indeed define a CFG for (a superset of) this language, but it will accept also non-valid strings along with valid ones (and then by other means filter them out), this is not what a grammar specification for a language is supposed to do. It should accept only the valid strings and reject the non-valid. In an analogy with statistics, one could say that a grammar specification for a language should eliminate/minimise both Type-I (reject valid strings) and Type-II (accept non-valid strings) errors, not just one of them.

Let me give a simple example in the context of JavaScript (since variables may seem as posing no problem for JavaScript).

In JavaScript (in strict mode), duplicate named function declaration is not valid. So this is not valid:

function duplicateFunc(){}
function duplicateFunc(){} // duplicate named function declaration

So the program is not correct, yet a CFG cannot handle this type of condition.

Even turning on strict mode itself is context-sensitive a subset of strict mode rules can be handled by spliting the CFG in cases and parsing accordingly as per @Bergi's answer (strict mode examples removed)

[UPDATE]

i will try to give a couple of examples of JavaScript non-context-free code which does not require "strict mode" (open to suggestions/corrections).

The use of reserved words/keywords is an extension (or limitation) on the grammar. It is an extraneous feature, so the following examples should count as examples of non-CF behaviour.

var var; // identifier using reserved name
var function; // identifier using reserved name
obj.var; // reserved name used as (explicit) property
obj["var"]; // this is fine!!
Object++; // built-in type used as numeric variable

[/UPDATE]

So the context plays a part in the correct parsing of the program. As it is said "context is everything"!

However this context-sensitivity can be handled (hopefuly) by only slight extensions to context-free grammars (like for example Attribute Grammars, Affix Grammars, TAG Grammars and so on), which still make for efficient parsing (meaning in polynomial time).

[UPDATE]

"i would say including CSS"

To elaborate a little on this statement. CSS1 would be CF, but as CSS specification adds more features inclufing variable support (e.g css-counters) it makes the CSS code context-sensitive in the sense described above (e.g variables need to be defined before used). so the following css code would be parsed by the browser (and ignored as it is not valid) but it cannot be described by a CFG

body { }
h3::before {
  counter-increment: section;               /* no counter section has been defined, not valid css code */
  content: "Section" counter(section) ": "; /* Display the counter */
}

[/UPDATE]

like image 30
Nikos M. Avatar answered Sep 22 '22 09:09

Nikos M.