How to say that (in BNF, EBNF, etc) any two or more letters are placed in the same vertical alignment
e.g In python 2.x, we have what we call indentation
.
def hello():
print "hello,"
print "world"
hello()
Note letter p
(second line) is placed in the same vertical alignment of letter p
(third line)
Further example (in markdown):
MyHeader
========
topic
-----
Note M
and the first =
are placed in the same vertical alignment (also r
and last =
, t and first -
, c
and last -
)
My question is How to represent these vertical alignment of letters using BNF, EBNF or etc.?
Further note:
My point of this question is searching for a representation method to represent a vertical alignment of code, not just want to know how to write BNF or EBNF of Python
or Markdown
.
BNF syntax can only represent a rule in one line, whereas in EBNF a terminating character, the semicolon, marks the end of a rule. Furthermore, EBNF includes mechanisms for enhancements, defining the number of repetitions, excluding alternatives, comments, etc.
An EBNF description is an unordered list of EBNF rules. Each EBNF rule EBNF descriptions comprises a list of EBNF rules of the form: LHS ⇐ RHS has three parts: a left–hand side (LHS), a right-hand side (RHS), and the ⇐ character separating these two sides; read this symbol as “is defined as”.
Grouping. To indicate precedence, EBNF grammars may use parentheses, () , to explictly define the order of expansion. For example, the rule: <expr> ::= <term> ("+" | "-") <expr> defines an expression form that allows both addition and subtraction.
You can parse an indentation-sensitive language (like Python or Haskell) by using a little hack, which is well-described in the Python language reference's chapter on lexical analysis. As described, the lexical analyzer turns leading whitespace into INDENT
and DEDENT
tokens [Note 1], which are then used in the Python grammar in a straightforward fashion. Here's a small excerpt:
suite ::= stmt_list NEWLINE | NEWLINE INDENT statement+ DEDENT
statement ::= stmt_list NEWLINE | compound_stmt
stmt_list ::= simple_stmt (";" simple_stmt)* [";"]
while_stmt ::= "while" expression ":" suite ["else" ":" suite]
So if you are prepared to describe (or reference) the lexical analysis algorithm, the BNF is simple.
However, you cannot actually write that algorithm as a context free grammar, because it is not context-free. (I'll leave out the proof, but it's similar to the proof that anbncn
is not context free, which you can find in most elementary formal language textbooks, and all over the internet.)
ISO standard EBNF (a free PDF is available) provides a way of including "extensions which a user may require": a Special-sequence
, which is any text not containing a ? surrounded on both sides by a ?. So you could abuse the notation by including [Note 2]:
DEDENT = ? See section 2.1.8 of https://docs.python.org/3.3/reference/ ? ;
Or you could insert a full description of the algorithm. Of course, neither of those techniques will allow a parser generator to produce an accurate lexical analyzer, but it would be a reasonable way of communicating intent to a human reader.
It's worth noting that EBNF itself uses a special sequence to define one of its productions:
(* see 4.7 *) syntactic exception
= ? a syntactic-factor that could be replaced
by a syntactic-factor containing no
meta-identifiers
? ;
The lexical analyzer also converts some physical newline characters into NEWLINE
tokens, while making other newline characters vanish.
EBNF normally uses the syntax =
rather than ::=
for a production, and insists that they be terminated with ;
. Comments are enclosed between (*
and *)
.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With