
How to represent vertical alignment of code syntax using BNF, EBNF, etc.?

Tags:

syntax

indentation

bnf

vertical-text

ebnf

How can I say (in BNF, EBNF, etc.) that any two or more letters are placed in the same vertical alignment?

For example, in Python 2.x we have what is called indentation:

def hello():
    print "hello," 
    print "world"

hello()

Note that the letter p on the second line is vertically aligned with the letter p on the third line.

A further example (in Markdown):

MyHeader
========
topic
-----

Note that M and the first = are vertically aligned (as are r and the last =, t and the first -, and c and the last -).

My question is: how can this vertical alignment of letters be represented using BNF, EBNF, or a similar notation?

Further note: the point of this question is to find a general way of representing vertical alignment in code, not just to learn how to write the BNF or EBNF for Python or Markdown.

asked Jan 05 '15 19:01

fronthem

1 Answer

You can parse an indentation-sensitive language (like Python or Haskell) by using a little hack, which is well-described in the Python language reference's chapter on lexical analysis. As described, the lexical analyzer turns leading whitespace into INDENT and DEDENT tokens [Note 1], which are then used in the Python grammar in a straightforward fashion. Here's a small excerpt:

suite         ::=  stmt_list NEWLINE | NEWLINE INDENT statement+ DEDENT
statement     ::=  stmt_list NEWLINE | compound_stmt
stmt_list     ::=  simple_stmt (";" simple_stmt)* [";"]
while_stmt    ::=  "while" expression ":" suite ["else" ":" suite]
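Because INDENT and DEDENT are ordinary terminals in this grammar, a block is parsed exactly like a bracketed list. Here is a minimal Python sketch of a recursive-descent parser for a reduced version of the excerpt, assuming the lexer has already produced the tokens. The class name Parser and the token kinds STMT and EXPR are hypothetical placeholders (standing for a whole stmt_list and an expression); while is the only compound statement, and its else clause is left out.

```python
from typing import List, Tuple

# A token is (kind, text). "STMT" and "EXPR" are hypothetical stand-ins
# for a complete stmt_list and an expression respectively.
Token = Tuple[str, str]


class Parser:
    """Recursive-descent parser for a tiny subset of the grammar excerpt.

    INDENT and DEDENT behave like "{" and "}": a block is simply
    NEWLINE INDENT statement+ DEDENT.
    """

    def __init__(self, tokens: List[Token]) -> None:
        self.tokens = tokens
        self.pos = 0

    def peek(self) -> str:
        return self.tokens[self.pos][0] if self.pos < len(self.tokens) else "EOF"

    def expect(self, kind: str) -> Token:
        if self.peek() != kind:
            raise SyntaxError(f"expected {kind}, got {self.peek()}")
        tok = self.tokens[self.pos]
        self.pos += 1
        return tok

    # suite ::= stmt_list NEWLINE | NEWLINE INDENT statement+ DEDENT
    def suite(self) -> list:
        if self.peek() == "NEWLINE":
            self.expect("NEWLINE")
            self.expect("INDENT")
            body = [self.statement()]
            while self.peek() != "DEDENT":
                body.append(self.statement())
            self.expect("DEDENT")
            return body
        stmt = self.expect("STMT")[1]
        self.expect("NEWLINE")
        return [stmt]

    # statement ::= stmt_list NEWLINE | compound_stmt (only while here)
    def statement(self) -> object:
        if self.peek() == "while":
            return self.while_stmt()
        stmt = self.expect("STMT")[1]
        self.expect("NEWLINE")
        return stmt

    # while_stmt ::= "while" expression ":" suite   (else clause omitted)
    def while_stmt(self) -> tuple:
        self.expect("while")
        cond = self.expect("EXPR")[1]
        self.expect(":")
        return ("while", cond, self.suite())


# Tokens a lexer would produce for:
#   while x:
#       a()
#       while y:
#           b()
#       c()
TOKENS: List[Token] = [
    ("while", "while"), ("EXPR", "x"), (":", ":"), ("NEWLINE", ""),
    ("INDENT", ""), ("STMT", "a()"), ("NEWLINE", ""),
    ("while", "while"), ("EXPR", "y"), (":", ":"), ("NEWLINE", ""),
    ("INDENT", ""), ("STMT", "b()"), ("NEWLINE", ""), ("DEDENT", ""),
    ("STMT", "c()"), ("NEWLINE", ""), ("DEDENT", ""),
]

print(Parser(TOKENS).statement())
```

Nothing in the parser looks at columns; all of the alignment information has already been turned into INDENT/DEDENT by the lexer.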

So if you are prepared to describe (or reference) the lexical analysis algorithm, the BNF is simple.
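The lexical algorithm itself is short. This is a simplified Python sketch of the indentation-stack rule from the Python reference (section 2.1.8), assuming spaces only (no tab expansion), no comments, and no implicit line joining inside brackets; the function name indent_tokens and the ("LINE", text) token are hypothetical simplifications.

```python
from typing import Iterator, Tuple

Token = Tuple[str, str]


def indent_tokens(source: str) -> Iterator[Token]:
    """Turn leading whitespace into INDENT/DEDENT tokens.

    Simplified: spaces only, no comments, no bracket continuation.
    Each non-blank line becomes ("LINE", text) followed by ("NEWLINE", "").
    """
    stack = [0]  # indentation levels, strictly increasing; 0 is never popped
    for lineno, raw in enumerate(source.splitlines(), start=1):
        stripped = raw.lstrip(" ")
        if not stripped.strip():
            continue  # blank lines produce no tokens at all
        width = len(raw) - len(stripped)
        if width > stack[-1]:
            stack.append(width)  # deeper: push and open a block
            yield ("INDENT", "")
        else:
            while width < stack[-1]:
                stack.pop()  # shallower: close one block per popped level
                yield ("DEDENT", "")
            if width != stack[-1]:
                raise IndentationError(
                    f"line {lineno}: dedent does not match any outer level")
        yield ("LINE", stripped.rstrip())
        yield ("NEWLINE", "")
    # At end of input, close every block that is still open.
    while len(stack) > 1:
        stack.pop()
        yield ("DEDENT", "")


# The question's example, written for Python 3.
src = 'def hello():\n    print("hello,")\n    print("world")\n\nhello()\n'
for kind, text in indent_tokens(src):
    print(kind, repr(text))
```

The stack is what makes this more than a context-free rule: deciding whether a line is a DEDENT, and how many, requires comparing its width with an unbounded history of earlier widths.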

However, you cannot actually write that algorithm as a context-free grammar, because the indentation structure it recognizes is not context-free. (I'll leave out the proof, but it's similar to the proof that aⁿbⁿcⁿ is not context-free, which you can find in most elementary formal language textbooks, and all over the internet.)
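For reference, the standard pumping-lemma argument for aⁿbⁿcⁿ itself (not for the indentation rule) runs roughly as follows:

```latex
\begin{aligned}
&L = \{\, a^n b^n c^n : n \ge 0 \,\}, \text{ assumed context-free with pumping length } p.\\
&s = a^p b^p c^p \in L \implies s = uvwxy,\ |vwx| \le p,\ |vx| \ge 1,\ uv^i w x^i y \in L\ \ \forall i \ge 0.\\
&|vwx| \le p \implies vwx \text{ contains no } a \text{ or no } c \text{ (the } b^p \text{ block separates them)}.\\
&i = 2:\ \text{either the order } a^*b^*c^* \text{ breaks, or at least one but at most two counts grow, so } uv^2wx^2y \notin L.\\
&\text{Contradiction, so } L \text{ is not context-free.}
\end{aligned}
```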

ISO standard EBNF (ISO/IEC 14977; a free PDF is available) provides a way of including "extensions which a user may require": a special sequence, which is any text not containing a ?, enclosed on both sides by a ?. So you could abuse the notation by including [Note 2]:

DEDENT = ? See section 2.1.8 of https://docs.python.org/3.3/reference/ ? ;

Or you could insert a full description of the algorithm. Of course, neither of those techniques will allow a parser generator to produce an accurate lexical analyzer, but it would be a reasonable way of communicating intent to a human reader.
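To see why a tool cannot act on such a rule, note that a special sequence is lexically trivial: a ?, a run of non-? characters, and a closing ?. A reader can recognize it but can only pass the contents along as opaque text. A minimal Python sketch, assuming a regular expression is adequate for this one construct (the helper name special_sequences is hypothetical):

```python
import re

# ISO/IEC 14977 special sequence: "?", any characters other than "?",
# then a closing "?". A negated character class also matches newlines,
# so multi-line special sequences work too. (Sketch only: a full EBNF
# reader must also skip "?" characters inside quoted terminals/comments.)
SPECIAL_SEQUENCE = re.compile(r"\?([^?]*)\?")


def special_sequences(rule: str) -> list:
    """Return the trimmed bodies of all special sequences in an EBNF rule."""
    return [m.group(1).strip() for m in SPECIAL_SEQUENCE.finditer(rule)]


rule = "DEDENT = ? See section 2.1.8 of https://docs.python.org/3.3/reference/ ? ;"
print(special_sequences(rule))
```

All a generator gets out of the DEDENT rule is the string inside the ? marks; interpreting it is left to a human.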

It's worth noting that EBNF itself uses a special sequence to define one of its productions:

(* see 4.7 *) syntactic exception
   = ? a syntactic-factor that could be replaced
       by a syntactic-factor containing no
       meta-identifiers
     ? ;

Notes

1. The lexical analyzer also converts some physical newline characters into NEWLINE tokens, while making other newline characters vanish.
2. EBNF normally uses the syntax = rather than ::= for a production, and insists that they be terminated with ;. Comments are enclosed between (* and *).
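Python's standard-library tokenize module exposes these layout tokens directly, which makes Note 1 easy to observe. A small sketch, assuming Python 3.7 or later (where token.NL exists), using a Python 3 version of the question's example with the second print call split across two lines: end-of-logical-line newlines come out as NEWLINE, while blank lines and the newline inside the parentheses come out as NL (non-logical newline) tokens, which the parser ignores.

```python
import io
import token
import tokenize

# The question's example in Python 3 syntax. The call split over two
# lines shows a newline inside brackets, which does not end a statement.
SOURCE = '''def hello():
    print("hello,")

    print(
        "world")

hello()
'''


def layout_tokens(source: str) -> list:
    """Return (token name, text) pairs for the layout-related tokens only."""
    wanted = {token.INDENT, token.DEDENT, token.NEWLINE, token.NL}
    return [
        (token.tok_name[tok.type], tok.string)
        for tok in tokenize.generate_tokens(io.StringIO(source).readline)
        if tok.type in wanted
    ]


for name, text in layout_tokens(SOURCE):
    print(name, repr(text))
```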

answered Nov 05 '22 03:11

rici

