Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How is the Python grammar used internally?

I'm trying to get a deeper understanding of how Python works, and I've been looking at the grammar shown at http://docs.python.org/3.3/reference/grammar.html.

I notice it says you would have to change parsermodule.c also, but truthfully I'm just not following what's going on here.

I understand that a grammar is a specification for how to read the language, but...I can't even tell what this is written in. It looks almost like Python but then it isn't.

I'm looking to get a better understanding of this specification and how it is used internally by Python to....do things. What depends on it (the answer is everything, but I mean specifically which aspect of the "engine" is processing it), what uses it, how does it tie in to compiling/running a script?

It's hard to believe that the whole language comes down to a two page specification...

like image 687
temporary_user_name Avatar asked Oct 13 '13 22:10

temporary_user_name


1 Answers

A grammar is used to describe all possible strings in a language. It is also useful in specifying how a parser should parse the language.

In this grammar it seems like they are using their own version of EBNF, where a non-terminal is any lowercase word and a terminal is all uppercase or surrounded by quotes. For example, NEWLINE is a terminal, arith_expr is a non-terminal and 'if' is also a terminal. Any non-terminal can be replaced by anything to the right of the colon of it's respective production rule. For example, if you look at the first rule:

single_input: NEWLINE | simple_stmt | compound_stmt NEWLINE

We can replace single_input with one of either a NEWLINE, a simple_stmt or a compound_stmt followed by a NEWLINE. Suppose we replaced it with "compound_stmt NEWLINE", then we would look for the production rule for compound_stmt:

compound_stmt: if_stmt | while_stmt | for_stmt | try_stmt | with_stmt | funcdef | classdef | decorated

and choose which of these we want to use, and substitute it for "compound_stmt" (Keeping NEWLINE in it's place)

Suppose we wanted to generate the valid python program:

if 5 < 2 + 3 or not 1 == 5:
    raise

We could use the following derivation:

  1. single_input
  2. compound_stmt NEWLINE
  3. if_stmt NEWLINE
  4. 'if' test ':' suite NEWLINE
  5. 'if' or_test ':' NEWLINE INDENT stmt stmt DEDENT NEWLINE
  6. 'if' and_test 'or' and_test ':' NEWLINE INDENT simple_stmt DEDENT NEWLINE
  7. 'if' not_test 'or' not_test ':' NEWLINE INDENT small_stmt DEDENT NEWLINE
  8. 'if' comparison 'or' 'not' not_test ':' NEWLINE INDENT flow_stmt DEDENT NEWLINE
  9. 'if' expr comp_op expr 'or' 'not' comparison ':' NEWLINE INDENT raise_stmt DEDENT NEWLINE
  10. 'if' arith_expr '<' arith_expr 'or' 'not' arith_expr comp_op arith_expr ':' NEWLINE INDENT 'raise' DEDENT NEWLINE
  11. 'if' term '<' term '+' term 'or' 'not' arith_expr == arith_expr ':' NEWLINE INDENT 'raise' DEDENT NEWLINE
  12. 'if' NUMBER '<' NUMBER '+' NUMBER 'or' 'not' NUMBER == NUMBER ':' NEWLINE INDENT 'raise' DEDENT NEWLINE

A couple of notes here, firstly, we must start with one of the non-terminals which is listed as a starting non-terminal. In that page, they list them as single_input, file_input, or eval_input. Secondly, a derivation is finished once all the symbols are terminal (hence the name). Thirdly, it is more common to do one substitution per line, for the sake of brevity I did all possible substitutions at once and started skipping steps near the end.

Given a string in the language, how do we find it's derivation? This is the job of a parser. A parser reverse-engineers a production sequence to first check that it is indeed a valid string, and furthermore how it can be derived from the grammar. It's worth noting that many grammars can describe a single language. However, for a given string, it's derivation will of course be different for each grammar. So technically we write a parser for a grammar not a language. Some grammars are easier to parse, some grammars are easier to read/understand. This one belongs in the former.

Also this doesn't specify the entire language, just what it looks like. A grammar says nothing about semantics.

If you're interested in more about parsing and grammar I recommend Grune, Jacobs - Parsing Techniques. It's free and good for self-study.

like image 126
user2097752 Avatar answered Oct 08 '22 02:10

user2097752