Parsing with user-defined operator precedence

Tags: syntax, parsing, haskell

OK, so here's a question: Given that Haskell allows you to define new operators with arbitrary operator precedence... how is it possible to actually parse Haskell source code?

You cannot know what operator precedences are set until you parse the source. But you cannot parse the source until you know the correct operator precedences. So... um, how?

Consider, for example, the expression

x *** y +++ z

Until we finish parsing the module, we don't know what other modules are imported, and hence what operators (and other identifiers) might be in scope. We certainly don't know their precedences yet. But the parser has to return something... But should it return

(x *** y) +++ z

Or should it return

x *** (y +++ z)

The poor parser has no way to know. This can only be determined once you hunt down the import that brings (+++) and (***) into scope, load that file off disk, and discover what the operator precedences are. Clearly the parser itself isn't going to do all that I/O; a parser just turns a stream of characters into an AST.
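
To make the dependence concrete, here is a small sketch. The definitions of (***) and (+++) below are made up purely for illustration (the real ones would come from whatever import brings them into scope), and so are the fixities. The point is only that the same source text means different things depending on two declarations the parser may not have seen yet:

```haskell
-- Hypothetical fixities, assumed only for this illustration.
infixl 7 ***
infixl 6 +++

-- Hypothetical definitions: (***) multiplies, (+++) adds.
(***) :: Int -> Int -> Int
x *** y = x * y

(+++) :: Int -> Int -> Int
x +++ y = x + y

-- With the fixities above this groups as (2 *** 3) +++ 4, i.e. 10.
-- With `infixl 6 ***` and `infixl 7 +++` instead, the same text
-- would group as 2 *** (3 +++ 4), i.e. 14.
example :: Int
example = 2 *** 3 +++ 4
```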

Clearly somebody somewhere has figured out how to do this. But I can't work it out... Any hints?


asked Mar 21 '15 17:03

MathematicalOrchid

3 Answers

Quoting the page on GHC trac for the parser:

Infix operators are parsed as if they were all left-associative. The renamer uses the fixity declarations to re-associate the syntax tree.
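
A rough sketch of what such a re-association pass could look like. This is not GHC's actual code: the `Expr`, `Fixity`, `FixityEnv`, `reassoc` and `mkOp` names are all invented for this example, the fixity table is a plain association list, and operators without an entry are assumed to default to infixl 9. The input is the left-nested tree the parser would have produced, with explicit `Paren` nodes so that user-written parentheses are never rearranged:

```haskell
data Assoc = AssocL | AssocR | AssocN deriving (Eq, Show)
data Fixity = Fixity Assoc Int deriving (Eq, Show)

data Expr
  = Var String
  | Paren Expr                -- parentheses written by the user
  | BinOp String Expr Expr    -- operator name, left operand, right operand
  deriving (Eq, Show)

-- Hypothetical fixity table, filled in once imports have been loaded.
type FixityEnv = [(String, Fixity)]

fixityOf :: FixityEnv -> String -> Fixity
fixityOf env op = maybe (Fixity AssocL 9) id (lookup op env)

-- Compare the operator on the left (op1) with the one on the right (op2).
-- Result: (fixities conflict?, should the tree be re-associated to the right?)
compareFixity :: Fixity -> Fixity -> (Bool, Bool)
compareFixity (Fixity a1 p1) (Fixity a2 p2)
  | p1 > p2   = (False, False)   -- op1 binds tighter: keep the left grouping
  | p1 < p2   = (False, True)    -- op2 binds tighter: push it to the right
  | otherwise = case (a1, a2) of
      (AssocR, AssocR) -> (False, True)
      (AssocL, AssocL) -> (False, False)
      _                -> (True, False)

-- Rebuild a left-nested tree bottom-up so that it respects the fixities.
reassoc :: FixityEnv -> Expr -> Either String Expr
reassoc env (BinOp op l r) = do
  l' <- reassoc env l
  r' <- reassoc env r
  mkOp env op l' r'
reassoc env (Paren e) = Paren <$> reassoc env e
reassoc _ e = Right e

-- Attach `op2 r` to an already-correct left operand, walking down its
-- right spine for as long as op2 binds tighter than the operator there.
mkOp :: FixityEnv -> String -> Expr -> Expr -> Either String Expr
mkOp env op2 l@(BinOp op1 a b) r
  | conflict  = Left ("cannot mix " ++ op1 ++ " and " ++ op2)
  | goRight   = BinOp op1 a <$> mkOp env op2 b r
  | otherwise = Right (BinOp op2 l r)
  where
    (conflict, goRight) = compareFixity (fixityOf env op1) (fixityOf env op2)
mkOp _ op l r = Right (BinOp op l r)
```

Given the left-nested tree for x *** y +++ z, that is `BinOp "+++" (BinOp "***" (Var "x") (Var "y")) (Var "z")`, a table where (+++) has the higher precedence yields `BinOp "***" x (BinOp "+++" y z)`, while a table where (***) has the higher precedence returns the tree unchanged.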


answered Oct 18 '22 20:10

András Kovács

András Kovács's answer describes what GHC actually does, but there's some history to this.

There was actually a somewhat hypothetical change from the Haskell 98 to the Haskell 2010 standard. In the former's BNF grammar, operator fixity and parsing were intertwined in such a way that you could in theory have some very strange interactions between the rules for fixity and the rules for when expressions and indentation blocks end. (For the latter two, the rules are essentially, "keep on going until you have to stop".)

In particular, you could redefine a local operator and its fixity such that a use of it belonged in the redefining inner where block exactly when it didn't, which gives you a parser paradox. I cannot find any of the old examples, but this may be one:

let (+) = (Prelude.+)
    infix 9 + -- make the inner + high precedence and non-associative
in 2 + 3 + 4
--       ^ this + cannot parse here as the inner operator, which means
--         the let ... in ... expression should end automatically first,
--         but then it's the standard +, and its fixity says it should parse
--         as part of the inner expression...

In Haskell 2010 they officially changed that so that operator fixities are determined in a separate stage after the parsing proper.

So why was this a hypothetical change? Because all the compiler writers already did it the Haskell 2010 way, and always had, for their own sanity.

answered Oct 18 '22 18:10

Ørjan Johansen

Summarising the comments so far, it seems the possibilities are thus:

Return a parse tree where any infix operators are left as some kind of "list" structure, and then rearrange once precedences become known.
Pretend you know the operator precedences, and then rearrange the parse tree after the fact.
Do a first parse that only reads imports and fixity declarations, load the imports, and then do a full parse with known precedences.
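
The first option (keep the operators in a flat list, resolve later) might look roughly like the sketch below. It is loosely modelled on the sample fixity-resolution function in the Haskell 2010 Report, minus unary negation. The `Token`, `FixityEnv` and `resolve` names are my own inventions for this example, and operators missing from the table are assumed to default to infixl 9:

```haskell
data Assoc = AssocL | AssocR | AssocN deriving (Eq, Show)

data Expr
  = Var String
  | BinOp String Expr Expr
  deriving (Eq, Show)

-- What the first parsing stage hands over: operands and operator names,
-- in source order, with no grouping decided yet.
data Token = TExpr Expr | TOp String deriving (Eq, Show)

-- Hypothetical fixity table: operator name -> (associativity, precedence).
type FixityEnv = [(String, (Assoc, Int))]

fixityOf :: FixityEnv -> String -> (Assoc, Int)
fixityOf env op = maybe (AssocL, 9) id (lookup op env)

-- Turn the flat token list into a tree, or Nothing if the list is
-- malformed or mixes operators whose fixities conflict.
resolve :: FixityEnv -> [Token] -> Maybe Expr
resolve env tokens =
  case operand (AssocN, -1) tokens of   -- (-1): a dummy weakest "operator"
    Just (e, []) -> Just e
    _            -> Nothing
  where
    -- Read one operand, then look at the operators that follow it.
    -- `left` is the fixity of the operator to the left of this operand.
    operand left (TExpr e : rest) = go left e rest
    operand _ _ = Nothing

    go _ e [] = Just (e, [])
    go left@(a1, p1) e (TOp op : rest)
      -- same precedence, incompatible associativity: reject
      | p1 == p2 && (a1 /= a2 || a1 == AssocN) = Nothing
      -- the operator on the left wins: stop and let the caller take `e`
      | p1 > p2 || (p1 == p2 && a1 == AssocL) = Just (e, TOp op : rest)
      -- this operator wins: parse its right operand, then keep going
      | otherwise = do
          (r, rest') <- operand (a2, p2) rest
          go left (BinOp op e r) rest'
      where
        (a2, p2) = fixityOf env op
    go _ _ _ = Nothing
```

For the tokens of x *** y +++ z, a table giving (***) precedence 7 and (+++) precedence 6 produces (x *** y) +++ z, and swapping the two numbers produces x *** (y +++ z). The parser proper never needs to know which one it will be.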

answered Oct 18 '22 20:10

MathematicalOrchid
