Preserving comments in `Text.Parsec.Token` tokenizers

I'm writing a source-to-source transformation using parsec, so I have a `LanguageDef` for my language and I build a `TokenParser` for it using `Text.Parsec.Token.makeTokenParser`:

myLanguage = LanguageDef { ...
  commentStart = "/*"
  , commentEnd = "*/"
  ...
}

-- defines 'stringLiteral', 'identifier', etc.
-- (a top-level record-wildcard pattern binding; needs RecordWildCards)
TokenParser {..} = makeTokenParser myLanguage

Unfortunately, since I defined `commentStart` and `commentEnd`, each of the parser combinators in the `TokenParser` is a lexeme parser implemented in terms of `whiteSpace`, and `whiteSpace` consumes comments along with spaces, discarding their text.

What is the right way to preserve comments in this situation?

Approaches I can think of:

  1. Don't define `commentStart` and `commentEnd`. Wrap each of the lexeme parsers in another combinator that grabs comments before parsing each token (sketched below).
  2. Implement my own version of `makeTokenParser` (or perhaps use some library that generalizes `Text.Parsec.Token`; if so, which library?)
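
Here's roughly what I mean by option 1. Names like `comment`, `withComments`, and `identifierWithComments` are placeholders of my own, not part of parsec:

import Text.Parsec
import Text.Parsec.String (Parser)
import qualified Text.Parsec.Token as Tok
import Text.Parsec.Language (emptyDef)

-- emptyDef leaves commentStart/commentEnd empty, so the generated
-- whiteSpace only consumes spaces, never comments
lexer :: Tok.TokenParser ()
lexer = Tok.makeTokenParser emptyDef

-- parse one /* ... */ block comment, keeping its text
comment :: Parser String
comment = try (string "/*") *> manyTill anyChar (try (string "*/"))

-- collect any comments (and surrounding spaces) in front of a token,
-- returning them alongside the token's result
withComments :: Parser a -> Parser ([String], a)
withComments p = do
  cs <- spaces *> many (comment <* spaces)
  x  <- p
  return (cs, x)

-- e.g. parsing "/* doc */ foo" gives ([" doc "], "foo")
identifierWithComments :: Parser ([String], String)
identifierWithComments = withComments (Tok.identifier lexer)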

What's the done thing in this situation?

asked Jun 26 '14 by Lambdageek



1 Answer

In principle, defining `commentStart` and `commentEnd` doesn't fit with preserving comments: to preserve them, you need to treat comments as valid parts of both the source and target languages, including them in your grammar and your AST/ADT.

That way, you can keep the text of the comment as the payload of a `Comment` constructor and output it appropriately in the target language, something like

data Statement = Comment String | Return Expression | ...

The fact that neither the source nor the target language treats the comment text as meaningful doesn't matter to your translation code: it's still data you carry through the pipeline.
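
For example (the exact constructors are whatever your languages need; these are invented for illustration):

data Expression = Var String | IntLit Integer
  deriving Show

data Statement
  = Comment String          -- carries the comment text verbatim
  | Return Expression
  deriving Show

-- when emitting the target language, comments are printed back out
render :: Statement -> String
render (Comment txt) = "/*" ++ txt ++ "*/"
render (Return e)    = "return " ++ renderExpr e ++ ";"

renderExpr :: Expression -> String
renderExpr (Var x)    = x
renderExpr (IntLit n) = show n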


The major problem with this approach: it doesn't fit well with `makeTokenParser`, and works better if you implement your source language's parser from the ground up.

I guess I'm veering towards editing `makeTokenParser` so that the comment parsers return the comment's `String` instead of `()`.
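
Roughly like this, with the delimiters hard-coded to `/*` and `*/` for brevity (a real edit would take them from the `LanguageDef`, and would handle `commentLine` and nested comments too):

import Text.Parsec
import Text.Parsec.String (Parser)

-- a whiteSpace that hands back the comment texts it consumed,
-- instead of discarding them as Text.Parsec.Token does
whiteSpace' :: Parser [String]
whiteSpace' = spaces *> many (multiLineComment <* spaces)
  where
    multiLineComment =
      try (string "/*") *> manyTill anyChar (try (string "*/"))

-- the corresponding lexeme: each token also yields the comments
-- consumed after it, mirroring Token's trailing-whitespace convention
lexeme' :: Parser a -> Parser (a, [String])
lexeme' p = (,) <$> p <*> whiteSpace'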

answered Sep 20 '22 by AndrewC