Better way to test (automatically) a parser?

I've recently been writing a small programming language and have finished its parser. I want to write automated tests for the parser (whose result is an abstract syntax tree), but I'm not sure which approach is better.

The first thing I tried was simply to serialize the AST to S-expression text and compare it to expected output that I wrote by hand, but this has some problems:

There are trivial, meaningless differences, such as whitespace, between the serialized text and the expected output. For example, there is no difference between:
```
(attribute (symbol str) (symbol length))
```
(the serialized output) and:
```
(attribute (symbol str)
           (symbol length))
```
(which I wrote by hand) in meaning, but string comparison of course distinguishes them. Okay, I could resolve that by normalization.
When a test fails, it doesn't concisely show the difference between the actual tree and the expected tree. I want to see only the differing node, not the whole tree.
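
The normalization mentioned for the first problem could look roughly like the sketch below. `SexprNormalizer` is a hypothetical helper name, and the sketch assumes the serialized form contains no string literals with significant whitespace; it only removes layout differences and does nothing about the second problem.

```java
// Hypothetical helper (not from any library): canonicalizes S-expression text
// so that layout differences do not affect a plain string comparison.
// Assumption: the text contains no string literals with significant whitespace.
final class SexprNormalizer {
    private SexprNormalizer() {}

    static String normalize(String sexpr) {
        return sexpr.trim()
                .replaceAll("\\s+", " ")      // collapse runs of whitespace, including newlines
                .replaceAll("\\(\\s+", "(")   // drop whitespace after "("
                .replaceAll("\\s+\\)", ")");  // drop whitespace before ")"
    }
}
```

With this, both the serialized text and the handwritten text would be passed through `normalize` before being compared.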

The second thing I tried was to write an S-expression parser and compare the AST produced by the parser under test with the AST that the S-expression parser produces from the handwritten expected output. However, I realized that the S-expression parser would itself have to be tested, which makes the whole exercise rather pointless.

I wonder what the typical, easy way to test a parser is.

PS. I am using Java, and don't want any dependencies on third-party libraries.

asked Jan 22 '11 16:01

minhee

1 Answer

Provided you are looking for a completely automated and extensible unit-testing framework for your parser, I'd recommend the following approach:

Incorrect input

Create a set of samples of incorrect input. Then feed each of them to the parser, making sure the parser rejects it. It's a good idea to provide metadata for each test case that defines the expected output: the specific error code or message the parser is supposed to produce.
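
As a rough sketch of this idea, the table below pairs each invalid sample with the message the parser is expected to report. `ToyParser` is only a stand-in for the real parser under test (it merely checks that parentheses balance), and `RejectionTests`, the sample inputs, and the error messages are all made up for illustration.

```java
import java.text.ParseException;
import java.util.LinkedHashMap;
import java.util.Map;

// Stand-in for the parser under test (hypothetical): it only checks that
// parentheses are balanced and reports problems via ParseException.
final class ToyParser {
    static void parse(String source) throws ParseException {
        int depth = 0;
        for (int i = 0; i < source.length(); i++) {
            char c = source.charAt(i);
            if (c == '(') depth++;
            if (c == ')' && --depth < 0) {
                throw new ParseException("unexpected ')'", i);
            }
        }
        if (depth > 0) {
            throw new ParseException("missing ')'", source.length());
        }
    }
}

final class RejectionTests {
    // Each invalid sample is paired with the error message the parser must produce.
    static final Map<String, String> CASES = new LinkedHashMap<>();
    static {
        CASES.put("(a (b)", "missing ')'");
        CASES.put("a)", "unexpected ')'");
    }

    // Returns a description of the first failing case, or null if every case passes.
    static String firstFailure() {
        for (Map.Entry<String, String> testCase : CASES.entrySet()) {
            try {
                ToyParser.parse(testCase.getKey());
                return "accepted invalid input: " + testCase.getKey();
            } catch (ParseException e) {
                if (!e.getMessage().equals(testCase.getValue())) {
                    return "wrong error for " + testCase.getKey() + ": " + e.getMessage();
                }
            }
        }
        return null;
    }
}
```

Adding a new negative test is then a matter of adding one entry to the table; the same loop could equally be driven from files on disk.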

Correct input

As in the previous case, create a set of samples representing various correct inputs. Besides the simple validation that the parser accepts all of them, there's still the problem of validating that the resulting abstract syntax tree is the one you expect.

To address this problem I'd do the following: describe the expected AST for each test case in some well-known format that can be safely parsed, that is, deserialized into the actual in-memory AST structures, by a third-party parser you can treat as bug-free for your purposes. The natural choice is XML, since most languages and programming frameworks support XML and provide the corresponding (de)serialization facilities. The best solution would be to deserialize directly into the AST node types. Since convenient visual editing tools for XML exist, it's feasible to construct even large test cases.
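
In Java this needs no extra dependency, because the JDK ships a DOM parser in `javax.xml.parsers`. The following is a minimal sketch under some assumptions: `Ast` and `ExpectedAstLoader` are hypothetical names, the element name is taken as the node kind, and an optional `name` attribute carries the node's text. A real project would map onto its own node classes instead of a generic one.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical generic AST node used only for this illustration.
final class Ast {
    final String kind;                              // e.g. "attribute" or "symbol"
    final String name;                              // value of the "name" XML attribute, or ""
    final List<Ast> children = new ArrayList<>();

    Ast(String kind, String name) {
        this.kind = kind;
        this.name = name;
    }

    @Override
    public String toString() {                      // S-expression form, handy in failure messages
        StringBuilder sb = new StringBuilder("(").append(kind);
        if (!name.isEmpty()) sb.append(' ').append(name);
        for (Ast child : children) sb.append(' ').append(child);
        return sb.append(')').toString();
    }
}

final class ExpectedAstLoader {
    // Parses the expected tree with the XML parser that ships with the JDK.
    static Ast load(String xml) {
        try {
            byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
            Element root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(bytes))
                    .getDocumentElement();
            return convert(root);
        } catch (Exception e) {
            throw new IllegalArgumentException("malformed expected-AST XML", e);
        }
    }

    private static Ast convert(Element element) {
        Ast node = new Ast(element.getTagName(), element.getAttribute("name"));
        NodeList childNodes = element.getChildNodes();
        for (int i = 0; i < childNodes.getLength(); i++) {
            Node child = childNodes.item(i);
            if (child instanceof Element) {         // ignore text and comment nodes such as indentation
                node.children.add(convert((Element) child));
            }
        }
        return node;
    }
}
```

Because only element nodes are converted, the indentation and line breaks in a handwritten expected file have no effect on the resulting tree.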

Then I'd construct an AST comparer using the visitor pattern, which pairs up the nodes of the two ASTs and compares the two nodes in each pair for equality. Note, however, that equality is an operation specific to each AST node type.
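
A minimal sketch of such a comparer follows, assuming just two node types modelled on the question's example. All names here (`AstNode`, `AstVisitor`, `SymbolNode`, `AttributeNode`, `AstComparer`) are hypothetical. The comparer visits the expected node while holding the actual node it is paired with, and reports only the first differing node together with its path.

```java
interface AstNode {
    <R> R accept(AstVisitor<R> visitor);
}

interface AstVisitor<R> {
    R visitSymbol(SymbolNode node);
    R visitAttribute(AttributeNode node);
}

final class SymbolNode implements AstNode {
    final String name;
    SymbolNode(String name) { this.name = name; }
    public <R> R accept(AstVisitor<R> visitor) { return visitor.visitSymbol(this); }
}

final class AttributeNode implements AstNode {
    final AstNode target;
    final AstNode member;
    AttributeNode(AstNode target, AstNode member) {
        this.target = target;
        this.member = member;
    }
    public <R> R accept(AstVisitor<R> visitor) { return visitor.visitAttribute(this); }
}

// Visits the expected node while holding the paired actual node.
// Each visit method returns null on a match, or a description of the first mismatch.
final class AstComparer implements AstVisitor<String> {
    private final AstNode actual;
    private final String path;

    private AstComparer(AstNode actual, String path) {
        this.actual = actual;
        this.path = path;
    }

    static String firstDifference(AstNode expected, AstNode actual) {
        return expected.accept(new AstComparer(actual, "root"));
    }

    // Equality for symbols: same node type and same name.
    public String visitSymbol(SymbolNode expected) {
        if (!(actual instanceof SymbolNode)) return path + ": expected a symbol";
        String actualName = ((SymbolNode) actual).name;
        return expected.name.equals(actualName)
                ? null
                : path + ": expected symbol " + expected.name + " but was " + actualName;
    }

    // Equality for attributes: same node type, then pairwise comparison of the children.
    public String visitAttribute(AttributeNode expected) {
        if (!(actual instanceof AttributeNode)) return path + ": expected an attribute";
        AttributeNode other = (AttributeNode) actual;
        String diff = expected.target.accept(new AstComparer(other.target, path + "/target"));
        return diff != null
                ? diff
                : expected.member.accept(new AstComparer(other.member, path + "/member"));
    }
}
```

A test can then assert that `AstComparer.firstDifference(expected, actual)` is `null` and use the returned string as the failure message otherwise, so a failing test names the offending node rather than dumping both trees.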

Notes:

This approach would work with most unit-testing frameworks, such as JUnit.
AST-to-XML serialization is a welcome tool for debugging the compiler.
The visitor pattern implementation can easily serve as the backbone for multiple processing stages within the compiler.
There are freely available compiler test suites that can provide some inspiration for your project. See, for example, the Ada Conformity Assessment Test Suite for the Ada programming language, although this test suite deals with higher-level testing, not just parser testing.

answered Nov 26 '22 12:11

Ondrej Tucny

Tags:

parsing

testing

abstract-syntax-tree