Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Better way to test (automatically) a parser?

I’m recently writing a small programming language, and have finished writing its parser. I want to write an automated test for the parser (that its result is an abstract syntax tree), but I’m not sure which way is better.

First what I tried is just to serialize AST to S-expression text and compare it to the expected output text I wrote by hand, but it has some problems:

  • There are trivial meaningless differences between a serialized text and the expected output like whitespaces. For example, there is no difference between:

    (attribute (symbol str) (symbol length))
    

    (that is serialized) and:

    (attribute (symbol str)
               (symbol length))
    

    (that is handwritten by me) in their meanings, but string comparison distincts them of course. Okay, I could resolve it by normalization.

  • When a test fails, it doesn’t show the difference between actual tree and expected tree concisely. I want to show only a difference node, not whole tree.

Second what I tried is to write S-expression parser and compare AST that parser (to be tested) generates to AST that S-expression parser (that I just implemented) generates from the handwritten expected output. However I realized that S-expression have to be tested also and it could be really nonsense.

I wonder what is the typical and easy way to test the parser.

PS. I am using Java, and dont’t want any dependencies to third-party libraries.

like image 455
minhee Avatar asked Jan 22 '11 16:01

minhee


People also ask

Should a unit test read from file?

Strictly speaking, unit tests should not use the file system because it is slow. However, readability is more important. XML in a file is easier to read, and can be loaded in an XML friendly editor.

What is text parser?

So, what is text parsing? In simple terms, it is a common programming task that separates the given series of text into smaller components based on some rules. Its application ranges from document parsing to deep learning NLP.

How does parse work?

A parser is a compiler or interpreter component that breaks data into smaller elements for easy translation into another language. A parser takes input in the form of a sequence of tokens, interactive commands, or program instructions and breaks them up into parts that can be used by other components in programming.


1 Answers

Providing you are looking for a completely automated and extensible unit testing framework for your parser I'd recommend the following approach:

Incorrect input

Create a set of samples of incorrect inputs. Then feed the parse with each of them making sure the parser rejects them. I's a good idea to provide metadata for each test case that defines the expected output — the specific error code / message the parser is supposed to produce.

Correct input

As in the previous case, create a set of samples representing various correct inputs. Besides the simple validation that the parser accepts all inputs, there's still the problem of validating that the actual Abstract Syntax Tree makes sense.

To address this problem I'd do the following: Describing the expected AST for each test case in some well-known format that can be safely parsed — deserialized into the actual in-memory AST structures — by a 3rd party parser considered bug-free (for your case). The natural choice is XML since most languages / programming frameworks cover XML support and provide the respective (de)serialization facilities. The best solution would be to deserialize right into the AST node types. Since convenient visual editing tools for XML exist it's feasible to construct even large test cases.

Then I'd construct an AST comparer using the visitor pattern which pair-up the two ASTs and compare both nodes in each pair for equality. However, equality is a per-AST-node-type specific operation.


Notes:

  • This approach would work with most unit-testing frameworks like JUnit.
  • AST to XML serialization is a welcome tool for debugging the compiler.
  • The visitor pattern implementation can easily serve as the backbone for multiple processing stages within the compiler.
  • There are compiler test suites freely available that can provide some inspiration to your project — see for example the Ada Conformity Assesment Test Suite for the Ada programming language, although this test suite deals with higher-level testing, not just parser testing.
like image 67
Ondrej Tucny Avatar answered Nov 26 '22 12:11

Ondrej Tucny