Testing of a new Java parser

Tags:

Purely as a self-learning exercise, I'm trying to write a Java parser in Perl using the Parse::RecDescent module. I may later re-implement the parser using other other tools like Antlr, bison, etc.

But how would I ensure that my parser is indeed generating the correct parse, per the Java Language Specification? Meaning, its correct handling of dangling else's, operator-associativity and -precedence etc.

One method would be to compare my parser against a known, bug-free parser by having both parsers generate ASTs for a very large number of test Java programs, and then comparing the two sets of ASTs.

If this is indeed the only method, where could I find a large suite of test Java programs thoroughly covering the entire Java Language Specification?

I have looked at JavaParser but it doesn't seem to have an exhaustive test dataset.

The other method would, of course, be writing by hand tens of thousands test Java programs myself, which would be very impractical for me, not only time-wise but also in ensuring its exhaustiveness!

544

asked Sep 25 '16 13:09

Harry

1 Answers

To decide if you have the right answer, you ideally have to compare to some kind of standard. This is hard for a computer languages.

Comparing ASTs is going to be hard, because there are no standards for such. Each parser that builds ASTs, builds an AST whose structure is designed by the person that coded the parser.

That means if you build an AST-producing parser, and you get somebody else's AST-producing parser, you'll discover that the AST nodes you have chosen don't match the other AST. Now you'll have to build a mapping from your AST to the other one (and how will you know the mapping is valid?). You can try to make your parser generate the AST from another parser, but what you will discover is the AST you produce is influenced by the parsing technology you use.

We have a similar problem with the Java front end my company produces (see bio if you want to know more). What we settle for is testing that the answer is self-consistent and then we do a lot of long-term experiential testing on big pieces of code.

Our solution is to:

(Build a parser, using the strongest parsing technology we can get (GLR). This means we can recognize certain constructs not easily recognized by other parsing technologies (LL, LR, ...), and thus produce AST nodes that other parsers would have a hard time producing. See comments below for an example where this matters. Even so, we produce AST nodes in way that avoids completely our having to hand-code AST node construction as demanded by most other parsing technologies; that tends to produce somewhat different ASTs than hand-coded).
Parse a LOT of Java code (producing ASTs) to make sure we don't have parsing errors. [The JDK is a pretty good size example and it is easy to get]
Our tools can take an AST and regenerate (prettyprint) source code, complete with comments but perhaps a somewhat different layout. We verify that parsed-then-prettyprinted code also parses. We re-prettyprint the parsed prettyprinted version; this should be identical to the prettyprinted version since we always produce the same layout. This test is a good indication that our AST design and implementation doesn't lose anything about the source code
Build symbol tables, resolve the meaning of names, and verify that the legal Java programs type-check according to our front end. That doesn't tell you anything about the nature of the AST except that it is good enough (which in fact, is good enough!) Because the type checking task is very complex (go check your local Java standard), it is also pretty fragile. If you don't have everything right, the type checking will likely fail when applied across a large body of code. Again, the JDK is a pretty good test of this. Note: a Java parser without name and type resolution isn't very useful in practice
Produce JavaDoc like cross references that include hyperlinked source code from the above results. This means it is easy to manually check a bit code to see that name resolution (therefore AST construction) is sane.
Live with the results, applying the front end to various program analysis and transformtions of code. We find the occasional problem and fix it.

It is tough to get this right; you have to get close and keep the testing pressure on continuously, especially since Java the language keeps moving. (We're at Java 8, and Java 9 is being threatened). Bottom line: it is a lot of work to build such a parser and check its sanity.

We'd love to have an independent set of tests, but we haven't seen one in the wild. And I would expect those tests if they exist (I assume Oracle and IBM have them) really don't test parsing and name resolution directly, but rather test that some bit of code compiles and runs producing a known result. Since we aren't building a compiler, we wouldn't be able to run such tests if we had them. We would be able to do the name resolution and type consistency checks and that would be helpful.

[We actually do this for a number of language front ends. You think Java is hard, try this with C++]

answered Sep 20 '22 12:09

Ira Baxter

Related questions
                            
                                Error: Maven: Non parseable POM ... expected START_TAG or END_TAG not TEXT
                            
                                How not to store specific variable in database when using setValue(Object)?
                            
                                Remove back slash from Json String in android
                            
                                Java FlatMapFunction in Spark: error: is not abstract and does not override abstract method call(String) in FlatMapFunction
                            
                                could not find property 'sourceSets' on root project gradle
                            
                                Jackson ObjectMapper adjusts dates to context timezone when deserializing even if explicitly disabled
                            
                                PowerMockRunner does not apply JUnit ClassRules
                            
                                Spring JMSListener - How should it handle empty payloads?
                            
                                Reading Avro file gives AvroTypeException: missing required field error (even though the new field is declared null in schema)
                            
                                Java - return a reference to a specific place in a linkedlist with list iterator
                            
                                How should/could I store my JWT tokens in redis so that I can see current user sessions?
                            
                                Java get collection by generic type
                            
                                Thread contention on java.io.PrintStream
                            
                                How to create encoder for custom Java objects?
                            
                                How to do case insensitive sorting of Norwegian characters (Æ, Ø, and Å) using Hibernate Lucene Search?
                            
                                Reactor way to cancel Subscriptions
                            
                                hibernate not creating table but no error messages
                            
                                Jackson - generic getValue method
                            
                                Wait for multiple async calls to finish in RxJava
                            
                                How to stop a running Thread in Java

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Testing of a new Java parser

Tags:

java

parsing

testing

javaparser

Harry

People also ask

1 Answers

Ira Baxter

Recent Activity

Donate For Us