Purely as a self-learning exercise, I'm trying to write a Java parser in Perl using the Parse::RecDescent module. I may later re-implement the parser using other tools like Antlr, bison, etc.
But how would I ensure that my parser is indeed generating the correct parse, per the Java Language Specification? That is, how would I know it correctly handles dangling elses, operator associativity and precedence, etc.?
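One way to pin down the intended parse without comparing ASTs is to write small Java programs whose *runtime result* depends on how they are parsed, run them under a trusted compiler such as javac, and then check that your parser builds the tree that would produce that result. The class below is a minimal sketch; `ParseProbes` and its method names are hypothetical, and each probe targets one of the ambiguities mentioned above.

```java
// Hypothetical probe programs: each method's result depends on how the
// code is parsed, so running it under javac reveals the JLS-mandated parse.
public class ParseProbes {

    // Dangling else: the JLS binds an else to the nearest enclosing if.
    // With outer == false the whole statement is skipped, so x stays 0.
    // If the else bound to the outer if instead, x would become 2.
    static int danglingElse(boolean outer, boolean inner) {
        int x = 0;
        if (outer)
            if (inner) x = 1;
        else x = 2; // indentation suggests the outer if; the JLS says otherwise
        return x;
    }

    // '-' is left-associative: (10 - 4) - 3 == 3, not 10 - (4 - 3) == 9.
    static int subtraction() {
        return 10 - 4 - 3;
    }

    // '*' binds tighter than '+': 2 + (3 * 4) == 14, not (2 + 3) * 4 == 20.
    static int precedence() {
        return 2 + 3 * 4;
    }

    // '+' binds tighter than '<<': 1 << (2 + 1) == 8, not (1 << 2) + 1 == 5.
    static int shift() {
        return 1 << 2 + 1;
    }

    public static void main(String[] args) {
        System.out.println(danglingElse(false, true)); // prints 0
        System.out.println(subtraction());             // prints 3
        System.out.println(precedence());              // prints 14
        System.out.println(shift());                   // prints 8
    }
}
```

Each probe gives a concrete expected tree (for example `(- (- 10 4) 3)` for `subtraction`) that a hand-written parser can be checked against, although this still only covers the cases you think to write.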
One method would be to compare my parser against a known, bug-free parser by having both parsers generate ASTs for a very large number of test Java programs, and then comparing the two sets of ASTs.
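The comparison described above can be sketched as a small differential-testing harness. This is a minimal sketch under assumptions: each parser is modeled as a function from source text to a canonical string form of its AST (an S-expression here), and the two toy "parsers" for subtraction chains are hypothetical stand-ins, one folding left as the JLS requires and one wrongly folding right.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

// Differential-testing sketch: run a reference parser and a candidate parser
// over the same inputs and report every input on which they disagree.
public class DiffHarness {

    /** Returns the names of the test programs on which the two parsers disagree. */
    static List<String> disagreements(Map<String, String> programs,
                                      Function<String, String> reference,
                                      Function<String, String> candidate) {
        List<String> failures = new ArrayList<>();
        // TreeMap gives a stable, sorted report order.
        for (Map.Entry<String, String> e : new TreeMap<>(programs).entrySet()) {
            try {
                String expected = reference.apply(e.getValue());
                String actual = candidate.apply(e.getValue());
                if (!expected.equals(actual)) {
                    failures.add(e.getKey());
                }
            } catch (RuntimeException ex) {
                // A crash in either parser also counts as a disagreement.
                failures.add(e.getKey());
            }
        }
        return failures;
    }

    // Toy parser for chains like "10 - 4 - 3" (non-negative literals only):
    // folds left, producing "(- (- 10 4) 3)".
    static String leftFold(String src) {
        String[] t = src.split("-");
        String acc = t[0].trim();
        for (int i = 1; i < t.length; i++) {
            acc = "(- " + acc + " " + t[i].trim() + ")";
        }
        return acc;
    }

    // Deliberately buggy toy parser: folds right, producing "(- 10 (- 4 3))".
    static String rightFold(String src) {
        String[] t = src.split("-");
        String acc = t[t.length - 1].trim();
        for (int i = t.length - 2; i >= 0; i--) {
            acc = "(- " + t[i].trim() + " " + acc + ")";
        }
        return acc;
    }

    public static void main(String[] args) {
        Map<String, String> programs = Map.of(
                "one.txt", "7",
                "two.txt", "10 - 4",
                "three.txt", "10 - 4 - 3");
        System.out.println(disagreements(programs, DiffHarness::leftFold, DiffHarness::rightFold));
        // prints [three.txt]
    }
}
```

The toy parsers agree on `7` and `10 - 4` and diverge only once associativity matters. With real parsers, the harness is the easy part; producing comparable canonical forms from two independently designed ASTs is the hard part.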
If this is indeed the only method, where could I find a large suite of test Java programs thoroughly covering the entire Java Language Specification?
I have looked at JavaParser but it doesn't seem to have an exhaustive test dataset.
The other method would, of course, be writing tens of thousands of test Java programs by hand myself, which would be very impractical for me, not only time-wise but also in ensuring their exhaustiveness!
To decide if you have the right answer, you ideally have to compare to some kind of standard. This is hard for computer languages.
Comparing ASTs is going to be hard, because there are no standards for ASTs. Each parser that builds ASTs builds an AST whose structure is designed by the person who coded the parser.
That means if you build an AST-producing parser, and you get somebody else's AST-producing parser, you'll discover that the AST nodes you have chosen don't match the other AST. Now you'll have to build a mapping from your AST to the other one (and how will you know the mapping is valid?). You can try to make your parser generate the same AST as another parser, but what you will discover is that the AST you produce is influenced by the parsing technology you use.
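To make the mapping problem concrete, here is a minimal sketch with two hypothetical AST shapes (neither taken from a real parser) for the same expression `10 - 4 - 3`. Shape A uses strictly binary nodes; shape B keeps a flat operand/operator chain, as a recursive-descent parser that replaces left recursion with a loop might naturally produce. The mapping from B to A has to assume an associativity, which is exactly the kind of decision that can itself be wrong. Records require Java 16 or later.

```java
import java.util.List;

// Sketch of mapping one parser's AST shape onto another's for comparison.
public class AstMapping {

    // Shape A: strictly binary nodes.
    interface Expr {}
    record Lit(int value) implements Expr {}
    record Bin(String op, Expr left, Expr right) implements Expr {}

    // Shape B: a flat chain "e0 op1 e1 op2 e2 ..." for operators of equal precedence.
    record Chain(List<Integer> operands, List<String> ops) {}

    // Mapping B -> A. It bakes in left associativity; if that assumption is
    // wrong for some operator, the mapping can hide real bugs or invent false ones.
    static Expr toBinary(Chain c) {
        Expr acc = new Lit(c.operands().get(0));
        for (int i = 0; i < c.ops().size(); i++) {
            acc = new Bin(c.ops().get(i), acc, new Lit(c.operands().get(i + 1)));
        }
        return acc;
    }

    public static void main(String[] args) {
        Expr fromA = new Bin("-", new Bin("-", new Lit(10), new Lit(4)), new Lit(3));
        Chain fromB = new Chain(List.of(10, 4, 3), List.of("-", "-"));
        // Records compare structurally, so this checks the whole tree.
        System.out.println(fromA.equals(toBinary(fromB))); // prints true
    }
}
```

Even in this tiny case the mapping encodes a semantic rule (left associativity), so a comparison through it tests the mapping as much as either parser.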
We have a similar problem with the Java front end my company produces (see bio if you want to know more). What we settle for is testing that the answer is self-consistent and then we do a lot of long-term experiential testing on big pieces of code.
Our solution is to:
It is tough to get this right; you have to get close and keep the testing pressure on continuously, especially since Java the language keeps moving. (We're at Java 8, and Java 9 is being threatened). Bottom line: it is a lot of work to build such a parser and check its sanity.
We'd love to have an independent set of tests, but we haven't seen one in the wild. And I would expect that those tests, if they exist (I assume Oracle and IBM have them), don't really test parsing and name resolution directly, but rather test that some bit of code compiles and runs, producing a known result. Since we aren't building a compiler, we wouldn't be able to run such tests even if we had them. We would, however, be able to do the name resolution and type consistency checks, and that would be helpful.
[We actually do this for a number of language front ends. You think Java is hard, try this with C++]