Are there any tools that can randomly generate source code from a language grammar?

C source code can be parsed according to the C grammar (described as a CFG) and turned into an AST. I am wondering whether a tool exists that does the reverse: first randomly generate ASTs according to the CFG, whose tokens carry only a token type and no concrete string value, and then generate the concrete tokens from the regular expressions that define each token type.

I imagine the first step as an iterative replacement of non-terminals, where the production is chosen at random and the number of iterations can be capped. The second step is just generating random strings that match the regular expressions.
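The two steps described above can be sketched roughly as follows. This is only a minimal illustration, not an existing tool: the toy grammar, the token types `ID` and `NUM`, and the helper names `expand`, `make_token` and `generate` are all made up for the example. The Python standard library has no regex-to-string generator, so the token step uses small hand-written generators that mirror the regular expressions shown in the comments.

```python
import random
import string

# Hypothetical toy grammar: each non-terminal maps to a list of alternatives.
# Symbols without a rule are either token types (ID, NUM) or literal tokens.
GRAMMAR = {
    "stmt": [["ID", "=", "expr", ";"]],
    "expr": [["term"], ["term", "+", "expr"], ["term", "*", "expr"]],
    "term": [["ID"], ["NUM"], ["(", "expr", ")"]],
}

def expand(symbol, rng, depth=0, max_depth=6):
    """Step 1: derive a flat list of token types and literals from `symbol`."""
    if symbol not in GRAMMAR:
        return [symbol]
    alternatives = GRAMMAR[symbol]
    if depth >= max_depth:
        # Past the limit, take the shortest alternative. For this toy grammar
        # that always leads to a terminal; other grammars may need more care.
        alternatives = [min(alternatives, key=len)]
    out = []
    for sym in rng.choice(alternatives):
        out.extend(expand(sym, rng, depth + 1, max_depth))
    return out

def make_token(kind, rng):
    """Step 2: turn a token type into a concrete string."""
    if kind == "ID":   # assumed definition: [a-z][a-z0-9]{0,3}
        tail_len = rng.randint(0, 3)
        tail = "".join(rng.choice(string.ascii_lowercase + string.digits)
                       for _ in range(tail_len))
        return rng.choice(string.ascii_lowercase) + tail
    if kind == "NUM":  # assumed definition: [0-9]{1,3}
        return "".join(rng.choice(string.digits)
                       for _ in range(rng.randint(1, 3)))
    return kind        # literal tokens such as "=", "+", ";"

def generate(rng):
    return " ".join(make_token(t, rng) for t in expand("stmt", rng))

if __name__ == "__main__":
    rng = random.Random()
    for _ in range(3):
        print(generate(rng))
```

Capping the depth rather than the number of rewrite steps keeps the sketch short; an iteration counter would work too, as long as every non-terminal still has a way to terminate once the cap is reached.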

Is there any tool that can do this?

W.Sun asked Dec 17 '10 05:12



2 Answers

The Data Generation Language (DGL) does this, and it additionally lets you weight the productions in the grammar so that some are output more often than others.
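To illustrate what weighting productions buys you, here is a small Python sketch. It is not DGL syntax and does not use DGL; the grammar, the weights and the function name `generate_weighted` are assumptions made up for this example. Each alternative carries a weight, and `random.choices` picks among them accordingly.

```python
import random

# Hypothetical weighted grammar: each alternative is (weight, symbols).
# The weights are illustrative only and are not DGL syntax.
WEIGHTED = {
    "expr": [
        (3, ["NUM"]),                # most likely: stop at a number
        (1, ["expr", "+", "expr"]),  # less likely: grow sideways
        (1, ["(", "expr", ")"]),     # less likely: nest
    ],
}

def generate_weighted(symbol, rng):
    if symbol == "NUM":
        return str(rng.randint(0, 99))
    if symbol not in WEIGHTED:
        return symbol  # literal token such as "+" or "("
    weights = [w for w, _ in WEIGHTED[symbol]]
    bodies = [body for _, body in WEIGHTED[symbol]]
    chosen = rng.choices(bodies, weights=weights, k=1)[0]
    return " ".join(generate_weighted(s, rng) for s in chosen)

if __name__ == "__main__":
    rng = random.Random()
    for _ in range(5):
        print(generate_weighted("expr", rng))
```

With these weights each expansion produces 0.6 new `expr` symbols on average, so the output stays small without an explicit depth limit. If the recursive alternatives were weighted more heavily than the terminating one, a limit would be needed.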

In general, a recursive descent parser can be rewritten quite directly into a set of recursive procedures that generate the language instead of parsing or recognising it.
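As a rough sketch of that rewrite in Python: the class below has one method per non-terminal, just as a recursive descent parser would, but where a parser inspects the next token to decide which branch to take, the generator draws a random number. The expression grammar, the class name `Generator`, the 0.3 probabilities and the `max_depth` parameter are all assumptions chosen for illustration.

```python
import random

# Hypothetical grammar, in the shape a recursive descent parser would use:
#   expr   -> term (('+' | '-') term)*
#   term   -> factor (('*' | '/') factor)*
#   factor -> NUMBER | '(' expr ')'

class Generator:
    def __init__(self, seed=None, max_depth=4):
        self.rng = random.Random(seed)
        self.max_depth = max_depth

    def expr(self, depth=0):
        # A parser would loop while the next token is '+' or '-';
        # the generator decides at random whether to emit another one.
        out = self.term(depth)
        while self.rng.random() < 0.3:
            out += " " + self.rng.choice("+-") + " " + self.term(depth)
        return out

    def term(self, depth):
        out = self.factor(depth)
        while self.rng.random() < 0.3:
            out += " " + self.rng.choice("*/") + " " + self.factor(depth)
        return out

    def factor(self, depth):
        # Nesting is only allowed below max_depth, which bounds the recursion.
        if depth < self.max_depth and self.rng.random() < 0.3:
            return "( " + self.expr(depth + 1) + " )"
        return str(self.rng.randint(1, 9))

if __name__ == "__main__":
    g = Generator()
    for _ in range(5):
        print(g.expr())
```

The generated strings are syntactically valid for this grammar, but nothing checks semantics: an expression such as `3 / ( 2 - 2 )` can come out, which matters if the output is meant to be evaluated rather than just parsed.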

grrussel answered Oct 09 '22 18:10



Given a context-free grammar of a language, it is possible to generate a random string that matches the grammar.

For example, the nearley parser generator includes an implementation of an "unparser" that can generate strings from a grammar.

The same task can be accomplished using definite clause grammars in Prolog. An example of a sentence generator using definite clause grammars is given here.

Anderson Green answered Oct 09 '22 17:10
