I'm trying to bootstrap (a subset of) C from scratch, without using extra dependencies (parser generators, libraries, etc.). Also I want to make use of the idea of parser combinators, which is a fantastic technique in functional programming. I would like to borrow this idea from the functional world to procedural C, in a concise and practical way.
I tried to implement some necessary parser combinators for the following toy grammar, which is also an example from the book, Implementing Functional Languages - a tutorial, of Simon Peyton Jones.
greeting -> hg person "!"
hg -> "hello"
| "goodbye"
where person
is any token beginning with a letter. For example, the token list
["goodbye", "James", "!"]
is parsed into
[(("goodbye", "James"), ["!"])]
(The book uses Haskell, and it's hard to make it language-agnostic, but you get the idea :-)
I implemented this in C, and you can view the code here: https://gist.github.com/4451478
This implementation costs 200+ lines of C code, which is far more than the ~20 lines of Haskell as written in the book. So I'm not sure whether I'm on the right track of doing parser combinators in C, and if there's any possible improvements. Any suggestions are welcomed. Thanks in advance.
I'm looking into the subject myself and I'm following the work of Daniel Holden, author of mpc , a very well written parser combinator library for C, which allows, among other things, to embed EBNF and Regex inside C code:
mpc_parser_t *Expr = mpc_new("expression");
mpc_parser_t *Prod = mpc_new("product");
mpc_parser_t *Value = mpc_new("value");
mpc_parser_t *Maths = mpc_new("maths");
mpca_lang(MPCA_LANG_PREDICTIVE,
" expression : <product> (('+' | '-') <product>)*; "
" product : <value> (('*' | '/') <value>)*; "
" value : /[0-9]+/ | '(' <expression> ')'; "
" maths : /^/ <expression> /$/; "
Expr, Prod, Value, Maths, NULL);
Daniel Holden, also, has written an online book where he demonstrated how it's easy to write a new language using his library. The book is entitled "Build your own Lisp". I think you will find this really useful for your project. Last but not least, in the examples of the library, there is a ready-made program which generate a parser for a subset of C. ;-)
Try Cesium3 which is an implementation of parser combinators for C. (LLVM.)
Implementing parser combinator in C is a topic that also interests me, and recently, I wrote a parser combinator in C: https://github.com/petercommand/cparc
The following is a test case from my code that tries to parse comma separated numbers into a list of numbers. I use a list of parsers (and generate the parser from the 'list of parser' by calling parser_chain in the code) to mimic the 'do notation' in Haskell, though not as elegant.
parser_dp_return test_parser7_rest_dp(dynamic_parser_closure* ctx, input_t input) {
parser_dp_return dp_ret;
dp_ret.obj = ctx->objs[1]->obj;
dp_ret.status = PARSER_NORMAL;
dp_ret.i = input;
dp_ret.discard_obj_callback = NULL;
return dp_ret;
}
parser_dp_return test_parser7_full_dp(dynamic_parser_closure* ctx, input_t input) {
parser_dp_return dp_ret;
list* result = list_new();
list_push_back(result, ctx->objs[0]->obj);//num
if(ctx->objs[1] && ctx->objs[1]->obj) {
list_append(result, ctx->objs[1]->obj);
}
dp_ret.status = PARSER_NORMAL;
dp_ret.i = input;
dp_ret.discard_obj_callback = (void (*)(void *))&list_delete;
dp_ret.obj = result;
return dp_ret;
}
bool test_parser7() {//comma separated values
parser* number = num();
parser* comma = symbol(',');
parser* rest_parser = parser_chain_final(test_parser7_rest_dp);
list* parser_chain_list = list_new();
list_push_back(parser_chain_list, comma);//ctx 0
list_push_back(parser_chain_list, number);//ctx 1
list_push_back(parser_chain_list, rest_parser);
parser* rest = parser_chain(parser_chain_list);
list_delete(parser_chain_list);
parser* many_rest = many(rest);
list* parser_chain_full = list_new();
list_push_back(parser_chain_full, number);//ctx 0
list_push_back(parser_chain_full, many_rest);//ctx 1
parser* full_parser = parser_chain_final(test_parser7_full_dp);
list_push_back(parser_chain_full, full_parser);
parser* final = parser_chain(parser_chain_full);
const char* input = "1,20,300,4000,50000";
input_t i;
input_init(&i, input);
parser_dp_return dp_ret = parse(final, i);
parser_delete(number);
parser_delete(comma);
parser_delete(rest_parser);
parser_delete(rest);
parser_delete(many_rest);
parser_delete(full_parser);
parser_delete(final);
bool result = true;
test_true(&result, dp_ret.status == PARSER_NORMAL);
list* l = dp_ret.obj;
list_item* li = l->head;
test_true(&result, ptr_to_int(li->item) == 1);
li = li->next;
test_true(&result, ptr_to_int(li->item) == 20);
li = li->next;
test_true(&result, ptr_to_int(li->item) == 300);
li = li->next;
test_true(&result, ptr_to_int(li->item) == 4000);
li = li->next;
test_true(&result, ptr_to_int(li->item) == 50000);
return result;
}
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With