Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Better way for parser combinators in C?

I'm trying to bootstrap (a subset of) C from scratch, without using extra dependencies (parser generators, libraries, etc.). Also I want to make use of the idea of parser combinators, which is a fantastic technique in functional programming. I would like to borrow this idea from the functional world to procedural C, in a concise and practical way.

I tried to implement some necessary parser combinators for the following toy grammar, which is also an example from the book, Implementing Functional Languages - a tutorial, of Simon Peyton Jones.

greeting -> hg person "!"
hg       -> "hello"
          | "goodbye"

where person is any token beginning with a letter. For example, the token list

["goodbye", "James", "!"]

is parsed into

[(("goodbye", "James"), ["!"])]

(The book uses Haskell, and it's hard to make it language-agnostic, but you get the idea :-)

I implemented this in C, and you can view the code here: https://gist.github.com/4451478

This implementation costs 200+ lines of C code, which is far more than the ~20 lines of Haskell as written in the book. So I'm not sure whether I'm on the right track of doing parser combinators in C, and if there's any possible improvements. Any suggestions are welcomed. Thanks in advance.

like image 674
Xiao Jia Avatar asked Jan 04 '13 10:01

Xiao Jia


3 Answers

I'm looking into the subject myself and I'm following the work of Daniel Holden, author of mpc , a very well written parser combinator library for C, which allows, among other things, to embed EBNF and Regex inside C code:

  mpc_parser_t *Expr  = mpc_new("expression");
  mpc_parser_t *Prod  = mpc_new("product");
  mpc_parser_t *Value = mpc_new("value");
  mpc_parser_t *Maths = mpc_new("maths");

  mpca_lang(MPCA_LANG_PREDICTIVE,
    " expression : <product> (('+' | '-') <product>)*; "
    " product : <value>   (('*' | '/')   <value>)*;    "
    " value : /[0-9]+/ | '(' <expression> ')';         "
    " maths : /^/ <expression> /$/;                    "
    Expr, Prod, Value, Maths, NULL);

Daniel Holden, also, has written an online book where he demonstrated how it's easy to write a new language using his library. The book is entitled "Build your own Lisp". I think you will find this really useful for your project. Last but not least, in the examples of the library, there is a ready-made program which generate a parser for a subset of C. ;-)

like image 127
Antonio Molinaro Avatar answered Nov 15 '22 00:11

Antonio Molinaro


Try Cesium3 which is an implementation of parser combinators for C. (LLVM.)

like image 45
Prof. Falken Avatar answered Nov 15 '22 01:11

Prof. Falken


Implementing parser combinator in C is a topic that also interests me, and recently, I wrote a parser combinator in C: https://github.com/petercommand/cparc

The following is a test case from my code that tries to parse comma separated numbers into a list of numbers. I use a list of parsers (and generate the parser from the 'list of parser' by calling parser_chain in the code) to mimic the 'do notation' in Haskell, though not as elegant.

parser_dp_return test_parser7_rest_dp(dynamic_parser_closure* ctx, input_t input) {
  parser_dp_return dp_ret;
  dp_ret.obj = ctx->objs[1]->obj;
  dp_ret.status = PARSER_NORMAL;
  dp_ret.i = input;
  dp_ret.discard_obj_callback = NULL;
  return dp_ret;
}

parser_dp_return test_parser7_full_dp(dynamic_parser_closure* ctx, input_t input) {
  parser_dp_return dp_ret;
  list* result = list_new();
  list_push_back(result, ctx->objs[0]->obj);//num
  if(ctx->objs[1] && ctx->objs[1]->obj) {
    list_append(result, ctx->objs[1]->obj);
  }
  dp_ret.status = PARSER_NORMAL;
  dp_ret.i = input;
  dp_ret.discard_obj_callback = (void (*)(void *))&list_delete;

  dp_ret.obj = result;
  return dp_ret;
}

bool test_parser7() {//comma separated values
  parser* number = num();
  parser* comma = symbol(',');
  parser* rest_parser = parser_chain_final(test_parser7_rest_dp);
  list* parser_chain_list = list_new();
  list_push_back(parser_chain_list, comma);//ctx 0
  list_push_back(parser_chain_list, number);//ctx 1
  list_push_back(parser_chain_list, rest_parser);

  parser* rest = parser_chain(parser_chain_list);
  list_delete(parser_chain_list);
  parser* many_rest = many(rest);

  list* parser_chain_full = list_new();
  list_push_back(parser_chain_full, number);//ctx 0
  list_push_back(parser_chain_full, many_rest);//ctx 1
  parser* full_parser = parser_chain_final(test_parser7_full_dp);
  list_push_back(parser_chain_full, full_parser);
  parser* final = parser_chain(parser_chain_full);

  const char* input = "1,20,300,4000,50000";
  input_t i;
  input_init(&i, input);
  parser_dp_return dp_ret = parse(final, i);
  parser_delete(number);
  parser_delete(comma);
  parser_delete(rest_parser);
  parser_delete(rest);
  parser_delete(many_rest);
  parser_delete(full_parser);
  parser_delete(final);
  bool result = true;
  test_true(&result, dp_ret.status == PARSER_NORMAL);
  list* l = dp_ret.obj;
  list_item* li = l->head;
  test_true(&result, ptr_to_int(li->item) == 1);
  li = li->next;
  test_true(&result, ptr_to_int(li->item) == 20);
  li = li->next;
  test_true(&result, ptr_to_int(li->item) == 300);
  li = li->next;
  test_true(&result, ptr_to_int(li->item) == 4000);
  li = li->next;
  test_true(&result, ptr_to_int(li->item) == 50000);
  return result;
}
like image 37
Petercommand Hsu Avatar answered Nov 15 '22 02:11

Petercommand Hsu