Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to construct a CFG based on a given regular expression

I am trying to figure out how to construct a CFG (context free grammar) based on a given regular expression. For example, a(ab)*(a|b) I think there is an algorithm to go through, but it is really confusing. here is what i got so far:

    S->aAB; 
    A->aAb|empty;
    B->a|b;

Does this look right? Any help would be appreciated.

like image 771
user3599855 Avatar asked May 03 '14 19:05

user3599855


2 Answers

Construct the CFG in three parts, each for a, (ab)* and (a|b).

For (a|b), you've got B -> a | b right.

(ab)* would mean strings like ab, abab, ababab and so on. So A -> abA | empty would be the correct production.

Hence, the full grammar becomes:

S -> aAB
A -> abA | empty
B -> a | b

Note: A -> aAb | empty would derive strings like ab, aabb, aaabbb and so on, which is not a regular language, and can't possibly represent a regular expression.

like image 197
John Bupit Avatar answered Sep 22 '22 06:09

John Bupit


Another way to construct a context-free grammar for a given regular expression is:

  1. Construct a finite state machine which accepts the same language as the regular expression.
  2. Create a grammar whose terminals are those in the alphabet of the regular expression, whose non-terminals are (or correspond 1:1 to) the states in the state machine, and which has a rule of the form X -> t Y for every state-machine transition from state X to state Y on terminal symbol t. If your CFG notation allows it, each final state F gets a rule of the form F -> epsilon. If your CFG notation doesn't allow such rules, then for each transition from state X to final state F on terminal t, produce the rule X -> t (in addition to the rule X -> t F already described). The result is a regular grammar, a context-free grammar that obeys the additional constraint that each right-hand side has at most one non-terminal.

For the example given, assume we construct the following FSA (of the many that accept the same language as the regular expression):

FSA for language <code>a(ab)*(a|b)</code>

From this, it is straightforward to derive the following regular grammar:

S -> a A1
A1 -> a A2
A2 -> b B3
B3 -> a A2
B3 -> a A4
B3 -> b B5
A1 -> a A4
A1 -> b B5
A4 -> epsilon
B5 -> epsilon
epsilon -> 

Or, if we don't want rules with an empty right-hand side, drop the last three rules of that grammar and add:

A1 -> a
A1 -> b
B3 -> a
B3 -> b

Compared with other approaches, this method has the disadvantage that the resulting grammar is more verbose than it needs to be, and the advantage that the derivation can be entirely mechanical, which means it's easier to get right without having to think hard.

like image 26
C. M. Sperberg-McQueen Avatar answered Sep 20 '22 06:09

C. M. Sperberg-McQueen