Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How do I generate sentences from a formal grammar?

What's a common way of generating sentences from a grammar?

I want an algorithm that's sort of the opposite of a parser. That is, given a formal context-free grammar (say LL), I want to generate an arbitrary sentence that conforms to that grammar. I use sentence here to mean any valid body of text, so it can actually be a whole program (even if it doesn't make any sense—as long as it's syntactially correct).

Example grammar:

program   : <imports> NEWLINE? <namespace>
imports   : ("import" <identifier> NEWLINE)* 
namespace : "namespace " <identifier> NEWLINE "{" <classes> "}" 
identifier: (A-Za-z_) (A-Za-z0-9_)*
...

Example generated program:

import jkhbhhuob
import aaaaa888_

namespace u8nFGubgykb
{ class ui0op_np { ... }
}
like image 441
Mark Cidade Avatar asked Mar 02 '09 19:03

Mark Cidade


People also ask

How do you make a sentence?

A sentence follows Subject + Verb + Object word order. He (subject) obtained (verb) his degree (object).

What is the meaning of formal grammar?

A formal grammar is defined as a set of production rules for such strings in a formal language. Formal language theory, the discipline that studies formal grammars and languages, is a branch of applied mathematics.


2 Answers

Here is a Python example using the NLTK:

from nltk import parse_cfg, ChartParser
from random import choice

def produce(grammar, symbol):
    words = []
    productions = grammar.productions(lhs = symbol)
    production = choice(productions)
    for sym in production.rhs():
        if isinstance(sym, str):
            words.append(sym)
        else:
            words.extend(produce(grammar, sym))
    return words

grammar = parse_cfg('''
S -> NP VP
PP -> P NP
NP -> Det N | Det N PP | 'I'
VP -> V NP | VP PP
V -> 'shot' | 'killed' | 'wounded'
Det -> 'an' | 'my' 
N -> 'elephant' | 'pajamas' | 'cat' | 'dog'
P -> 'in' | 'outside'
''')

parser = ChartParser(grammar)

gr = parser.grammar()
print ' '.join(produce(gr, gr.start()))

The example is adapted from the book. The sentences generated are syntactically correct but still total gibberish.

like image 148
Björn Lindqvist Avatar answered Sep 25 '22 18:09

Björn Lindqvist


I don't know that there's a "common" algorithm for doing this. Random program generation is used in genetic programming so you could look for a grammar based GP system and see how they handle program generation. I would do a recursive rule generation algorithm like the pseudo-code:

void GenerateRule(someRule)
{
  foreach (part in someRule.Parts)
  {
    if (part.IsLiteral) OutputLiteral(part);
    if (part.IsIdentifier) Output(GenerateIdentifier(part)));
    if (part.IsRule) GenerateRule(part.Rule);
  }
}

This assumes that you've read in all of the parts into some data structure. You'd also need to handle the repetitions(randomly generate the number of times they occur) and optional rules (flip a coin to see if they are there or not).


Edit: Oh, and if the rule has more than one option, you'd just pick one of the options to go with, and process it the same way. So if some rule was (Literal|Variable), you'd randomly pick between the two.

like image 30
Beardo Avatar answered Sep 22 '22 18:09

Beardo