What's a common way of generating sentences from a grammar?
I want an algorithm that's sort of the opposite of a parser. That is, given a formal context-free grammar (say LL), I want to generate an arbitrary sentence that conforms to that grammar. I use "sentence" here to mean any valid body of text, so it can actually be a whole program (even if it doesn't make any sense, as long as it's syntactically correct).
Example grammar:
program : <imports> NEWLINE? <namespace>
imports : ("import" <identifier> NEWLINE)*
namespace : "namespace " <identifier> NEWLINE "{" <classes> "}"
identifier: (A-Za-z_) (A-Za-z0-9_)*
...
Example generated program:
import jkhbhhuob
import aaaaa888_
namespace u8nFGubgykb
{ class ui0op_np { ... }
}
A formal grammar is a set of production rules that describes the strings of a formal language. Formal language theory, the discipline that studies formal grammars and languages, is a branch of applied mathematics.
Here is a Python example using the NLTK:
from random import choice

from nltk import CFG  # NLTK 3 API; NLTK 2 used parse_cfg() instead


def produce(grammar, symbol):
    """Expand `symbol` by recursively picking a random production."""
    words = []
    productions = grammar.productions(lhs=symbol)
    production = choice(productions)
    for sym in production.rhs():
        if isinstance(sym, str):
            # Terminals are plain strings: emit them as-is.
            words.append(sym)
        else:
            # Nonterminals are expanded recursively.
            words.extend(produce(grammar, sym))
    return words


grammar = CFG.fromstring('''
S -> NP VP
PP -> P NP
NP -> Det N | Det N PP | 'I'
VP -> V NP | VP PP
V -> 'shot' | 'killed' | 'wounded'
Det -> 'an' | 'my'
N -> 'elephant' | 'pajamas' | 'cat' | 'dog'
P -> 'in' | 'outside'
''')

print(' '.join(produce(grammar, grammar.start())))
The example is adapted from the book. The sentences generated are syntactically correct but still total gibberish.
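Because the grammar is recursive (`VP -> VP PP`, `NP -> Det N PP`), the random expansion above can occasionally nest quite deeply. One possible way to bound it, offered as a sketch rather than anything the book prescribes, is to precompute the shallowest derivation height of every nonterminal and only pick productions that still fit in a remaining depth budget. The plain-dict `GRAMMAR` encoding and the names `alt_height`, `min_heights`, and `budget` are assumptions made here so that the example runs without NLTK.

```python
import random

# The toy grammar from the NLTK example as a plain dict (an assumed encoding).
# Keys are nonterminals; any symbol that is not a key is a terminal word.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "PP": [["P", "NP"]],
    "NP": [["Det", "N"], ["Det", "N", "PP"], ["I"]],
    "VP": [["V", "NP"], ["VP", "PP"]],
    "V": [["shot"], ["killed"], ["wounded"]],
    "Det": [["an"], ["my"]],
    "N": [["elephant"], ["pajamas"], ["cat"], ["dog"]],
    "P": [["in"], ["outside"]],
}


def alt_height(rhs, height):
    """Height of the shallowest tree whose root uses this right-hand side."""
    return 1 + max((height.get(sym, 0) for sym in rhs), default=0)


def min_heights(grammar):
    """Fixed-point computation of the shallowest tree height per nonterminal."""
    height = {nt: float("inf") for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, alternatives in grammar.items():
            best = min(alt_height(rhs, height) for rhs in alternatives)
            if best < height[nt]:
                height[nt] = best
                changed = True
    return height


def produce(grammar, symbol, budget, height, rng=random):
    """Random expansion that never builds a tree taller than `budget`,
    provided `budget` is at least height[symbol]."""
    if symbol not in grammar:  # terminal: emit the word itself
        return [symbol]
    alternatives = grammar[symbol]
    # Keep only the productions that can still finish within the budget.
    fitting = [rhs for rhs in alternatives if alt_height(rhs, height) <= budget]
    if not fitting:
        # Budget too small: fall back to the shallowest production.
        fitting = [min(alternatives, key=lambda rhs: alt_height(rhs, height))]
    words = []
    for sym in rng.choice(fitting):
        words.extend(produce(grammar, sym, budget - 1, height, rng))
    return words


height = min_heights(GRAMMAR)
print(" ".join(produce(GRAMMAR, "S", 6, height)))
```

A larger budget allows longer, more deeply nested sentences; a budget equal to `height["S"]` forces the shortest possible derivations.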
I don't know that there's a "common" algorithm for doing this. Random program generation is used in genetic programming, so you could look at a grammar-based GP system and see how it handles program generation. I would write a recursive rule-generation algorithm, something like this pseudo-code:
void GenerateRule(someRule)
{
    foreach (part in someRule.Parts)
    {
        if (part.IsLiteral) OutputLiteral(part);
        else if (part.IsIdentifier) Output(GenerateIdentifier(part));
        else if (part.IsRule) GenerateRule(part.Rule);
    }
}
This assumes that you've read all of the parts into some data structure. You'd also need to handle repetitions (randomly generate the number of times they occur) and optional rules (flip a coin to see whether they are present or not).
Edit: Oh, and if the rule has more than one option, you'd just pick one of the options to go with, and process it the same way. So if some rule was (Literal|Variable), you'd randomly pick between the two.
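The recursive approach described above, including repetitions, optional parts, and alternatives, could be sketched in Python roughly as follows. The tuple encoding of grammar parts, the `RULES` table, and the names `generate`, `generate_identifier`, and `max_repeat` are all hypothetical choices for illustration. The table covers only the rules spelled out in the question; the elided `<classes>` rule is left out, so the namespace body is empty.

```python
import random
import string

# Each grammar part is a tagged tuple (an assumed encoding):
#   ("lit", text)    literal text
#   ("ident",)       a randomly generated identifier
#   ("ref", name)    reference to another rule
#   ("seq", [parts]) parts in order
#   ("alt", [parts]) exactly one of several options
#   ("opt", part)    zero or one occurrence
#   ("star", part)   zero or more occurrences
RULES = {
    "program": ("seq", [("ref", "imports"),
                        ("opt", ("lit", "\n")),
                        ("ref", "namespace")]),
    "imports": ("star", ("seq", [("lit", "import "), ("ident",), ("lit", "\n")])),
    "namespace": ("seq", [("lit", "namespace "), ("ident",), ("lit", "\n"),
                          ("lit", "{"), ("lit", "}")]),
}


def generate_identifier(rng, max_extra=8):
    """One letter or underscore, then up to max_extra letters/digits/underscores."""
    first = rng.choice(string.ascii_letters + "_")
    rest_chars = string.ascii_letters + string.digits + "_"
    rest = "".join(rng.choice(rest_chars) for _ in range(rng.randint(0, max_extra)))
    return first + rest


def generate(part, rules, rng, max_repeat=3):
    """Recursively turn a grammar part into random conforming text."""
    kind = part[0]
    if kind == "lit":
        return part[1]
    if kind == "ident":
        return generate_identifier(rng)
    if kind == "ref":
        return generate(rules[part[1]], rules, rng, max_repeat)
    if kind == "seq":
        return "".join(generate(p, rules, rng, max_repeat) for p in part[1])
    if kind == "alt":
        # More than one option: pick one at random and process it the same way.
        return generate(rng.choice(part[1]), rules, rng, max_repeat)
    if kind == "opt":
        # Flip a coin to decide whether the optional part is present.
        return generate(part[1], rules, rng, max_repeat) if rng.random() < 0.5 else ""
    if kind == "star":
        # Randomly choose how many times the repeated part occurs.
        count = rng.randint(0, max_repeat)
        return "".join(generate(part[1], rules, rng, max_repeat) for _ in range(count))
    raise ValueError(f"unknown part kind: {kind!r}")


rng = random.Random(42)  # seeded only so that runs are repeatable
print(generate(("ref", "program"), RULES, rng))
```

Capping repetitions with `max_repeat` is one simple way to keep the output short; a grammar with recursive rules would also need some depth limit to guarantee termination.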