How to use NLTK to generate sentences from an induced grammar?


I have a (large) list of parsed sentences (which were parsed using the Stanford parser), for example, the sentence "Now you can be entertained" has the following tree:

(ROOT
  (S
    (ADVP (RB Now))
    (, ,)
    (NP (PRP you))
    (VP (MD can)
      (VP (VB be)
        (VP (VBN entertained))))
    (. .)))

I am using the set of sentence trees to induce a grammar using nltk:

import nltk

# Collect the productions of every sentence tree.
# `trees` is a placeholder name for your list of nltk.Tree objects.
allProductions = []
for t in trees:
    allProductions += t.productions()

# Induce the grammar
S = nltk.Nonterminal('S')
grammar = nltk.induce_pcfg(S, allProductions)

Now I would like to use grammar to generate new, random sentences. My hope is that since the grammar was learned from a specific set of input examples, then the generated sentences will be semantically similar. Can I do this in nltk?

If I can't use nltk to do this, do any other tools exist that can take the (possibly reformatted) grammar and generate sentences?


asked Feb 21 '13 18:02

stepthom

2 Answers

In NLTK 2.0 you can use nltk.parse.generate to generate all possible sentences for a given grammar (in NLTK 3 this is the generate() function in the nltk.parse.generate module).

This code defines a function which should generate a single sentence based on the production rules in a (P)CFG.

# This example uses choice() to pick uniformly among possible expansions
from random import choice
from nltk.grammar import Nonterminal

# This function is based on _generate_all() in nltk.parse.generate
def generate_sample(grammar, items=None):
    """Return one randomly generated sentence as a list of tokens."""
    if items is None:
        items = [grammar.start()]
    frags = []
    for item in items:
        if isinstance(item, Nonterminal):
            # Choose ONE production for this nonterminal and expand it.
            # This is where to make changes for weighted sampling.
            prod = choice(grammar.productions(lhs=item))
            frags.extend(generate_sample(grammar, prod.rhs()))
        else:
            # Terminal symbol: emit it as-is
            frags.append(item)
    return frags

To make use of the weights in your PCFG, you'll want a weighted sampling method instead of choice(), which implicitly treats all expansions of the current node as equiprobable.
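One way to do that weighted sampling is sketched below. It is an illustration under stated assumptions, not part of NLTK. The function name generate_weighted and the rng parameter are hypothetical. The grammar is assumed to expose the NLTK PCFG interface (start(), productions(lhs=...), and rhs() / prob() on each production), and terminals are assumed to be plain strings, as they are when the grammar is induced from parse trees.

```python
import random


def generate_weighted(grammar, symbol=None, rng=random):
    """Sample one sentence (a list of tokens) from a PCFG-like grammar.

    Assumptions (not checked here): `grammar` offers start() and
    productions(lhs=...), each production offers rhs() and prob(),
    and every terminal is a plain string.
    """
    if symbol is None:
        symbol = grammar.start()

    # Terminal symbol: nothing to expand.
    if isinstance(symbol, str):
        return [symbol]

    # Draw one production for this nonterminal, weighted by its probability.
    prods = grammar.productions(lhs=symbol)
    weights = [p.prob() for p in prods]
    chosen = rng.choices(prods, weights=weights, k=1)[0]

    # Expand every symbol on the right-hand side, left to right.
    tokens = []
    for child in chosen.rhs():
        tokens.extend(generate_weighted(grammar, child, rng))
    return tokens
```

With the grammar from the question, ' '.join(generate_weighted(grammar)) should give one sampled sentence. A grammar with heavily recursive rules can produce very deep expansions, so a depth limit may be worth adding.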

answered Oct 04 '22 17:10

dmh

First of all, if you generate random sentences, they may be syntactically correct, but they will probably not make much sense.

(It sounds a bit like what those MIT students did with their SCIgen program, which auto-generates scientific papers. Very interesting, by the way.)

Anyway, I have never done it myself, but it seems possible with nltk.bigrams; you may want to have a look at "Generating Random Text with Bigrams" in the NLTK book.
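To make the bigram idea concrete, here is a minimal sketch that uses only the standard library. The function names build_bigram_model and generate_text are hypothetical, and the toy word list is made up for illustration. zip(words, words[1:]) yields the same word pairs that nltk.bigrams(words) would.

```python
import random
from collections import defaultdict


def build_bigram_model(words):
    """Map each word to the list of words that followed it in the text."""
    followers = defaultdict(list)
    for first, second in zip(words, words[1:]):
        followers[first].append(second)
    return followers


def generate_text(followers, start, length=10, rng=random):
    """Walk the bigram model from `start`, emitting at most `length` words."""
    word = start
    output = [word]
    for _ in range(length - 1):
        candidates = followers.get(word)
        if not candidates:
            break  # dead end: this word was never followed by anything
        # Repeated followers appear several times, so frequent
        # continuations are proportionally more likely to be picked.
        word = rng.choice(candidates)
        output.append(word)
    return output


words = "now you can be entertained and now you can be happy".split()
model = build_bigram_model(words)
print(" ".join(generate_text(model, "now", length=6)))
```

Unlike the grammar-based approach, this only looks one word back, so the output tends to be locally plausible but has no sentence structure.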

You can also generate all subtrees of a given tree, though I'm not sure that is what you want either.

answered Oct 04 '22 17:10

ForceMagic
