Relation extraction via chunking using NLTK

Tags: python, nltk, named-entity-recognition, chunking

I am trying to figure out how to use NLTK's cascading chunker as per Chapter 7 of the NLTK book. Unfortunately, I'm running into a few issues with anything beyond trivial chunking.

Let's start with this phrase:

"adventure movies between 2000 and 2015 featuring performances by daniel craig"

I am able to find all the relevant NPs when I use the following grammar:

grammar = "NP: {<DT>?<JJ>*<NN.*>+}"

However, I am not sure how to build nested structures with NLTK. The book gives the following format, but a few things are clearly missing (e.g., how does one actually specify multiple rules?):

grammar = r"""
  NP: {<DT|JJ|NN.*>+}          # Chunk sequences of DT, JJ, NN
  PP: {<IN><NP>}               # Chunk prepositions followed by NP
  VP: {<VB.*><NP|PP|CLAUSE>+$} # Chunk verbs and their arguments
  CLAUSE: {<NP><VP>}           # Chunk NP, VP
  """

In my case, I'd like to do something like the following:

grammar = r"""
          MEDIA: {<DT>?<JJ>*<NN.*>+}
          RELATION: {<V.*>}{<DT>?<JJ>*<NN.*>+}
          ENTITY: {<NN.*>}
          """

Assuming that I'd like to use a cascaded chunker for my task, what syntax would I need to use? Additionally, is it possible for me to specify specific words (e.g. "directed" or "acted") when using a chunker?

asked May 16 '15 00:05

grill

1 Answer

I can't comment on the relationship extraction part, not least because you don't give any details on what you want to do and what kind of data you have. So this is a rather partial answer.

a.) How does cascading chunking work in NLTK?

b.) Is it possible to treat the chunker like a context-free grammar, and if so, how?

As I understand the section "Building nested structure with cascaded chunkers" in the NLTK book, you can write the rules much like a context-free grammar, but you have to apply the chunker repeatedly to get recursive structure. Each chunking pass is flat, but you can build chunks on top of chunks.
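To make the "chunks on top of chunks" idea concrete, here is a minimal sketch using the phrase from the question. The part-of-speech tags are hand-assigned for illustration (an assumption, not the output of a real tagger), and the three-stage grammar is my own adaptation of the book's example rather than anything the book prescribes. Each labelled line is a separate stage, and later stages can refer to the chunk labels produced by earlier ones.

```python
import nltk

# Hand-tagged tokens for the phrase in the question. These tags are assumed
# for illustration; a real tagger may assign different ones.
tagged = [
    ("adventure", "NN"), ("movies", "NNS"), ("between", "IN"),
    ("2000", "CD"), ("and", "CC"), ("2015", "CD"),
    ("featuring", "VBG"), ("performances", "NNS"), ("by", "IN"),
    ("daniel", "NN"), ("craig", "NN"),
]

# Stages run top to bottom. PP matches NP chunks built by the first stage,
# and VP matches NP and PP chunks built by the first two.
grammar = r"""
  NP: {<DT>?<JJ>*<NN.*>+}
  PP: {<IN><NP>}
  VP: {<VB.*><NP|PP>+}
"""

# loop=1 runs the cascade once, which is enough here because every stage
# only depends on stages above it.
parser = nltk.RegexpParser(grammar, loop=1)
tree = parser.parse(tagged)
print(tree)
```

With these tags, "by daniel craig" should end up as a PP nested inside a VP headed by "featuring". The book's own example passes `loop=2` because its CLAUSE rule feeds back into the earlier VP stage, so a single pass cannot see it.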

c.) How can I use chunking to perform relation extraction?

I can't really speak to that, and as I said, you don't give any specifics. But if you're dealing with real text, my understanding is that hand-written rulesets for any task are useless unless you have a large team and a lot of time. Look into the probabilistic tools that come with NLTK. It'll be a whole lot easier if you have an annotated training corpus.

Anyway, a couple more comments about the RegexpParser.

You'll find many more usage examples at http://www.nltk.org/howto/chunk.html. (Unfortunately it's not a real how-to, but a test suite.)
According to that page, you can specify multiple expansion rules for one chunk label like this:
```
# Raw string, so the backslash in PP\$ reaches the chunk parser unchanged.
patterns = r"""NP: {<DT|PP\$>?<JJ>*<NN>}
    {<NNP>+}
    {<NN>+}
"""
```
I should add that grammars can have multiple rules with the same left side. That should add some flexibility with grouping related rules, etc.
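As a quick check of how several rules under one label behave, here is a small sketch. The example sentence and its tags are made up for illustration. The rules under `NP` are tried in order, and each one only chunks tokens that an earlier rule has not already claimed.

```python
import nltk

# Three alternative rules for the same chunk label, applied in order.
patterns = r"""NP: {<DT|PP\$>?<JJ>*<NN>}
    {<NNP>+}
    {<NN>+}
"""

# Hand-tagged example sentence (tags are assumed, not produced by a tagger).
tagged = [
    ("the", "DT"), ("little", "JJ"), ("cat", "NN"),
    ("chased", "VBD"), ("John", "NNP"), ("Smith", "NNP"),
]

parser = nltk.RegexpParser(patterns)
tree = parser.parse(tagged)

# Collect the words of every NP chunk, in sentence order.
np_chunks = [
    [word for word, _ in subtree.leaves()]
    for subtree in tree.subtrees(lambda t: t.label() == "NP")
]
print(np_chunks)
```

Here the first rule picks up "the little cat" and the second picks up "John Smith", so both end up as `NP` chunks even though they match different patterns.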

answered Oct 18 '22 20:10

alexis
