Creating relations in sentence using chunk tags (not NER) with NLTK | NLP

I am trying to create custom chunk tags and extract relations from them. The following code gets me as far as the cascaded chunk tree.

import nltk

grammar = r"""
  NPH: {<DT|JJ|NN.*>+}          # Chunk sequences of DT, JJ, NN
  PPH: {<IN><NP>}               # Chunk prepositions followed by NP
  VPH: {<VB.*><NP|PP|CLAUSE>+$} # Chunk verbs and their arguments
  CLAUSE: {<NP><VP>}           # Chunk NP, VP
  """
cp = nltk.RegexpParser(grammar)
sentence = [("Mary", "NN"), ("saw", "VBD"), ("the", "DT"), ("cat", "NN"),
    ("sit", "VB"), ("on", "IN"), ("the", "DT"), ("mat", "NN")]

chunked = cp.parse(sentence)
print(chunked)

Output -

(S (NPH Mary/NN) saw/VBD (NPH the/DT cat/NN) sit/VB on/IN (NPH the/DT mat/NN))

Now I am trying to extract relations between the NPH chunks, together with the text in between them, using the nltk.sem.extract_rels function, but it seems to work only on named entities generated with the ne_chunk function.

import re

IN = re.compile(r'.*\bon\b')
for rel in nltk.sem.extract_rels('NPH', 'NPH', chunked, corpus='ieer', pattern=IN):
    print(nltk.sem.rtuple(rel))

This gives the following error -

ValueError: your value for the subject type has not been recognized: NPH

Is there an easy way to create relations from chunk tags alone? I don't really want to retrain the NER model to detect my chunk tags as named entities.

Thank you!

asked Jul 17 '18 21:07

Rohan

1 Answer

nltk.sem.extract_rels checks that its subjclass and objclass arguments are known NE classes for the selected corpus, hence the error with NPH.
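
The check behind that error can be pictured roughly as follows. This is a simplified stand-in written for illustration, not NLTK's source: the names KNOWN_CLASSES and check_class are hypothetical, and the class list is only an assumed, illustrative subset.

```python
# Hypothetical, simplified stand-in for the validation step (not NLTK's code).
# The class list below is an illustrative subset assumed for this sketch.
KNOWN_CLASSES = {"ieer": ["LOCATION", "ORGANIZATION", "PERSON"]}


def check_class(name, corpus):
    """Return name if it is a known NE class for corpus, else raise ValueError."""
    if name not in KNOWN_CLASSES[corpus]:
        raise ValueError(
            "your value for the subject type has not been recognized: %s" % name
        )
    return name


try:
    check_class("NPH", "ieer")
except ValueError as err:
    message = str(err)
    print(message)
```

A custom chunk label such as NPH can never pass a check of this kind, no matter which corpus is selected.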

The easy, ad hoc way is to write a customized extract_rels function that skips this check (example below).

import nltk
import re

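# Only the NPH rule fires for this sentence: the other rules refer to
# NP, PP and VP, labels that this grammar never produces.
grammar = r"""
  NPH: {<DT|JJ|NN.*>+}          # Chunk sequences of DT, JJ, NN
  PPH: {<IN><NP>}               # Chunk prepositions followed by NP
  VPH: {<VB.*><NP|PP|CLAUSE>+$} # Chunk verbs and their arguments
  CLAUSE: {<NP><VP>}           # Chunk NP, VP
  """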
cp = nltk.RegexpParser(grammar)
sentence = [("Mary", "NN"), ("saw", "VBD"), ("the", "DT"), ("cat", "NN"),
    ("sit", "VB"), ("on", "IN"), ("the", "DT"), ("mat", "NN")]

chunked = cp.parse(sentence)

IN = re.compile(r'.*\bon\b')

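def extract_rels(subjclass, objclass, chunked, pattern):
    # tree2semi_rel groups the tree into [filler_tokens, chunk] pairs.
    # semi_rel2reldict also reads the right context from a third pair, so it
    # only emits a relation while more than two pairs remain. The dummy pair
    # appended here lets the last two chunks form a relation as well.
    pairs = nltk.sem.relextract.tree2semi_rel(chunked) + [[[]]]

    reldicts = nltk.sem.relextract.semi_rel2reldict(pairs)

    # Keep relations whose subject and object labels match and whose
    # filler text matches the pattern.
    def relfilter(x):
        return (x['subjclass'] == subjclass and
                pattern.match(x['filler']) and
                x['objclass'] == objclass)

    return list(filter(relfilter, reldicts))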

for e in extract_rels('NPH', 'NPH', chunked, pattern=IN):
    print(nltk.sem.rtuple(e))

Output:

[NPH: 'the/DT cat/NN'] 'sit/VB on/IN' [NPH: 'the/DT mat/NN']
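
To see why the padding matters, here is a minimal pure-Python sketch of the sliding-window idea. It does not use NLTK; Chunk, to_pairs, join and relations are hypothetical names made up for this illustration, and the loop only mimics the "more than two pairs" condition described in the comments above.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    label: str
    tokens: list  # list of (word, tag) tuples


def to_pairs(items):
    """Group a flat chunk tree into [filler_tokens, chunk] pairs."""
    pairs, filler = [], []
    for item in items:
        if isinstance(item, Chunk):
            pairs.append([filler, item])
            filler = []
        else:
            filler.append(item)
    return pairs


def join(tokens):
    return " ".join(f"{word}/{tag}" for word, tag in tokens)


def relations(pairs):
    """Slide over the pairs; a third pair must exist for a relation to be emitted."""
    found = []
    while len(pairs) > 2:
        subj = pairs[0][1]
        filler, obj = pairs[1]
        found.append((join(subj.tokens), join(filler), join(obj.tokens)))
        pairs = pairs[1:]
    return found


tree = [
    Chunk("NPH", [("Mary", "NN")]),
    ("saw", "VBD"),
    Chunk("NPH", [("the", "DT"), ("cat", "NN")]),
    ("sit", "VB"), ("on", "IN"),
    Chunk("NPH", [("the", "DT"), ("mat", "NN")]),
]

pairs = to_pairs(tree)
unpadded = relations(pairs)           # stops before the last two chunks
padded = relations(pairs + [[[]]])    # dummy pair keeps the window alive
print(unpadded)
print(padded)
```

Without the dummy pair, only the Mary/cat relation is produced; with it, the cat/mat relation that the "on" pattern is looking for appears as well.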

answered Sep 30 '22 11:09

mcoav

Tags:

python

nlp

nltk

named-entity-recognition

chunking
