
Creating relations in sentence using chunk tags (not NER) with NLTK | NLP

I am trying to create custom chunk tags and extract relations from them. The following code produces the cascaded chunk tree.

import re
import nltk

grammar = r"""
  NPH: {<DT|JJ|NN.*>+}          # Chunk sequences of DT, JJ, NN
  PPH: {<IN><NP>}               # Chunk prepositions followed by NP
  VPH: {<VB.*><NP|PP|CLAUSE>+$} # Chunk verbs and their arguments
  CLAUSE: {<NP><VP>}           # Chunk NP, VP
  """
cp = nltk.RegexpParser(grammar)
sentence = [("Mary", "NN"), ("saw", "VBD"), ("the", "DT"), ("cat", "NN"),
    ("sit", "VB"), ("on", "IN"), ("the", "DT"), ("mat", "NN")]


chunked = cp.parse(sentence)

Output:

(S (NPH Mary/NN) saw/VBD (NPH the/DT cat/NN) sit/VB on/IN (NPH the/DT mat/NN))

Now I am trying to extract relations between the NPH chunks, based on the text between them, using the nltk.sem.extract_rels function, but it seems to work only on named entities generated with the ne_chunk function.

IN = re.compile(r'.*\bon\b')
for rel in nltk.sem.extract_rels('NPH', 'NPH', chunked,corpus='ieer',pattern = IN):
        print(nltk.sem.rtuple(rel))

This gives the following error:

ValueError: your value for the subject type has not been recognized: NPH

Is there an easy way to create relations using only chunk tags? I don't really want to retrain the NER model to detect my chunk tags as named entities.

Thank you!

Rohan asked Jul 17 '18 21:07



1 Answer

  1. extract_rels checks that its subjclass and objclass arguments are NE tags known for the chosen corpus, which is why NPH raises the error.
  2. The easy, ad hoc way is to write a customized extract_rels function (example below).

    import nltk
    import re
    
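    # Note: the PPH, VPH and CLAUSE rules refer to NP, PP and VP, labels this
    # grammar never produces, so only NPH chunks appear in the tree.
    grammar = r"""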
      NPH: {<DT|JJ|NN.*>+}          # Chunk sequences of DT, JJ, NN
      PPH: {<IN><NP>}               # Chunk prepositions followed by NP
      VPH: {<VB.*><NP|PP|CLAUSE>+$} # Chunk verbs and their arguments
      CLAUSE: {<NP><VP>}           # Chunk NP, VP
      """
    cp = nltk.RegexpParser(grammar)
    sentence = [("Mary", "NN"), ("saw", "VBD"), ("the", "DT"), ("cat", "NN"),
        ("sit", "VB"), ("on", "IN"), ("the", "DT"), ("mat", "NN")]
    
    chunked = cp.parse(sentence)
    
    IN = re.compile(r'.*\bon\b')
    
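    def extract_rels(subjclass, objclass, chunked, pattern):
        """Stripped-down variant of nltk.sem.extract_rels without the NE-class check."""
        # Pad with an empty pair: semi_rel2reldict reads the right context
        # from the following pair, so without padding the final
        # subject/object pair would be dropped.
        pairs = nltk.sem.relextract.tree2semi_rel(chunked) + [[[]]]
    
        reldicts = nltk.sem.relextract.semi_rel2reldict(pairs)
    
        def relfilter(x):
            # Keep relations whose chunk labels and filler text all match.
            return (x['subjclass'] == subjclass and
                    pattern.match(x['filler']) and
                    x['objclass'] == objclass)
    
        return list(filter(relfilter, reldicts))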
    
    for e in extract_rels('NPH', 'NPH', chunked, pattern=IN):
        print(nltk.sem.rtuple(e))
    

    Output:

    [NPH: 'the/DT cat/NN'] 'sit/VB on/IN' [NPH: 'the/DT mat/NN']
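
    To make the pairing step easier to follow, here is a dependency-free sketch of the same idea. It is a simplified model, not NLTK's implementation: the chunk tree is represented as a plain list in which a chunk is a (label, tokens) tuple, and the helper names to_pairs, to_relations and extract are made up for this illustration. Because it pairs adjacent chunks directly and ignores right context, it does not need the padding used above.

```python
import re

# Simplified stand-in for an NLTK chunk tree (an assumption for this sketch):
# a chunk is (label, [tokens]); anything else is a bare "word/TAG" token.
chunked = [
    ("NPH", ["Mary/NN"]),
    "saw/VBD",
    ("NPH", ["the/DT", "cat/NN"]),
    "sit/VB",
    "on/IN",
    ("NPH", ["the/DT", "mat/NN"]),
]


def to_pairs(tree):
    """Group the tokens that precede each chunk with that chunk."""
    pairs, filler = [], []
    for node in tree:
        if isinstance(node, tuple):  # a chunk closes the current filler
            pairs.append((filler, node))
            filler = []
        else:
            filler.append(node)
    return pairs


def to_relations(pairs):
    """Turn each pair of adjacent chunks into a subject/filler/object dict."""
    relations = []
    for (_, subj), (filler, obj) in zip(pairs, pairs[1:]):
        relations.append({
            "subjclass": subj[0], "subjtext": " ".join(subj[1]),
            "filler": " ".join(filler),
            "objclass": obj[0], "objtext": " ".join(obj[1]),
        })
    return relations


def extract(subjclass, objclass, tree, pattern):
    """Keep relations whose chunk labels match and whose filler matches pattern."""
    return [r for r in to_relations(to_pairs(tree))
            if r["subjclass"] == subjclass
            and r["objclass"] == objclass
            and pattern.match(r["filler"])]


IN = re.compile(r".*\bon\b")
for rel in extract("NPH", "NPH", chunked, IN):
    print(f"[{rel['subjclass']}: {rel['subjtext']!r}] {rel['filler']!r} "
          f"[{rel['objclass']}: {rel['objtext']!r}]")
```

    The Mary / saw / the cat relation is also built, but it is filtered out because its filler, 'saw/VBD', does not match the pattern.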
    
mcoav answered Sep 30 '22 11:09