I am trying to create custom chunk tags and to extract relations from them. Following is the code that takes me to the cascaded chunk tree.
grammar = r"""
NPH: {<DT|JJ|NN.*>+} # Chunk sequences of DT, JJ, NN
PPH: {<IN><NP>} # Chunk prepositions followed by NP
VPH: {<VB.*><NP|PP|CLAUSE>+$} # Chunk verbs and their arguments
CLAUSE: {<NP><VP>} # Chunk NP, VP
"""
cp = nltk.RegexpParser(grammar)
sentence = [("Mary", "NN"), ("saw", "VBD"), ("the", "DT"), ("cat", "NN"),
("sit", "VB"), ("on", "IN"), ("the", "DT"), ("mat", "NN")]
chunked = cp.parse(sentence)
Output -
(S (NPH Mary/NN) saw/VBD (NPH the/DT cat/NN) sit/VB on/IN (NPH the/DT mat/NN))
Now I am trying to extract relations between the NPH tag values with the text in between using the nltk.sem.extract_rels function, BUT it seems to work ONLY on named entities generated with the ne_chunk function.
IN = re.compile(r'.*\bon\b')
for rel in nltk.sem.extract_rels('NPH', 'NPH', chunked, corpus='ieer', pattern=IN):
    print(nltk.sem.rtuple(rel))
This gives the following error -
ValueError: your value for the subject type has not been recognized: NPH
Is there an easy way to create relations using only chunk tags? I don't really want to retrain the NER model to detect my chunk tags as the respective named entities.
Thank you!
extract_rels (doc) checks that the arguments subjclass and objclass are known NE tags, hence the error with NPH. The easy, ad hoc way is to rewrite a customized extract_rels function (example below).
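As an aside before the customized function: the label check can be confirmed by looking at the NE_CLASSES table in nltk.sem.relextract, which (as far as I can tell) is what extract_rels compares subjclass and objclass against for the chosen corpus. This is only a small inspection sketch; the variable name ieer_labels is mine.

```python
from nltk.sem.relextract import NE_CLASSES

# Labels accepted when corpus='ieer' is passed to extract_rels
ieer_labels = NE_CLASSES["ieer"]
print(ieer_labels)

# A custom chunk label such as NPH is not in that list,
# which is why the built-in function raises ValueError.
print("NPH" in ieer_labels)
```

The customized function that follows skips this check and filters on whatever labels the chunk tree actually contains.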
import nltk
import re
grammar = r"""
NPH: {<DT|JJ|NN.*>+} # Chunk sequences of DT, JJ, NN
PPH: {<IN><NP>} # Chunk prepositions followed by NP
VPH: {<VB.*><NP|PP|CLAUSE>+$} # Chunk verbs and their arguments
CLAUSE: {<NP><VP>} # Chunk NP, VP
"""
cp = nltk.RegexpParser(grammar)
sentence = [("Mary", "NN"), ("saw", "VBD"), ("the", "DT"), ("cat", "NN"),
("sit", "VB"), ("on", "IN"), ("the", "DT"), ("mat", "NN")]
chunked = cp.parse(sentence)
IN = re.compile(r'.*\bon\b')
def extract_rels(subjclass, objclass, chunked, pattern):
    # padding because this function checks right context
    pairs = nltk.sem.relextract.tree2semi_rel(chunked) + [[[]]]
    reldicts = nltk.sem.relextract.semi_rel2reldict(pairs)
    relfilter = lambda x: (x['subjclass'] == subjclass and
                           pattern.match(x['filler']) and
                           x['objclass'] == objclass)
    return list(filter(relfilter, reldicts))

for e in extract_rels('NPH', 'NPH', chunked, pattern=IN):
    print(nltk.sem.rtuple(e))
Output:
[NPH: 'the/DT cat/NN'] 'sit/VB on/IN' [NPH: 'the/DT mat/NN']
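If you would rather not depend on the helper functions inside nltk.sem.relextract, the same idea can be sketched by walking the chunk tree directly. This is a minimal sketch, not part of NLTK: chunk_relations is a hypothetical name, it assumes a flat chunk tree like the one above, it only pairs adjacent chunks, and it matches the pattern against the untagged filler words (e.g. "sit on") rather than the tagged string.

```python
import re
import nltk
from nltk.tree import Tree

def chunk_relations(chunked, subj_label, obj_label, pattern):
    """Yield (subject, filler, object) for adjacent chunks whose filler matches."""
    prev_chunk = None  # most recent chunk subtree seen so far
    filler = []        # (word, tag) tokens seen since prev_chunk
    for child in chunked:
        if isinstance(child, Tree):
            if (prev_chunk is not None
                    and prev_chunk.label() == subj_label
                    and child.label() == obj_label):
                filler_text = " ".join(word for word, tag in filler)
                if pattern.match(filler_text):
                    yield (" ".join(w for w, t in prev_chunk.leaves()),
                           filler_text,
                           " ".join(w for w, t in child.leaves()))
            # The current chunk becomes the candidate subject for the next one
            prev_chunk, filler = child, []
        else:
            filler.append(child)

# Only the NPH rule is used here, since it is the only one that fires above
cp = nltk.RegexpParser(r"NPH: {<DT|JJ|NN.*>+}")
sentence = [("Mary", "NN"), ("saw", "VBD"), ("the", "DT"), ("cat", "NN"),
            ("sit", "VB"), ("on", "IN"), ("the", "DT"), ("mat", "NN")]
chunked = cp.parse(sentence)

relations = list(chunk_relations(chunked, "NPH", "NPH", re.compile(r".*\bon\b")))
for subj, filler_text, obj in relations:
    print(subj, "|", filler_text, "|", obj)
```

Because it returns plain tuples instead of reldicts, it cannot be passed to nltk.sem.rtuple, but it is easy to adapt if you need other fields.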