
How to extract relationship from text in NLTK

Tags:

nlp

nltk

Hi, I'm trying to extract relationships from a string of text, based on the second-to-last example here: https://web.archive.org/web/20120907184244/http://nltk.googlecode.com/svn/trunk/doc/howto/relextract.html

From a string such as "Michael James editor of Publishers Weekly", I would like output such as:

[PER: 'Michael James'] ', editor of' [ORG: 'Publishers Weekly']

What is the best way to do this? What format does extract_rels expect, and how do I format my input to meet that requirement?


I tried to do it myself, but it didn't work. Here is the code I've adapted from the book; I'm not getting any results printed. What am I doing wrong?

import nltk
import re
from nltk.sem import relextract


class doc():
    pass

doc.headline = ['this is expected by nltk.sem.extract_rels but not used in this script']

def findrelations(text):
    roles = """
    (.*(
    analyst|
    editor|
    librarian).*)|
    researcher|
    spokes(wo)?man|
    writer|
    ,\sof\sthe?\s*  # "X, of (the) Y"
    """
    ROLES = re.compile(roles, re.VERBOSE)
    tokenizedsentences = nltk.sent_tokenize(text)
    for sentence in tokenizedsentences:
        taggedwords = nltk.pos_tag(nltk.word_tokenize(sentence))
        doc.text = nltk.batch_ne_chunk(taggedwords)
        print doc.text
        for rel in relextract.extract_rels('PER', 'ORG', doc, corpus='ieer', pattern=ROLES):
            print relextract.show_raw_rtuple(rel)  # doctest: +ELLIPSIS

text = "Michael James editor of Publishers Weekly"

findrelations(text)

Asked Sep 04 '12 by CraigH



1 Answer

Here is code based on yours (just a few adjustments) that works well ;) The main changes are chunking with nltk.ne_chunk_sents and passing each chunked sentence tree directly to extract_rels with corpus='ace', rather than building an ieer-style document object.

import nltk
import re 
from nltk.chunk import ne_chunk_sents
from nltk.sem import relextract


def findrelations(text):
    # role words that may link a PERSON to an ORGANIZATION
    roles = """
    (.*(
    analyst|
    editor|
    librarian).*)|
    researcher|
    spokes(wo)?man|
    writer|
    ,\sof\sthe?\s*  # "X, of (the) Y"
    """
    ROLES = re.compile(roles, re.VERBOSE)

    # sentence-split, tokenize and POS-tag, then run the named-entity chunker
    sentences = nltk.sent_tokenize(text)
    tokenized_sentences = [nltk.word_tokenize(sentence) for sentence in sentences]
    tagged_sentences = [nltk.pos_tag(sentence) for sentence in tokenized_sentences]
    chunked_sentences = nltk.ne_chunk_sents(tagged_sentences)

    for doc in chunked_sentences:
        print doc
        for rel in relextract.extract_rels('PER', 'ORG', doc, corpus='ace', pattern=ROLES):
            # it is a tree, so you need to work on it to output what you want
            print relextract.show_raw_rtuple(rel)

findrelations('Michael James editor of Publishers Weekly')

(S (PERSON Michael/NNP) (PERSON James/NNP) editor/NN of/IN (ORGANIZATION Publishers/NNS Weekly/NNP))
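
If you want something closer to the format asked for in the question ([PER: 'Michael James'] ', editor of' [ORG: 'Publishers Weekly']), one option is to walk the chunked tree yourself. Here is a minimal sketch, assuming the NLTK 3 Tree API (subtree.label(); on older NLTK versions the attribute is subtree.node) and using a made-up helper name, format_relation:

from nltk.tree import Tree

def format_relation(chunked_sentence):
    # Named-entity subtrees become "[LABEL: 'words']"; plain (word, tag)
    # tokens are kept as the connecting text between entities.
    parts = []
    for node in chunked_sentence:
        if isinstance(node, Tree):
            entity = " ".join(word for word, tag in node.leaves())
            parts.append("[%s: '%s']" % (node.label(), entity))
        else:
            word, tag = node
            parts.append(word)
    return " ".join(parts)

For the tree printed above this gives something like [PERSON: 'Michael'] [PERSON: 'James'] editor of [ORGANIZATION: 'Publishers Weekly'], which you could then post-process (for example, merging adjacent PERSON chunks) to match the exact output you want.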

Answered Oct 10 '22 by Vinicius Woloszyn