Stanford CoreNLP OpenIE annotator

I have a question regarding the Stanford CoreNLP OpenIE annotator.

I am using Stanford CoreNLP version stanford-corenlp-full-2015-12-09 to extract relations with OpenIE. I don't know much Java, which is why I am using the pycorenlp wrapper for Python 3.4.

I want to extract the relation between all words of a sentence; below is the code I used. I am also interested in showing the confidence of each triplet:

from pycorenlp import StanfordCoreNLP

nlp = StanfordCoreNLP("http://localhost:9000/")
s = "Twenty percent electric motors are pulled from an assembly line"
output = nlp.annotate(s, properties={"annotators": "tokenize,ssplit,pos,depparse,natlog,openie",
                                     "outputFormat": "json", "triple.strict": "true"})
# Collect the OpenIE triples of every sentence (one list per sentence)
result = [sentence["openie"] for sentence in output["sentences"]]
print(result)
for triples in result:
    for rel in triples:
        # Each triple is printed as (relation, subject, object)
        relationSent = rel['relation'], rel['subject'], rel['object']
        print(relationSent)

This is the result I got:

[[{'relationSpan': [4, 6], 'subject': 'Twenty percent electric motors', 'objectSpan': [8, 10], 'relation': 'are pulled from', 'object': 'assembly line', 'subjectSpan': [0, 4]}, {'relationSpan': [4, 6], 'subject': 'percent electric motors', 'objectSpan': [8, 10], 'relation': 'are pulled from', 'object': 'assembly line', 'subjectSpan': [1, 4]}, {'relationSpan': [4, 5], 'subject': 'Twenty percent electric motors', 'objectSpan': [5, 6], 'relation': 'are', 'object': 'pulled', 'subjectSpan': [0, 4]}, {'relationSpan': [4, 5], 'subject': 'percent electric motors', 'objectSpan': [5, 6], 'relation': 'are', 'object': 'pulled', 'subjectSpan': [1, 4]}]]

And the triplets are:

('are pulled from', 'Twenty percent electric motors', 'assembly line')
('are pulled from', 'percent electric motors', 'assembly line')
('are', 'Twenty percent electric motors', 'pulled')
('are', 'percent electric motors', 'pulled')

The first problem is that the confidence does not show up in the result. The second problem is that I only want to retrieve the triplet that includes all words of the sentence, i.e. this triplet:

('are pulled from', 'Twenty percent electric motors', 'assembly line')

What I'm getting instead is more than one combination of triplets. I tried the option "triple.strict":"true" because it is described as extracting "triples only if they consume the entire fragment", but it is not working.

Can anyone advise me on this?

Shany asked May 22 '16 13:05



2 Answers

You should try this setting:

"openie.triple.strict":"true"

Looking through the code, it appears that the confidence is currently not stored in the returned JSON, so you cannot get it from the CoreNLP server.

Since you bring this up, I will push a change that adds the confidences to the output JSON and let you know when it is live on GitHub.
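
To make the difference concrete, here is a minimal sketch of a properties dictionary that uses the prefixed key instead of the bare "triple.strict" from the question. The helper name openie_properties is hypothetical and only for illustration; the annotator list and output format are copied from the question.

```python
def openie_properties(strict=True):
    """Build the properties dict passed to nlp.annotate (hypothetical helper)."""
    props = {
        "annotators": "tokenize,ssplit,pos,depparse,natlog,openie",
        "outputFormat": "json",
    }
    if strict:
        # The option needs the "openie." prefix; the bare "triple.strict"
        # key used in the question did not change the output.
        props["openie.triple.strict"] = "true"
    return props


props = openie_properties()
print(props)
```

The resulting dictionary can then be passed as properties=props in the nlp.annotate call from the question.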

StanfordNLPHelp answered Oct 19 '22 20:10


Thanks a lot, it is working now. I added both "openie.triple.strict":"true" and "openie.max_entailments_per_clause":"1". The code is now:

output = nlp.annotate(chunkz, properties={"annotators": "tokenize,ssplit,pos,depparse,natlog,openie",
                                          "outputFormat": "json",
                                          "openie.triple.strict": "true",
                                          "openie.max_entailments_per_clause": "1"})
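
If several triples still come back for a sentence, a client-side fallback is to keep the one whose spans cover the most tokens. The following is only a sketch: it assumes the subjectSpan, relationSpan and objectSpan fields are [start, end) token offsets, as the sample output in the question suggests, and the helper names span_length and widest_triple are hypothetical. The sample data is copied from the output in the question.

```python
SPAN_KEYS = ("subjectSpan", "relationSpan", "objectSpan")


def span_length(span):
    """Number of tokens in a [start, end) span (assumed half-open)."""
    start, end = span
    return end - start


def widest_triple(triples):
    """Return the triple whose three spans together cover the most tokens."""
    return max(triples, key=lambda t: sum(span_length(t[k]) for k in SPAN_KEYS))


# Triples from the output shown in the question
sample = [
    {"relationSpan": [4, 6], "subject": "Twenty percent electric motors",
     "objectSpan": [8, 10], "relation": "are pulled from",
     "object": "assembly line", "subjectSpan": [0, 4]},
    {"relationSpan": [4, 6], "subject": "percent electric motors",
     "objectSpan": [8, 10], "relation": "are pulled from",
     "object": "assembly line", "subjectSpan": [1, 4]},
    {"relationSpan": [4, 5], "subject": "Twenty percent electric motors",
     "objectSpan": [5, 6], "relation": "are",
     "object": "pulled", "subjectSpan": [0, 4]},
    {"relationSpan": [4, 5], "subject": "percent electric motors",
     "objectSpan": [5, 6], "relation": "are",
     "object": "pulled", "subjectSpan": [1, 4]},
]

best = widest_triple(sample)
print((best["relation"], best["subject"], best["object"]))
# ('are pulled from', 'Twenty percent electric motors', 'assembly line')
```

This does not replace the server-side options above; it only picks one triple per sentence from whatever the server returns.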
Shany answered Oct 19 '22 22:10