Clause Extraction using Stanford parser

Tags:

I have a complex sentence and I need to separate it into main and dependent clause. For example for the sentence
ABC cites the fact that chemical additives are banned in many countries and feels they may be banned in this state too.
The split required

1)ABC cites the fact   
2)chemical additives are banned in many countries   
3)ABC feels they may be banned in this state too.

I think I could use the Stanford Parser tree or dependencies, but I am not sure how to proceed from here.

The tree

(ROOT
  (S
    (NP (NNP ABC))
    (VP (VBZ cites)
      (NP (DT the) (NN fact))
      (SBAR (IN that)
        (S
          (NP (NN chemical) (NNS additives))
          (VP
            (VP (VBP are)
              (VP (VBN banned)
                (PP (IN in)
                  (NP (JJ many) (NNS countries)))))
            (CC and)
            (VP (VBZ feels)
              (SBAR
                (S
                  (NP (PRP they))
                  (VP (MD may)
                    (VP (VB be)
                      (VP (VBN banned)
                        (PP (IN in)
                          (NP (DT this) (NN state)))
                        (ADVP (RB too))))))))))))
    (. .)))

and the collapsed dependency parse

nsubj(cites-2, ABC-1)  
root(ROOT-0, cites-2)  
det(fact-4, the-3)   
dobj(cites-2, fact-4)  
mark(banned-9, that-5)  
nn(additives-7, chemical-6)  
nsubjpass(banned-9, additives-7)   
nsubj(feels-14, additives-7)   
auxpass(banned-9, are-8)   
ccomp(cites-2, banned-9)   
amod(countries-12, many-11)  
prep_in(banned-9, countries-12)   
ccomp(cites-2, feels-14)    
conj_and(banned-9, feels-14)    
nsubjpass(banned-18, they-15)   
aux(banned-18, may-16)    
auxpass(banned-18, be-17)    
ccomp(feels-14, banned-18)   
det(state-21, this-20)    
prep_in(banned-18, state-21)    
advmod(banned-18, too-22)

859

asked Sep 27 '14 00:09

AMisra

1 Answers

It is probably better if you primarily use the constituenty-based parse tree, and not the dependencies. The dependencies will be helpful, but only after the main work is done! I am going to explain this towards the end of my answer.

This is because constituency-parse is based on phrase structure grammar, which is the most relevant if you are seeking to extract clauses from a sentence. It can be done using dependencies as well, but in that case, you will essentially be reconstructing the phrase structure -- starting from the root and looking at dependent nodes (e.g. ABC and facts are the nominal subject and direct object of the verb cites, and so on ... ).

It is helpful to visualize the parse tree, however. In your example, the clauses are indicated by the SBAR tag, which is a clause introduced by a (possibly empty) subordinating conjunction. All you need to do is the following:

Identify the non-root clausal nodes in the parse tree
Remove (but retain separately) the subtrees rooted at these clausal nodes from the main tree.
In the main tree (after removal of subtrees in step 2), remove any hanging prepositions, subordinating conjunctions and adverbs.

In step 3, what I mean by "hanging" is that any prepositions, etc. whose dependency has been removed in step 2. E.g., from "ABC cites the fact that", you need to remove the preposition/subordinating-conjunction "that" because its dependent node "banned" was removed in step 2. You will thus have three independent clauses:

chemical additives are banned in many countries (SBAR removal in step 2)
they may be banned in this state too (SBAR removal in step 2)
ABC cites the fact (step 3)

The only issue here is the connection ABC--feels. For this, note that both "banned" and "feels" are complements of the verb "cites", and hence, have the same subject, which is "ABC"! And you're done. When this is done, you will get a fourth clause, "ABC feels", which is something you may or may not want to include in your final result.

For a list of all clausal tags (and, in fact, all Penn Treebank tags), see this list: http://www.surdeanu.info/mihai/teaching/ista555-fall13/readings/PennTreebankConstituents.html

For an online parse-tree visualization, you may want to use the online Berkeley parser demo. It helps a lot in forming a better intuition. Here's the image generated for your example sentence: Berkeley Parser Tree

Caveats

Even the best parsers will not always parse sentences correctly, so keep that in mind.
Additionally, many complex sentences involve right node raising, which is almost never parsed correctly by most parsers.
You may need to modify the algorithm a little if a clause is in passive voice.

Apart from these three pitfalls, the above algorithm should work quite accurately.

185

answered Oct 19 '22 13:10

Chthonic Project

Related questions
                            
                                How does gensim calculate doc2vec paragraph vectors
                            
                                PyTorch: RuntimeError: Input, output and indices must be on the current device
                            
                                Open-source OCR library for Arabic [closed]
                            
                                Get weight matrices from gensim word2Vec
                            
                                How to preprocess text for embedding?
                            
                                Detect English verb tenses using NLTK
                            
                                Python parse text from multiple txt file
                            
                                SQL: Most Overdue pair of numbers?
                            
                                How to train the Stanford NLP Sentiment Analysis tool
                            
                                Can stop-words be found automatically?
                            
                                Abbreviation detection
                            
                                tag generation from a small text content (such as tweets)
                            
                                how do I create my own training corpus for stanford tagger?
                            
                                what is distant supervision?
                            
                                Cosine Similarity of Vectors of different lengths?
                            
                                The similar method from the nltk module produces different results on different machines. Why?
                            
                                How to access/use Google's pre-trained Word2Vec model without manually downloading the model?
                            
                                How do I manipulate parse trees?
                            
                                Finding meaningful sub-sentences from a sentence
                            
                                How can i work with Example for nlp.update problem with spacy3.0

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Clause Extraction using Stanford parser

Tags:

nlp

stanford-nlp

AMisra

People also ask

1 Answers

Chthonic Project

Recent Activity

Donate For Us