 

How can I split a text into sentences using the Stanford parser?

How can I split a text or a paragraph into sentences using the Stanford parser?

Is there any method that can extract sentences, such as the getSentencesFromString() method provided for Ruby?

Asked Feb 29 '12 by S Gaber

People also ask

How do you split text into a sentence?

Splitting textual data into sentences can be considered an easy task, where a text is split into sentences on '. ' or '\n' characters. However, in free text this pattern is not consistent: authors can break a line in the middle of a sentence or use "." in the wrong places.
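As a quick illustration of a baseline that goes beyond splitting on '. ', the JDK's built-in java.text.BreakIterator performs locale-aware sentence segmentation with no external libraries. This is a minimal sketch, independent of the Stanford tools; the class name SentenceSplit is made up for the example:

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class SentenceSplit {
    // Splits text into sentences using the JDK's locale-aware BreakIterator.
    static List<String> sentences(String text) {
        BreakIterator it = BreakIterator.getSentenceInstance(Locale.US);
        it.setText(text);
        List<String> out = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            out.add(text.substring(start, end).trim());
        }
        return out;
    }

    public static void main(String[] args) {
        for (String s : sentences("My 1st sentence. Does it work for questions? Yes!")) {
            System.out.println(s);
        }
    }
}
```

Note that BreakIterator is rule-based and will still stumble on some abbreviations, which is where a trained tool like the Stanford parser earns its keep.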

How do you separate sentences from text in Python?

A string can be split into substrings using the split(param) method. This method is part of the string object. The parameter is optional, but you can split on a specific string or character. Given a sentence, the string can be split into words.

What is sentence breaking in NLP?

Sentence boundary disambiguation (SBD), also known as sentence breaking, sentence boundary detection, and sentence segmentation, is the problem in natural language processing of deciding where sentences begin and end.


2 Answers

You can use the DocumentPreprocessor class. Below is a short snippet; there may be other ways to do what you want.

import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import edu.stanford.nlp.ling.HasWord;
import edu.stanford.nlp.ling.SentenceUtils;
import edu.stanford.nlp.process.DocumentPreprocessor;

String paragraph = "My 1st sentence. “Does it work for questions?” My third sentence.";
Reader reader = new StringReader(paragraph);
DocumentPreprocessor dp = new DocumentPreprocessor(reader);
List<String> sentenceList = new ArrayList<String>();

for (List<HasWord> sentence : dp) {
    // SentenceUtils not Sentence
    String sentenceString = SentenceUtils.listToString(sentence);
    sentenceList.add(sentenceString);
}

for (String sentence : sentenceList) {
    System.out.println(sentence);
}
Answered Sep 19 '22 by 6 revs, 5 users 65%

I know there is already an accepted answer...but typically you'd just grab the SentencesAnnotation from an annotated document.

import java.util.List;
import java.util.Properties;

import edu.stanford.nlp.ling.CoreAnnotations.NamedEntityTagAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.PartOfSpeechAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.SentencesAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.TextAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.TokensAnnotation;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.util.CoreMap;

// creates a StanfordCoreNLP object, with POS tagging, lemmatization, NER, parsing, and coreference resolution
Properties props = new Properties();
props.put("annotators", "tokenize, ssplit, pos, lemma, ner, parse, dcoref");
StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

// read some text into the text variable
String text = ... // Add your text here!

// create an empty Annotation just with the given text
Annotation document = new Annotation(text);

// run all Annotators on this text
pipeline.annotate(document);

// these are all the sentences in this document
// a CoreMap is essentially a Map that uses class objects as keys and has values with custom types
List<CoreMap> sentences = document.get(SentencesAnnotation.class);

for (CoreMap sentence : sentences) {
    // traversing the words in the current sentence
    // a CoreLabel is a CoreMap with additional token-specific methods
    for (CoreLabel token : sentence.get(TokensAnnotation.class)) {
        // this is the text of the token
        String word = token.get(TextAnnotation.class);
        // this is the POS tag of the token
        String pos = token.get(PartOfSpeechAnnotation.class);
        // this is the NER label of the token
        String ne = token.get(NamedEntityTagAnnotation.class);
    }
}

Source - http://nlp.stanford.edu/software/corenlp.shtml (halfway down)

And if you're only looking for sentences, you can drop the later steps such as "parse" and "dcoref" from the pipeline initialization; that will save you some load and processing time. Rock and roll. ~K
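To make that concrete, a sketch of the reduced configuration might look like this (the CoreNLP jar and models are assumed to be on the classpath; only the annotator list changes from the snippet above):

```java
import java.util.Properties;

import edu.stanford.nlp.ling.CoreAnnotations.SentencesAnnotation;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.util.CoreMap;

// Only tokenization and sentence splitting — no tagging, parsing, or coref.
Properties props = new Properties();
props.setProperty("annotators", "tokenize, ssplit");
StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

Annotation document = new Annotation("First sentence. Second sentence.");
pipeline.annotate(document);

for (CoreMap sentence : document.get(SentencesAnnotation.class)) {
    System.out.println(sentence.toString());
}
```

Because "tokenize, ssplit" needs no model files beyond the tokenizer, pipeline startup is much faster than with the full annotator stack.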

Answered Sep 20 '22 by Kevin