Handling conjunctions when splitting sentences using core-nlp's DocumentPreprocessor

Tags: java, regex, stanford-nlp

I am trying to split a given text into sentences using CoreNLP's DocumentPreprocessor class.

Below is the code I'm using.

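List<String> splitSentencesList = new ArrayList<>();
Reader reader = new StringReader(inputText);
DocumentPreprocessor dp = new DocumentPreprocessor(reader);
// DocumentPreprocessor iterates over the text one tokenized sentence at a time
for (List<HasWord> sentence : dp) {
    // Re-join the tokens, lowercase the result, and drop the " ." left by the tokenized period
    splitSentencesList.add(Sentence.listToString(sentence).toLowerCase().replace(" .", ""));
}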

This works in most cases. But how do we handle conjunctions within a sentence?

For example:

I like coffee and donuts for my breakfast.

Ideally, this should be split further into:

I like coffee for my breakfast.
I like donuts for my breakfast.

One option is to apply a regex-based rule to split such sentences further. Is there a built-in way to achieve this in CoreNLP?

Any pointers on this are appreciated.

asked Jul 18 '17 11:07

Betafish

1 Answer

The simple answer is: you can't do that using the DocumentPreprocessor. It is designed to split text into sentences based on punctuation. There is no way to tell it to split a sentence (or rather duplicate it) when a conjunction (like "and") is present.

Your idea to use a regex might just be the easiest way. You could also use CoreNLP's dependency parsing and check for a conjunction that connects two direct objects.

[Image: dependency parse of the example sentence]
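
To make the dependency-based idea a bit more concrete, here is a minimal sketch of the duplication step only. It assumes you have already run a parser and read off the positions of the two conjoined words and of the conjunction between them; it does not call CoreNLP itself. The class name ConjunctExpander and its parameters are hypothetical, positions are 0-based list positions (CoreNLP's own token indices are 1-based), and it only handles single-word conjuncts.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of CoreNLP: duplicates a tokenized sentence
// around one conjunction whose token positions are already known.
class ConjunctExpander {

    // Rebuilds the sentence while skipping the tokens at the two given positions.
    private static String joinWithout(List<String> tokens, int skipA, int skipB) {
        List<String> kept = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            if (i != skipA && i != skipB) {
                kept.add(tokens.get(i));
            }
        }
        return String.join(" ", kept);
    }

    // firstIdx and secondIdx are the positions of the two conjoined words,
    // ccIdx is the position of the conjunction (e.g. "and") between them.
    static List<String> expand(List<String> tokens, int firstIdx, int ccIdx, int secondIdx) {
        List<String> result = new ArrayList<>();
        // Variant 1: keep the first conjunct, drop the conjunction and the second conjunct.
        result.add(joinWithout(tokens, ccIdx, secondIdx));
        // Variant 2: keep the second conjunct, drop the first conjunct and the conjunction.
        result.add(joinWithout(tokens, firstIdx, ccIdx));
        return result;
    }
}
```

The output stays space-separated per token (so the final period is preceded by a space), which matches what the code in the question produces before its replace(" .", "") call.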

For the sentence described above, a simple regex might just do the trick, while dependency parsing might come in handy if your sentences get more complex.
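
As a rough illustration of the regex route, the sketch below duplicates a sentence around the first "X and Y" pair it finds. The class name ConjunctionSplitter and the pattern are my own assumptions, not anything CoreNLP provides. It only handles two single-word conjuncts joined by "and", and it will happily mangle sentences where "and" joins clauses rather than nouns, so treat it as a starting point.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: plain java.util.regex, no CoreNLP involved.
class ConjunctionSplitter {

    // Captures: (1) text before the pair, (2) first word, (3) second word, (4) rest of the sentence.
    private static final Pattern AND_PATTERN =
            Pattern.compile("^(.*?)\\b(\\w+) and (\\w+)\\b(.*)$");

    static List<String> split(String sentence) {
        List<String> result = new ArrayList<>();
        Matcher m = AND_PATTERN.matcher(sentence);
        if (m.matches()) {
            // One copy of the sentence per conjunct.
            result.add(m.group(1) + m.group(2) + m.group(4));
            result.add(m.group(1) + m.group(3) + m.group(4));
        } else {
            // No "X and Y" pair found: return the sentence unchanged.
            result.add(sentence);
        }
        return result;
    }
}
```

You would call ConjunctionSplitter.split(...) on each sentence string that the DocumentPreprocessor loop produces.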

answered Oct 25 '22 08:10

Tobias Geiselmann
