I need to parse sentences from a paragraph in Python. Is there an existing package to do this, or should I be trying to use regex here?
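# `text` is a hypothetical name: the original snippet is cut off before .split()
sentences = text.split("<BRK>")
with open("./sentences.out", "w+") as sentFile:  # the file is closed on exit
    for line in sentences:
        sentFile.write(line)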
Obviously, if we are talking about a single paragraph with a few sentences, the answer is a no-brainer: you do it manually by placing your cursor at the end of each sentence and pressing the ENTER key twice.
Splitting text into sentences may look like an easy task: just split on '. ' or '\n'. In practice, abbreviations such as "Dr." and decimal numbers also contain periods, so naive splitting often cuts sentences in the wrong place.
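To make the limitation concrete, here is a minimal sketch of the naive approach using only the standard library. The function name `naive_split` is made up for illustration; it is a baseline to show the failure mode, not a recommended solution.

```python
import re

def naive_split(text):
    """Split on a period followed by a space, or on a newline."""
    # The lookbehind keeps each period attached to the text before it.
    parts = re.split(r"(?<=\.) |\n", text)
    return [part.strip() for part in parts if part.strip()]

p = "Good morning Dr. Adams. The patient is waiting for you in room number 3."
print(naive_split(p))
# ['Good morning Dr.', 'Adams.', 'The patient is waiting for you in room number 3.']
```

The abbreviation "Dr." is treated as the end of a sentence, so the first sentence is cut in two.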
The nltk.tokenize module is designed for this and handles edge cases such as abbreviations. For example:
>>> from nltk import tokenize
>>> p = "Good morning Dr. Adams. The patient is waiting for you in room number 3."
>>> tokenize.sent_tokenize(p)
['Good morning Dr. Adams.', 'The patient is waiting for you in room number 3.']
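If adding NLTK as a dependency is not an option, a regex can get part of the way there. The sketch below is an assumption-laden illustration, not what NLTK does internally: `split_sentences` and the `ABBREVIATIONS` set are hypothetical names, and the abbreviation list is deliberately tiny.

```python
import re

# Hypothetical, deliberately short list; real text needs many more entries.
ABBREVIATIONS = {"dr", "mr", "mrs", "ms", "prof"}

def split_sentences(text):
    """Split on ., ! or ? followed by whitespace, skipping known abbreviations."""
    sentences, start = [], 0
    for match in re.finditer(r"[.!?]\s+", text):
        # Look at the last word before the punctuation mark.
        words = text[start:match.start()].split()
        last_word = words[-1].lower() if words else ""
        if last_word in ABBREVIATIONS:
            continue  # e.g. "Dr." does not end a sentence
        # Keep the punctuation mark, drop the whitespace after it.
        sentences.append(text[start:match.start() + 1])
        start = match.end()
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences

p = "Good morning Dr. Adams. The patient is waiting for you in room number 3."
print(split_sentences(p))
# ['Good morning Dr. Adams.', 'The patient is waiting for you in room number 3.']
```

This still fails on anything outside the list (initials, "e.g.", decimals followed by a space), which is why a trained tokenizer is usually the safer choice for real text.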