
How to break up a paragraph by sentences in Python

I need to split a paragraph into sentences in Python. Is there an existing package for this, or should I be using a regex here?

David542 asked Feb 28 '12 00:02

1 Answer

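The nltk.tokenize module is designed for this and handles edge cases such as abbreviations (note the "Dr." in the example). Its sent_tokenize function relies on NLTK's pretrained Punkt model, which may need a one-time download via nltk.download('punkt') (the resource is named 'punkt_tab' in recent NLTK releases). For example: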

>>> from nltk import tokenize
>>> p = "Good morning Dr. Adams. The patient is waiting for you in room number 3."
>>> tokenize.sent_tokenize(p)
['Good morning Dr. Adams.', 'The patient is waiting for you in room number 3.']
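For comparison, here is a rough sketch of the regex route the question asks about. It is not part of NLTK; `naive_split` is a hypothetical helper written only for illustration, and it assumes a sentence ends at '.', '!' or '?' followed by whitespace. On the same paragraph it breaks after the abbreviation "Dr.", which is the kind of edge case a trained tokenizer is meant to handle.

```python
import re

def naive_split(paragraph):
    """Hypothetical helper: split after '.', '!' or '?' when whitespace follows."""
    # The lookbehind keeps the punctuation attached to the preceding sentence.
    return re.split(r"(?<=[.!?])\s+", paragraph.strip())

p = "Good morning Dr. Adams. The patient is waiting for you in room number 3."
print(naive_split(p))
# ['Good morning Dr.', 'Adams.', 'The patient is waiting for you in room number 3.']
```

A regex like this can be good enough for text without abbreviations, initials, or decimal numbers followed by spaces; patching those cases one at a time tends to get fragile quickly.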
strcat answered Nov 04 '22 11:11
