 

Python: cut a string after the Xth sentence

Tags:

python

string

I have to cut a Unicode string that is actually an article (it contains several sentences). I want to cut this article after the Xth sentence in Python.

A good indicator of a sentence ending is a full stop (".") followed by a word that starts with a capital letter. For example:

myarticle = "Hi, this is my first sentence. And this is my second. Yet this is my third."
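A minimal sketch of that heuristic in plain Python (no third-party libraries) might look like the following. The helper name cut_after_sentence is hypothetical, x counts sentences from 1, and the boundary rule is exactly the one described above: a full stop, then whitespace, then a capital letter.

```python
import re

# A full stop followed by at least one whitespace character.
_STOP_THEN_SPACE = re.compile(r"\.\s+")


def cut_after_sentence(text, x):
    """Return text up to and including its x-th sentence (hypothetical helper).

    A sentence boundary is a full stop, then whitespace, then a character
    for which str.isupper() is true, so non-ASCII capitals such as "É" count.
    """
    if x <= 0:
        return ""
    count = 0
    for match in _STOP_THEN_SPACE.finditer(text):
        nxt = match.end()  # index of the first character after the whitespace
        if nxt < len(text) and text[nxt].isupper():
            count += 1
            if count == x:
                # match.start() is the full stop itself; keep it in the result.
                return text[:match.start() + 1]
    # Fewer than x boundaries: the whole text fits within x sentences.
    return text
```

Because the rule is purely textual, it also cuts at abbreviations: cut_after_sentence("I met Mr. Smith. He waved.", 1) returns "I met Mr.", which is the kind of case a trained sentence tokenizer handles better.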

How can this be achieved?

Thanks

asked Dec 03 '22 04:12 by Hellnar


1 Answer

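Consider installing the Natural Language Toolkit (NLTK) and using its sentence tokenizer. Unlike a simple split on ".", it won't break sentences at abbreviations like "U.S.A.", and it still splits sentences that end in "?!". The tokenizer relies on the Punkt model data, which you download once with nltk.download('punkt') (recent NLTK releases use 'punkt_tab' instead).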

>>> import nltk
>>> paragraph = "Hi, this is my first sentence. And this is my second. Yet this is my third."
>>> sentences = nltk.sent_tokenize(paragraph)
>>> sentences
['Hi, this is my first sentence.', 'And this is my second.', 'Yet this is my third.']

The result is an ordinary Python list, which keeps your code readable: you can use the indexing notation you're already used to. To get the second sentence (index 1):

>>> sentences[1]
'And this is my second.'
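To actually cut the article after the Xth sentence, you can slice the list, e.g. " ".join(sentences[:x]), but that normalises the whitespace between sentences. The sketch below (cut_after_nth is a hypothetical helper, not part of NLTK) slices the original string instead, so spacing and line breaks survive. It assumes each sentence appears verbatim and in order in the original text, which is typically the case for NLTK's Punkt output.

```python
def cut_after_nth(text, sentences, x):
    """Return text up to the end of its x-th sentence (hypothetical helper).

    `sentences` is assumed to be tokenizer output such as
    nltk.sent_tokenize(text): substrings of `text`, in order.
    """
    if x <= 0:
        return ""
    pos = 0  # index just past the last sentence located so far
    for sentence in sentences[:x]:
        # Search from pos so repeated sentences are matched in order.
        start = text.find(sentence, pos)
        if start == -1:
            raise ValueError(f"sentence not found verbatim in text: {sentence!r}")
        pos = start + len(sentence)
    # Slicing the original keeps its spacing and line breaks intact.
    return text[:pos]
```

With NLTK, a call would look like cut_after_nth(paragraph, nltk.sent_tokenize(paragraph), 2). If x exceeds the number of sentences, the whole text up to the end of the last sentence is returned.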
answered Dec 22 '22 01:12 by Tim McNamara