I have a unicode string that is actually an article (it contains sentences), and I want to cut it off after the Xth sentence in Python.
A good indicator of a sentence ending is a full stop (".") followed by a word that starts with a capital letter. For example:
myarticle = "Hi, this is my first sentence. And this is my second. Yet this is my third."
How can this be achieved?
Thanks
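Consider using the Natural Language Toolkit (NLTK). Its sentence tokenizer does not break on things like "U.S.A." and still splits sentences that end in "?!". Depending on your NLTK version, you may first need a one-time download of the tokenizer data via nltk.download (the resource is called 'punkt' in older releases).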
>>> import nltk
>>> paragraph = u"Hi, this is my first sentence. And this is my second. Yet this is my third."
>>> sentences = nltk.sent_tokenize(paragraph)
>>> sentences
[u"Hi, this is my first sentence.", u"And this is my second.", u"Yet this is my third."]
Working with a list of sentences keeps your code readable. To access the second sentence, use ordinary list indexing:
>>> sentences[1]
u"And this is my second."
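To actually cut the article after the Xth sentence, one option is to slice the list of sentences and join the pieces back together. The sketch below is only an illustration: `split_sentences` and `cut_after` are hypothetical helper names, and the default splitter is a naive regex version of the heuristic from the question (a full stop followed by a capital letter), so it will mis-split abbreviations such as "U.S.A. Today".

```python
import re

# Heuristic from the question: a full stop, then whitespace, then a capital letter.
# This is an assumption-laden stand-in for a real sentence tokenizer.
_SENTENCE_BOUNDARY = re.compile(r"(?<=\.)\s+(?=[A-Z])")


def split_sentences(text):
    """Naively split text into sentences at '. ' followed by a capital letter."""
    return _SENTENCE_BOUNDARY.split(text.strip())


def cut_after(text, x, tokenize=split_sentences):
    """Return the first x sentences of text, joined by single spaces."""
    return " ".join(tokenize(text)[:x])


myarticle = "Hi, this is my first sentence. And this is my second. Yet this is my third."
print(cut_after(myarticle, 2))
# prints: Hi, this is my first sentence. And this is my second.
```

If NLTK is installed, passing `tokenize=nltk.sent_tokenize` should give the same slicing behaviour with more robust sentence detection.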