Accurately splitting sentences

Question

My program takes a text file and splits it into a list of sentences using split('.'), so it splits whenever it encounters a full stop. However, this can be inaccurate.

For Example

str='i love carpets. In fact i own 2.4 km of the stuff.'

Output

listOfSentences = ['i love carpets', 'in fact i own 2', '4 km of the stuff']

Desired Output

listOfSentences = ['i love carpets', 'in fact i own 2.4 km of the stuff']

My question is: how do I split at the end of each sentence rather than at every full stop?
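For reference, this is what str.split('.') returns for the example string when run as-is. Besides breaking "2.4" apart, it keeps the leading space before "In" and produces a trailing empty string, because the text ends with a full stop:

```python
text = 'i love carpets. In fact i own 2.4 km of the stuff.'

# str.split('.') cuts at every full stop, including the decimal point in "2.4",
# and the final full stop leaves an empty string at the end of the list.
pieces = text.split('.')
print(pieces)
# → ['i love carpets', ' In fact i own 2', '4 km of the stuff', '']
```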

Adam Bittlingmayer · Accepted Answer

Any regex-based approach cannot handle cases like "I saw Mr. Smith.", and adding hacks for those cases does not scale. As user est has commented, any serious implementation relies on data (models trained on real text) rather than hand-written rules.
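To illustrate the point, here is a typical regex heuristic (an illustrative sketch, not part of the original answer; the names SENTENCE_BOUNDARY and naive_regex_split are hypothetical): split only on whitespace that follows ".", "!" or "?". It fixes the "2.4" case from the question, but it still splits "I saw Mr. Smith." after the abbreviation:

```python
import re

# Hypothetical heuristic: a sentence boundary is whitespace preceded by . ! or ?
SENTENCE_BOUNDARY = re.compile(r'(?<=[.!?])\s+')


def naive_regex_split(text):
    """Split text on whitespace that follows sentence-final punctuation."""
    return SENTENCE_BOUNDARY.split(text.strip())


# "2.4" survives because no whitespace follows its decimal point.
print(naive_regex_split('i love carpets. In fact i own 2.4 km of the stuff.'))
# → ['i love carpets.', 'In fact i own 2.4 km of the stuff.']

# The abbreviation "Mr." is wrongly treated as a sentence end.
print(naive_regex_split('I saw Mr. Smith. He waved.'))
# → ['I saw Mr.', 'Smith.', 'He waved.']
```

Patching this with an abbreviation list only moves the problem to the next unlisted abbreviation, which is why the answer recommends a trained tokenizer instead.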

If you need to handle English only, then spaCy is better than NLTK:

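import spacy

# Modern spaCy (v3) API; the old spacy.en module no longer exists.
# Needs a trained pipeline: python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")
doc = nlp("i love carpets. In fact i own 2.4 km of the stuff.")
for sent in doc.sents:  # doc.sents yields each detected sentence as a Span
    print(sent.text)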

Update: spaCy now supports many languages.

Tags: python, parsing, nlp

Marko
