 

First Sentence Regex

Tags: regex, php

I'm after a regex (PHP/Perl compatible) to get the first sentence out of some text. I realize this could get huge if it has to cover every case, but I'm just after something that will be "good enough" for now. Does anyone have something off the shelf for this?

asked Dec 04 '25 00:12 by Tim



1 Answer

What you need, in the end, is natural language parsing, which is extremely difficult to do and probably impossible with regular expressions alone (even souped-up PCRE ones). Consider this sentence:

So much for Mr. Regex and his sentence matching.

Every answer given so far will parse that as two sentences, and this isn't even much of an edge case: it's quite reasonable to imagine a block of text beginning with "Dear Mr. Adams:" or something similar. You can tack on lookbehinds to check the word before the punctuation mark, but that quickly becomes unmaintainable, because you have to check for every possible abbreviation: Mr., e.g., co., St., and many others you'll never think of. You might eventually end up with a "pretty good" practical solution, but it's going to be ugly, and one day it will fail.
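A minimal sketch of both approaches discussed above, written with Python's `re` module, whose fixed-width lookbehind syntax works the same way as PCRE's for these constructs. The abbreviation list and the names `ABBREVIATIONS`, `build_guarded`, and `first_sentence` are hypothetical, for illustration only. The naive pattern cuts the example sentence off at "Mr.", while the guarded pattern handles it but still fails on any abbreviation missing from its list.

```python
import re

# Naive "good enough" pattern: the shortest prefix ending in ., ! or ?
# that is followed by whitespace or the end of the text.
NAIVE = re.compile(r"^.*?[.!?](?=\s|$)", re.DOTALL)

# Hypothetical abbreviation list; any real list will be incomplete.
ABBREVIATIONS = ["Mr", "Mrs", "Dr", "St", "Co", "e.g", "i.e"]


def build_guarded(abbreviations):
    """Like NAIVE, but a period right after a listed abbreviation does not end the sentence."""
    # Lookbehind must be fixed-width, so each abbreviation gets its own
    # negative lookbehind; \b stops "Mr" from matching inside a longer word.
    guards = "".join(rf"(?<!\b{re.escape(a)})" for a in abbreviations)
    return re.compile(rf"^.*?(?:[!?]|{guards}\.)(?=\s|$)", re.DOTALL)


GUARDED = build_guarded(ABBREVIATIONS)


def first_sentence(text, pattern):
    """Return the first sentence matched by pattern, or the whole stripped text if none matches."""
    stripped = text.strip()
    match = pattern.match(stripped)
    return match.group(0) if match else stripped


sentence = "So much for Mr. Regex and his sentence matching."
print(first_sentence(sentence, NAIVE))    # So much for Mr.
print(first_sentence(sentence, GUARDED))  # the whole sentence
# "approx" is not in the list, so the guarded pattern still splits too early:
print(first_sentence("It costs approx. ten dollars. Really.", GUARDED))  # It costs approx.
```

The last line shows the maintenance problem in practice: every missed abbreviation means another lookbehind. The same pattern string should carry over to PHP's `preg_match`, though that port is untested here.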

answered Dec 05 '25 15:12 by Chris Lutz



