
Regex match till end of text

Tags:

regex

I'm using Regex to match whole sentences in a text containing a certain string. This is working fine as long as the sentence ends with any kind of punctuation. It does not work however when the sentence is at the end of the text without any punctuation.

This is my current expression:

 [^.?!]*(?<=[.?\s!])string(?=[\s.?!])[^.?!]*[.?!]

Works for:

This is a sentence with string. More text.

Does not work for:

More text. This is a sentence with string

Is there any way to make this work as intended? I can't find any character class for "end of text".

TimN asked Feb 07 '23 03:02


2 Answers

End of text is matched by the anchor $, not a character class.

You have two separate issues you need to address: (1) the sentence ending directly after string, and (2) the sentence ending sometime after string but with no end-of-sentence punctuation.

To do this, you need to make the match after string optional, but anchor that match to the end of the text. This also means that, after you recognize an (optional) end-of-sentence punctuation mark, you need to match everything that follows, so the end-of-text anchor will match.

My changes: Take everything after string in your original regex and surround it in (?:...)? - the (?:...) being a non-capturing group, and the ? making the entire group optional. Follow that with $ to anchor the match to the end of the text.

Within that optional group, you also need to make the end-of-sentence punctuation itself optional, by replacing the simple [.?!] with (?:[.?!].*)? - again, the (?:...) is a non-capturing group and the ? makes the group optional - and the .* allows this to match as much as you want after the end-of-sentence punctuation has been found.

[^.?!]*(?<=[.?\s!])string(?:(?=[\s.?!])[^.?!]*(?:[.?!].*)?)?$
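A quick way to check this pattern against the two inputs from the question is sketched below. The question does not say which regex engine is in use, so Python's re module is an assumption here, and match_to_end is a hypothetical helper name. Note that, because of the trailing .* and $, the match runs on to the end of the text whenever anything follows the sentence.

```python
import re

# The pattern from this answer; "string" stands in for the search term.
PATTERN = re.compile(
    r"[^.?!]*(?<=[.?\s!])string(?:(?=[\s.?!])[^.?!]*(?:[.?!].*)?)?$"
)

def match_to_end(text):
    """Return the matched text, or None if the pattern does not match."""
    m = PATTERN.search(text)
    return m.group(0) if m else None

# Sentence at the end of the text, no punctuation: now matches
# (the leading space belongs to the match because [^.?!]* accepts it).
print(repr(match_to_end("More text. This is a sentence with string")))

# Sentence followed by more text: the match extends to the end of the text.
print(repr(match_to_end("This is a sentence with string. More text.")))
```

If only the sentence itself is wanted, the rest of the text has to be trimmed from the match afterwards, or a pattern that does not require $ in every case is needed.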
J Earls answered Feb 27 '23 19:02


The symbol for end-of-text is $ (and, the symbol for beginning-of-text, if you ever need it, is ^).

You probably won't get what you're looking for by just adding the $ to your punctuation list, though (e.g., [.?!$]): inside a character class, $ is a literal dollar sign rather than an anchor. You'll find it works better as an alternative choice: ([.?!]|$).
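One way this suggestion might be applied to the pattern from the question is sketched below, again assuming Python's re module since the question does not name an engine. The helper name sentences_with_string is hypothetical, and adding |$ to the lookahead after string is an extra assumption beyond this answer, needed so that string may be the very last word of the text.

```python
import re

# Question's pattern with "end of text" accepted as an alternative to
# punctuation, both in the lookahead and as the sentence terminator.
SENTENCE = re.compile(
    r"[^.?!]*(?<=[.?\s!])string(?=[\s.?!]|$)[^.?!]*(?:[.?!]|$)"
)

def sentences_with_string(text):
    """Return every sentence containing the word, whitespace trimmed."""
    return [m.strip() for m in SENTENCE.findall(text)]

print(sentences_with_string("More text. This is a sentence with string"))
print(sentences_with_string("This is a sentence with string. More text."))

# For comparison: inside brackets, $ only matches a literal dollar sign.
print(re.search(r"string[.?!$]", "ends with string"))  # None
```

Unlike a pattern anchored with a mandatory $, this version stops at the end of the sentence, so text following the sentence is not included in the match.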

Joe DeRose answered Feb 27 '23 21:02