There already exist similar questions, but all of them use regexen. The code I'm using (that strips the separators):
string[] sentences = s.Split(new string[] { ". ", "? ", "! ", "... " }, StringSplitOptions.None);
I would like to split a block of text on sentence breaks and keep the sentence terminators. I'd like to avoid regexen for performance reasons. Is it possible?
I don't believe there is an existing function that does this. However, you can use the following extension method.
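    // Requires: using System.Collections.Generic; using System.Linq;
    // Must be declared inside a static class, like any extension method.
    public static IEnumerable<string> SplitAndKeepSeparators(this string source, string[] separators) {
        var builder = new System.Text.StringBuilder();
        foreach (var cur in source) {
            builder.Append(cur);
            // The separators are strings, not chars, so check whether the
            // text accumulated so far ends with one of them.
            if (separators.Any(sep => EndsWith(builder, sep))) {
                yield return builder.ToString();
                builder.Length = 0;
            }
        }
        // Return whatever follows the last separator.
        if (builder.Length > 0) {
            yield return builder.ToString();
        }
    }

    // True if the builder's contents end with the given non-empty suffix.
    private static bool EndsWith(System.Text.StringBuilder builder, string suffix) {
        if (suffix.Length == 0 || suffix.Length > builder.Length) return false;
        int offset = builder.Length - suffix.Length;
        for (int i = 0; i < suffix.Length; i++) {
            if (builder[offset + i] != suffix[i]) return false;
        }
        return true;
    }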
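Since the question is about performance, here is a rough alternative sketch that scans the string by index and slices out substrings instead of appending one character at a time. The names `SentenceSplitter` and `SplitKeepingTerminators` are hypothetical, the separators are the ones from the question, and matching is ordinal (case-sensitive). I have not benchmarked it against the StringBuilder version or against a regex, so treat any speed advantage as an assumption to measure.

```csharp
using System;
using System.Collections.Generic;

public static class SentenceSplitter
{
    // Hypothetical helper: yields each chunk of `source` together with the
    // separator that terminates it. Text after the last separator is yielded as-is.
    public static IEnumerable<string> SplitKeepingTerminators(this string source, string[] separators)
    {
        int start = 0; // index where the current chunk begins
        int pos = 0;   // index currently being examined

        while (pos < source.Length)
        {
            // Length of the longest separator that matches at `pos` (0 = no match).
            // Empty separators are skipped because their length is never > 0.
            int matchLength = 0;
            foreach (var sep in separators)
            {
                if (sep.Length > matchLength
                    && string.CompareOrdinal(source, pos, sep, 0, sep.Length) == 0)
                {
                    matchLength = sep.Length;
                }
            }

            if (matchLength > 0)
            {
                // Include the separator in the chunk, then start a new chunk after it.
                pos += matchLength;
                yield return source.Substring(start, pos - start);
                start = pos;
            }
            else
            {
                pos++;
            }
        }

        // Trailing text with no separator after it.
        if (start < source.Length)
        {
            yield return source.Substring(start);
        }
    }
}

public static class Program
{
    public static void Main()
    {
        var separators = new[] { ". ", "? ", "! ", "... " };
        foreach (var sentence in "Is it possible? Yes... it is. Done!".SplitKeepingTerminators(separators))
        {
            Console.WriteLine("[" + sentence + "]");
        }
        // Prints:
        // [Is it possible? ]
        // [Yes... ]
        // [it is. ]
        // [Done!]
    }
}
```

Note that the final "Done!" is returned as leftover text rather than as a terminated sentence, because the separators all end in a space and the input does not.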