C# remove line using regular expression, including line break

Tags: c#, regex

I need to remove lines that match a particular pattern from some text. One way to do this is to use a regular expression with the begin/end anchors, like so:

var re = new Regex("^pattern$", RegexOptions.Multiline);
string final = re.Replace(initial, "");

This works fine except that it leaves an empty line instead of removing the entire line (including the line break).

To solve this, I added an optional capturing group for the line break, but I want to be sure it includes all of the different flavors of line breaks, so I did it like so:

var re = new Regex(@"^pattern$(\r\n|\r|\n)?", RegexOptions.Multiline);
string final = re.Replace(initial, "");

This works, but it seems like there should be a more straightforward way to do this. Is there a simpler way to reliably remove the entire line including the ending line break (if any)?

Jack A. asked Nov 30 '25 22:11


1 Answer

To match any single line break sequence, you can use the pattern (?:\r\n|[\r\n\u000B\u000C\u0085\u2028\u2029]). So, instead of (\r\n|\r|\n)?, you can use (?:\r\n|[\r\n\u000B\u000C\u0085\u2028\u2029])?.

Details:

  • 000A - a newline, \n
  • 000B - a line tabulation char
  • 000C - a form feed char
  • 000D - a carriage return, \r
  • 0085 - a next line char, NEL
  • 2028 - a line separator char
  • 2029 - a paragraph separator char

If you want to remove zero or more non-horizontal (that is, vertical) whitespace chars after a matched line, you can use [\s-[\p{Zs}\t]]*: any whitespace (\s) except (-[...]) horizontal whitespace (matched with [\p{Zs}\t]). Note that the \p{Zs} Unicode category class does not match tab chars, because a tab is a control character (category Cc) rather than a space separator, so \t has to be listed explicitly.
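To see which characters the subtracted class actually covers, here is a minimal sketch. The class name VerticalWhitespaceDemo, the helper IsVerticalWhitespace, and the sample characters are made up for illustration; only the [\s-[\p{Zs}\t]] class comes from the answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class VerticalWhitespaceDemo
{
    // Any whitespace (\s) minus horizontal whitespace (space separators and tab).
    private static readonly Regex VerticalWs = new Regex(@"[\s-[\p{Zs}\t]]");

    // Hypothetical helper: true if the single char is matched by the class above.
    public static bool IsVerticalWhitespace(char c) => VerticalWs.IsMatch(c.ToString());

    public static void Main()
    {
        char[] samples =
        {
            '\n', '\r', '\u000B', '\u000C', '\u0085', '\u2028', '\u2029', // line break chars
            ' ', '\t', '\u00A0'                                           // horizontal whitespace
        };

        foreach (char c in samples)
        {
            // Prints the code point and whether the class matches it.
            Console.WriteLine($"U+{(int)c:X4}: {IsVerticalWhitespace(c)}");
        }
    }
}
```

The seven line break chars should report True, and the space, tab, and no-break space should report False.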

One more aspect must be dealt with here, since you are using the RegexOptions.Multiline option: it makes $ match only before a newline (\n) or at the end of the string. With CRLF line endings, a \r sits between the end of the line text and the \n, so ^pattern$ fails to match. Hence, add an optional \r? before $ in your pattern.
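The effect of the CRLF line ending on $ can be checked with a small sketch like the one below. The class name CrlfAnchorDemo, the helper MatchesLine, and the sample strings are assumptions for illustration; "pattern" stands in for the real line pattern, as in the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CrlfAnchorDemo
{
    // Hypothetical helper: does the regex match anywhere in the text in Multiline mode?
    public static bool MatchesLine(string regex, string text) =>
        Regex.IsMatch(text, regex, RegexOptions.Multiline);

    public static void Main()
    {
        string lf = "first\npattern\nlast";
        string crlf = "first\r\npattern\r\nlast";

        // LF input: $ matches right before \n.
        Console.WriteLine(MatchesLine(@"^pattern$", lf));      // True

        // CRLF input: the \r sits between "pattern" and the \n, so $ cannot match.
        Console.WriteLine(MatchesLine(@"^pattern$", crlf));    // False

        // The optional \r is consumed first, then $ matches before \n.
        Console.WriteLine(MatchesLine(@"^pattern\r?$", crlf)); // True
    }
}
```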

So, either use

@"^pattern\r?$(?:\r\n|[\r\n\u000B\u000C\u0085\u2028\u2029])?"

or

@"^pattern\r?$[\s-[\p{Zs}\t]]*"
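
As a rough sketch of how the two patterns differ in practice, the following applies both to the same CRLF text. The class name LineRemover, the method names, and the sample input are hypothetical; the two regular expressions are the ones given above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LineRemover
{
    // Removes each matching line plus the single line break that ends it.
    private static readonly Regex OneBreak = new Regex(
        @"^pattern\r?$(?:\r\n|[\r\n\u000B\u000C\u0085\u2028\u2029])?",
        RegexOptions.Multiline);

    // Removes each matching line plus all vertical whitespace after it,
    // so blank lines that follow the matched line are removed as well.
    private static readonly Regex AllBreaks = new Regex(
        @"^pattern\r?$[\s-[\p{Zs}\t]]*",
        RegexOptions.Multiline);

    public static string RemoveLine(string text) => OneBreak.Replace(text, "");

    public static string RemoveLineAndBlankLines(string text) => AllBreaks.Replace(text, "");

    public static void Main()
    {
        // Sample input (an assumption): a matching line followed by a blank line,
        // and a matching last line without a trailing line break.
        string initial = "keep 1\r\npattern\r\n\r\nkeep 2\r\npattern";

        // The blank line after the first match is kept.
        Console.WriteLine(RemoveLine(initial) == "keep 1\r\n\r\nkeep 2\r\n");            // True

        // The blank line after the first match is removed too.
        Console.WriteLine(RemoveLineAndBlankLines(initial) == "keep 1\r\nkeep 2\r\n");   // True
    }
}
```

Which variant fits depends on whether blank lines after a removed line should survive. Also keep in mind that both variants still rely on ^ and $, which in .NET only recognize \n (or the start and end of the input) as a line boundary, so a line terminated only by a bare \r or by \u2028 would not be treated as a separate line.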
Wiktor Stribiżew answered Dec 02 '25 12:12


