Regex replace any number of matches at the start of the line

Tags: c#, regex, replace

I have text with this structure:

1.  Text1
2.  Text 2. It has a number with a dot.
3.  1.   Text31

I want to get this text:

# Text1
# Text 2. It has a number with a dot. (notice that this number did not get replaced)
## Text31

I tried the following, but it does not work:

var pattern = @"^(\s*\d+\.\s*)+";
var replaced = Regex.Replace(str, pattern, "#", RegexOptions.Multiline);

Basically, it should start matching at the start of every line and replace every matched group with a # symbol. Currently, if more than one group is matched, the whole run is replaced by a single # symbol. The pattern I am using is probably incorrect. Can anyone come up with a solution?

user1242967 asked Jul 05 '17 07:07


1 Answer

You may use:

(?:\G|^)\s*\d+\.

It matches the start of string or the end of the previous successful match or start of a line, and then zero or more whitespaces, one or more digits and a dot.
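As a minimal sketch of how this pattern could be applied to the sample text from the question, the snippet below passes it to Regex.Replace with RegexOptions.Multiline. The class name HeadingConverter and the helper ToHeadings are hypothetical names chosen for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HeadingConverter
{
    // Pattern from the answer: \G or ^ anchors the match, followed by
    // optional whitespace, one or more digits, and a literal dot.
    private const string Pattern = @"(?:\G|^)\s*\d+\.";

    // Hypothetical helper: replaces every leading "N." marker with "#".
    public static string ToHeadings(string text) =>
        Regex.Replace(text, Pattern, "#", RegexOptions.Multiline);

    public static void Main()
    {
        string input =
            "1.  Text1\n" +
            "2.  Text 2. It has a number with a dot.\n" +
            "3.  1.   Text31";

        Console.WriteLine(ToHeadings(input));
        // Prints:
        // #  Text1
        // #  Text 2. It has a number with a dot.
        // ##   Text31
    }
}
```

Note that the whitespace following the last marker is left untouched, because the pattern only consumes whitespace that precedes a number. The " 2." in the middle of the second line is not replaced, since neither \G nor ^ matches at that position.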

Details

  • (?:\G|^) - start of string or end of the previous match (\G) or the start of a line (^)
  • \s* - zero or more whitespace characters (to match only horizontal whitespace and avoid overflowing to the next line(s), replace with [\s-[\r\n]]* or [\p{Zs}\t]*)
  • \d+ - one or more digits (to match only ASCII digits, replace with [0-9]+ or pass the RegexOptions.ECMAScript option to the Regex constructor)
  • \. - a dot.
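
The overflow mentioned for \s* can be illustrated with a short sketch. It compares the original pattern with a stricter variant built from the alternatives listed above ([\s-[\r\n]]* and [0-9]+) on an input where a marker is followed directly by a line break. The class and member names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HorizontalWhitespaceDemo
{
    // Original pattern: \s* may also consume line breaks.
    public const string Loose = @"(?:\G|^)\s*\d+\.";

    // Stricter variant: whitespace except \r and \n, and ASCII digits only.
    public const string Strict = @"(?:\G|^)[\s-[\r\n]]*[0-9]+\.";

    // Hypothetical helper: applies the given pattern in multiline mode.
    public static string Apply(string text, string pattern) =>
        Regex.Replace(text, pattern, "#", RegexOptions.Multiline);

    public static void Main()
    {
        // The first line holds only a marker, so \G sits right before "\n".
        string input = "1.\n2. Text";

        // Loose: \s* swallows the line break and the two lines are merged.
        Console.WriteLine(Apply(input, Loose));   // ## Text

        // Strict: the line break is kept and each line gets its own "#".
        Console.WriteLine(Apply(input, Strict));
        // #
        // # Text
    }
}
```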

The RegexOptions.Multiline option must be passed to the Regex constructor (or to the static Regex.Replace method) to make ^ match the start of every line. Alternatively, add the inline version of the option, (?m), at the start of the pattern.
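
The effect of the multiline option can be checked with a small sketch that runs the same pattern with and without the inline (?m) prefix. The class and method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class InlineMultilineDemo
{
    // No multiline mode: ^ matches only at the very start of the string.
    public static string WithoutMultiline(string text) =>
        Regex.Replace(text, @"(?:\G|^)\s*\d+\.", "#");

    // Inline (?m) turns on multiline mode without passing RegexOptions.
    public static string WithInlineOption(string text) =>
        Regex.Replace(text, @"(?m)(?:\G|^)\s*\d+\.", "#");

    public static void Main()
    {
        string input = "1. A\n2. B";

        Console.WriteLine(WithoutMultiline(input));
        // # A
        // 2. B

        Console.WriteLine(WithInlineOption(input));
        // # A
        // # B
    }
}
```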

For more details about \G anchor, see Continuing at The End of The Previous Match.

See the RegexStorm demo.

Wiktor Stribiżew answered Sep 28 '22 16:09