
Extracting a recurring pattern with regular expressions

Tags:

c#

regex

I have some text containing a list of entries, each made of an id (in the form P followed by a number), a dash, and a name, as in:

P1 - code23
P2 - name asd, P3 -name3
P3 - 837/55 P5 - code/55

As you can see, the PX - name pairs can be separated by \n, a comma, or plain spaces.

With the regex pattern

(((?<id>P\d)(\s)?-(\s)?(?<name>(.)*)(,)?(\n)?))   

I can extract the name group when the pairs are on different lines, but not when they are separated by a comma or a space. The names extracted from the text above are:

code23 (right)
name asd, P3 -name3 (wrong)
837/55 P5 - code/55 (wrong)

How can I modify my pattern?

pomarc asked Mar 27 '26 02:03


1 Answer

You may try

(?<id>P\d+)\s*-\s*(?<name>.*?)(?=$|,?\s*P\d)

See the regex demo (the demo adds \r? only because multiline mode is on and the input is multiline; if the strings are handled separately, neither \r? nor multiline mode is necessary).

Explanation:

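  • (?<id>P\d+) - Group "id": P followed by 1+ digits
  • \s*-\s* - 0+ whitespace characters, a -, and again 0+ whitespace characters
  • (?<name>.*?) - Group "name": captures 0+ characters other than a newline, as few as possible, up to the first position where the lookahead that follows succeeds
  • (?=$|,?\s*P\d) - a positive lookahead that requires either the end of the string (the only one, since multiline mode is off) or an optional comma, 0+ whitespace characters, P and a digit.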
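
As a rough illustration of how the pattern might be used from C#, here is a minimal sketch that handles each line separately, so multiline mode is not needed. The class name PairExtractor and the helper Extract are hypothetical names chosen for this example; only the pattern and the sample lines come from the question and answer.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class PairExtractor
{
    // Pattern from the answer. RegexOptions.Multiline is not set, so $ matches
    // at the end of the input (or just before a final newline).
    private static readonly Regex PairRegex =
        new Regex(@"(?<id>P\d+)\s*-\s*(?<name>.*?)(?=$|,?\s*P\d)");

    // Hypothetical helper: returns every (id, name) pair found in one line.
    public static List<KeyValuePair<string, string>> Extract(string line)
    {
        var pairs = new List<KeyValuePair<string, string>>();
        foreach (Match m in PairRegex.Matches(line))
        {
            pairs.Add(new KeyValuePair<string, string>(
                m.Groups["id"].Value, m.Groups["name"].Value));
        }
        return pairs;
    }

    public static void Main()
    {
        // Sample lines from the question, handled one at a time.
        string[] lines =
        {
            "P1 - code23",
            "P2 - name asd, P3 -name3",
            "P3 - 837/55 P5 - code/55"
        };

        foreach (string line in lines)
        {
            foreach (var pair in Extract(line))
            {
                Console.WriteLine(pair.Key + " => " + pair.Value);
            }
        }
    }
}
```

With the three sample lines this should print five pairs: P1 => code23, P2 => name asd, P3 => name3, P3 => 837/55 and P5 => code/55. If the text is matched as one multiline string instead, a trailing \r may end up in the last name, which is where the \r? from the demo (or a Trim() on the captured value) would come in.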


Wiktor Stribiżew answered Mar 29 '26 15:03


