
Help with Perl regex rules

Tags: regex, perl

I need some help with a regex issue in Perl. I want to match each single letter together with the non-letter characters "nucleated" around it, that is, the runs of non-letters directly to its left and right.

That is to say... I have a string like

CDF((E)TR)FT

and I want to match ALL the following:

C, D, F((, ((E), )T, R), )F, T.

I was trying with something like

/([^A-Za-z]*[A-Za-z]{1}[^A-Za-z]*)/

but I'm obtaining:

C, D, F((, E), T, R), F, T.

It is as if, once a non-letter character has been matched, it can NOT be matched again in a later match.
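A minimal reproduction, assuming the pattern is run with the /g modifier in list context (that is a guess at how the attempt above was executed; the variable names are only for illustration):

```perl
use strict;
use warnings;

my $samp = 'CDF((E)TR)FT';

# In list context, /g returns group 1 of every successive match.
# Each match consumes its trailing non-letters, so the next match
# can only start after them.
my @matches = $samp =~ /([^A-Za-z]*[A-Za-z]{1}[^A-Za-z]*)/g;

print join( ', ', @matches ), "\n";
# prints: C, D, F((, E), T, R), F, T
```

The "((" after F is consumed by the F match, so it is no longer available when E is matched.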

How can I do this?

green69 asked Mar 01 '26 12:03



1 Answer

A little late on this. Somebody has probably proposed this already.

I would capture the non-letters on the left in a lookahead and then consume them via a backreference, and capture the non-letters on the right in a second lookahead without consuming them. All the captures can be seen, but the right-hand one is not consumed, so the next pass continues right after the single letter that was just matched.

Character class is simplified for clarity:
/(?=([^A-Z]*))(\1[A-Z])(?=([^A-Z]*))/

(?=([^A-Z]*)) # ahead is optional non A-Z characters, captured in grp 1
(\1[A-Z]) # capture grp 2, consume capture group 1, plus atomic letter
(?=([^A-Z]*)) # ahead is optional non A-Z characters, captured in grp 3

Run the match globally (/g) in a while loop; on each pass, the combined groups $2$3 (in that order) are the answer.

Test:

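use strict;
use warnings;

my $samp = 'CDF((E)TR)FT';

# $2 = left non-letters plus the letter (consumed), $3 = right non-letters (only looked at)
while ( $samp =~ /(?=([^A-Z]*))(\1[A-Z])(?=([^A-Z]*))/g )
{
   print "$2$3, ";
}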

output:

C, D, F((, ((E), )T, R), )F, T,
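If the pieces are needed as a list rather than printed, the same loop can be wrapped in a sub. This is only a sketch: the name nucleated_pieces is made up for illustration, and the character class is widened back to A-Za-z on the assumption that lowercase letters should count too, as in the question's original attempt.

```perl
use strict;
use warnings;

# Hypothetical helper (name invented for this example): returns every
# letter together with the non-letters directly touching it on both sides.
sub nucleated_pieces {
    my ($str) = @_;
    my @pieces;

    # Same pattern as above, with the full letter class restored.
    while ( $str =~ /(?=([^A-Za-z]*))(\1[A-Za-z])(?=([^A-Za-z]*))/g ) {
        push @pieces, "$2$3";    # consumed left part + letter, then peeked right part
    }
    return @pieces;
}

print join( ', ', nucleated_pieces('CDF((E)TR)FT') ), "\n";
# prints: C, D, F((, ((E), )T, R), )F, T
```

Using join on the returned list also avoids the trailing comma from the print inside the loop.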