 

RegExp to be recursive

I have the following string:

string>string25>string89 > anotherString

And I have the following regExp:

^[\w\_\-\.\d]+(?:\s*)?(?:\>)+(?:\s*)[\w\_\-\.\d]*

I want my regExp to be recursive, starting from the first character and running to the last one. My language is JavaScript. Does the regExp itself have this capability, or should I use a while() loop? I would prefer a solution with the regExp alone; if that is not possible, please give a solution using a JavaScript while() loop.

EDIT: I want to capture this:

string>string25
string25>string89
string89 > anotherString
Mostafa Talebi asked May 20 '26 20:05


1 Answer

This is not so much about recursion as about getting all matches. In JavaScript you have to make your regular expression global:

/([^>]+)/g

This will match all substrings between the > characters in your string:

string
string25
string89  (including the space at the end)
anotherString  (including the space at the start)

Or you could simply split your string on the > delimiter and collect the individual pieces:

yourString.split(">");
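
As a minimal sketch of both approaches, assuming the sample string from the question (the variable names here are only illustrative), the global match and the split should give the same pieces. The trim step is an optional cleanup that the question does not ask for:

```javascript
// Sample input taken from the question.
const input = "string>string25>string89 > anotherString";

// Global match: every run of characters that are not ">".
const pieces = input.match(/[^>]+/g);

// Splitting on ">" yields the same pieces for this input.
const parts = input.split(">");

// Optional cleanup: drop the whitespace around each piece.
const trimmed = parts.map((piece) => piece.trim());

console.log(pieces);
console.log(parts);
console.log(trimmed);
```

Note that neither approach produces the overlapping pairs on its own; both only return the individual strings.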

Edit

After seeing your desired result, I'd suggest you go with @HamZa's solution, which uses a positive lookahead. The pairs you want end up in its first capture group:

/(?=([^>]+>[^>]+))[^>]+>/g
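
One possible way to collect the pairs with this expression is sketched below. It assumes an environment that supports String.prototype.matchAll (ES2020); the names sample, pairPattern and pairs are just illustrative:

```javascript
// Sample input taken from the question.
const sample = "string>string25>string89 > anotherString";

// The lookahead captures a whole pair; the trailing [^>]+> consumes one string.
const pairPattern = /(?=([^>]+>[^>]+))[^>]+>/g;

// Each match object holds the captured pair in group 1.
const pairs = Array.from(sample.matchAll(pairPattern), (m) => m[1]);

console.log(pairs);
```

The second pair keeps the space that precedes the next > in the input, so you may want to trim the results.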

Some explanation

Regular expressions parse strings from left to right, iterating over each character (to simplify the process). Positive lookaheads, on the other hand, don't advance the current parsing position but rather do what their name says: they look ahead to check whether their expression is found:

t(?=s) will match the second t in streets because it sees that this t is followed by s. But parsing after this match will continue from that s, because the lookahead did not consume it.
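
A small sketch of that behaviour, using the streets example (the variable names are illustrative only):

```javascript
const word = "streets";

// Global flag so that lastIndex records where parsing would continue.
const lookahead = /t(?=s)/g;

const found = lookahead.exec(word);
const matchedText = found[0];            // only the "t" is part of the match
const matchedAt = found.index;           // position of the second "t"
const continuesAt = lookahead.lastIndex; // position of the "s" that was only looked at

console.log(matchedText, matchedAt, continuesAt, word[continuesAt]);
```

The match has length one, which shows that the s was inspected but not consumed.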

I hope this explains it a bit.

Actual solution expression

As for the actual regular expression, it's a rather clever one in how it progresses through the string:

  1. It first has a positive lookahead (which doesn't advance the parsing position) to check whether the pair you're looking for starts at the current parsing position:

    (?=([^>]+>[^>]+))
    
  2. If the lookahead finds such a pair, the pair is stored in a capture group (hence the inner parentheses).
  3. After the lookahead comes the single-string expression [^>]+>, which isn't captured (it's not within parentheses) but makes sure that parsing advances past a single string, up to and including the next > character.
  4. Because this regular expression is global, it then starts matching all over again, this time from the character position right after the > character, because the previous match advanced to it.
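
Since the question also asks for a while() version, the steps above can be traced with a plain exec loop. This is only a sketch on the question's sample string; the trace array and its field names are made up for illustration:

```javascript
const text = "string>string25>string89 > anotherString";
const re = /(?=([^>]+>[^>]+))[^>]+>/g;

const trace = [];
let match;

// exec() resumes from re.lastIndex on each call and returns null when done.
while ((match = re.exec(text)) !== null) {
  trace.push({
    start: match.index,  // where this match began (steps 1 and 4)
    pair: match[1],      // the pair captured inside the lookahead (step 2)
    consumed: match[0],  // what [^>]+> actually consumed (step 3)
    next: re.lastIndex,  // where the following match will start (step 4)
  });
}

console.log(trace);
```

Each entry's next value equals the start of the following entry, which is the advance past a single string described in step 3.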
Robert Koritnik answered May 22 '26 09:05


