
How can I use a regex with nested groups to match strings?

I have some strings like these:

111/aaa
111/aaa|222/bbb

They follow this pattern:

(.*)/(.*)(|(.*)/(.*))?

I tried to use it to match a string and extract the values:

var rrr = """(.*)/(.*)(|(.*)/(.*))?""".r

"123/aaa|444/bbb" match {
    case rrr(pid,pname, cid,cname) => println(s"$pid, $pname, $cid, $cname")
    case _ => println("not matched ?!")
}

But it prints:

not matched ?!

What I want to get is:

123, aaa, 444, bbb

How can I fix it?


UPDATE

Thanks to @BartKiers' and @Barmar's answers, I found that my regex had several mistakes, and I finally arrived at this solution:

var rrr = """(.*?)/(.*?)([|](.*?)/(.*?))?""".r

"123/aaa|444/bbb" match {
    case rrr(pid,pname, _, cid,cname) => println(s"$pid, $pname, $cid, $cname")
    case _ => println("not matched ?!")
}

It works, but as you can see there is a _ placeholder that isn't actually useful. Is there a way to rewrite the regex so that I can just write rrr(pid, pname, cid, cname) to match it?

Freewind asked Jul 05 '13 06:07



1 Answer

.* can cause a lot of backtracking, because .* first matches the complete string and then gives characters back one at a time until the following / can match. That means it stops at the last / in the string, not the first.

It also won't capture the values into the groups you expect: on 123/aaa|444/bbb, your original regex puts 123/aaa|444 into the first group and bbb into the second.
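To make the difference concrete, here is a small sketch that runs the same four-group pattern once with greedy .* and once with lazy .*? on the question's input. The object name GreedyVsLazy and the show helper are illustrative, not from the answer. Both patterns use a non-capturing outer group (?:...) so that each has exactly four groups.

```scala
import scala.util.matching.Regex

object GreedyVsLazy {
  // Same shape, greedy vs. lazy quantifiers; the outer group is non-capturing.
  val greedy: Regex = """(.*)/(.*)(?:\|(.*)/(.*))?""".r
  val lazyRe: Regex = """(.*?)/(.*?)(?:\|(.*?)/(.*?))?""".r

  // A Regex used as an extractor must match the entire input string.
  // Groups that did not take part in the match are bound as null.
  def show(r: Regex, s: String): String = s match {
    case r(a, b, c, d) => s"$a, $b, $c, $d"
    case _             => "not matched"
  }

  def main(args: Array[String]): Unit = {
    println(show(greedy, "123/aaa|444/bbb")) // 123/aaa|444, bbb, null, null
    println(show(lazyRe, "123/aaa|444/bbb")) // 123, aaa, 444, bbb
  }
}
```

With the greedy version the first group backtracks only as far as the last /, so the optional second pair never gets a chance to match.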

You should use the lazy (non-greedy) quantifier .*? instead.

Your regex should be:

^(.*?)/(.*?)(?:\|(.*?)/(.*?))?$
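A minimal sketch of this regex in use, assuming the input format from the question. The object name PairParser and the parse helper are hypothetical. Wrapping the last two groups in Option is one way to handle a string like 111/aaa, which has no second pair and therefore leaves groups 3 and 4 as null.

```scala
import scala.util.matching.Regex

object PairParser {
  // The answer's regex: (?:...) does not capture, so exactly four groups
  // are captured and the extractor takes exactly four binders.
  val rrr: Regex = """^(.*?)/(.*?)(?:\|(.*?)/(.*?))?$""".r

  // When the optional "|cid/cname" part is absent, cid and cname are null;
  // Option(...) turns those nulls into None.
  def parse(s: String): Option[(String, String, Option[String], Option[String])] =
    s match {
      case rrr(pid, pname, cid, cname) => Some((pid, pname, Option(cid), Option(cname)))
      case _                           => None
    }

  def main(args: Array[String]): Unit = {
    println(parse("123/aaa|444/bbb")) // Some((123,aaa,Some(444),Some(bbb)))
    println(parse("111/aaa"))         // Some((111,aaa,None,None))
    println(parse("no-slash"))        // None
  }
}
```

Since Scala's Regex extractor already requires a full match, the ^ and $ anchors are redundant in a match expression, though harmless.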

For short strings like these there won't be a noticeable performance difference, but the lazy version does capture the values in the right groups.

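Notice the ?: in the regex: it makes (?:\|(.*?)/(.*?))? a non-capturing group, so the match has only four capturing groups and rrr(pid, pname, cid, cname) works without the extra _. The | is escaped as \| so that it matches a literal pipe; unescaped, as in (|...), it acts as alternation.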
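One way to see why the update needed the extra _ is to count how many values the extractor returns. This sketch (object and helper names are illustrative) calls Regex.unapplySeq directly, which is the method a case rrr(...) pattern invokes on a string, and compares the pattern from the question's update with the answer's pattern.

```scala
import scala.util.matching.Regex

object GroupCount {
  // Pattern from the question's update: the outer ( ... ) also captures.
  val capturing: Regex = """(.*?)/(.*?)([|](.*?)/(.*?))?""".r
  // The answer's pattern: the outer (?: ... ) does not capture.
  val nonCapturing: Regex = """^(.*?)/(.*?)(?:\|(.*?)/(.*?))?$""".r

  // The length of the list returned by unapplySeq is the number of
  // binders a `case r(...)` pattern needs in order to match.
  def arity(r: Regex, s: String): Option[Int] = r.unapplySeq(s).map(_.length)

  def main(args: Array[String]): Unit = {
    println(arity(capturing, "123/aaa|444/bbb"))    // Some(5)
    println(arity(nonCapturing, "123/aaa|444/bbb")) // Some(4)
  }
}
```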

Anirudha answered Oct 01 '22 11:10
