
Regex issue to parse a string

Tags: java, regex

I was trying to generate a regex to be used in Java using this link.

I can have the following kinds of strings.

1. customer calls <function_name> using <verb> on <uri> with <object>
2. customer calls <function_name> using 'POST' on <uri> with <object>
3. customer calls 'create' using 'POST' on <uri> with <object>
4. customer calls 'create' using 'POST' on <uri>

As you can see, the last portion after with is optional in my case.

I implemented the following regular expression.

.+call[s]?.+(\'\w+\'|<\w+>).+using.+(\'\w+\'|<\w+>).+on.+(\'\w+\'|<\w+>).*(with.+(\'\w+\'|<\w+>))?

But for string 3, I get 'create', 'POST', <object>, null, null instead of 'create', 'POST', <uri>, <object>. For string 4, the output is 'create', 'POST', <uri>, null, null instead of 'create', 'POST', <uri>.

The regex without (with.+(\'\w+\'|<\w+>))? works properly for string 4. How can I change this last part where I need to make the section from with optional?

Philip John asked Jun 21 '26 04:06


2 Answers

Your regex accepts too much and backtracks a lot because of the overuse of greedy .+. Remember that every time you write a greedy .+ or .*, the regex engine first consumes everything up to the end of the line and then has to backtrack until the rest of the pattern fits. This is both expensive and error-prone: it tends to swallow more text than intended, so be very careful with this construct. It doesn't act the way most people expect it to.
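To see the greedy behaviour concretely, here is a minimal Java sketch that runs the pattern from the question against strings 3 and 4. The names GreedyDemo, TOKEN and groups are made up for illustration; the pattern is the question's, except that the redundant \' escapes are dropped (a single quote needs no escaping in a Java regex).

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyDemo {
    // One quoted or angle-bracketed token, e.g. 'POST' or <uri> (hypothetical helper constant).
    static final String TOKEN = "('\\w+'|<\\w+>)";

    // The question's pattern: every gap is a greedy .+ or .*
    static final Pattern ORIGINAL = Pattern.compile(
        ".+call[s]?.+" + TOKEN + ".+using.+" + TOKEN + ".+on.+" + TOKEN
        + ".*(with.+" + TOKEN + ")?");

    // Returns all capture groups of a full match, or null if the input does not match.
    static String[] groups(String input) {
        Matcher m = ORIGINAL.matcher(input);
        if (!m.matches()) {
            return null;
        }
        String[] out = new String[m.groupCount()];
        for (int i = 1; i <= m.groupCount(); i++) {
            out[i - 1] = m.group(i);
        }
        return out;
    }

    public static void main(String[] args) {
        // The .+ after "on" runs to the end of the line, so group 3 lands on the LAST token.
        System.out.println(Arrays.toString(
            groups("customer calls 'create' using 'POST' on <uri> with <object>")));
        // ['create', 'POST', <object>, null, null]
        System.out.println(Arrays.toString(
            groups("customer calls 'create' using 'POST' on <uri>")));
        // ['create', 'POST', <uri>, null, null]
    }
}
```

Because the .+ before the third token and the .* after it can absorb "<uri> with", the optional group is never needed for the match to succeed and stays null.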

The simple solution in your case is to actually state precisely what you're expecting, and from your example text it looks like you need whitespace, so just use \s+ instead. Your regex becomes:

.+?\bcalls?\s+(\'\w+\'|<\w+>)\s+using\s+(\'\w+\'|<\w+>)\s+on\s+(\'\w+\'|<\w+>)(?:\s+with\s+(\'\w+\'|<\w+>))?


Note that I also changed the first .+ to a lazy .+? followed by a word boundary anchor \b (you could probably remove the .+? from the pattern entirely unless you also need the full line to be matched). I also made the optional group non-capturing, since you most probably don't need to capture the whole with clause.
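For reference, a minimal sketch of how this pattern could be used from Java. The class name OptionalWithDemo, the TOKEN constant and the parse helper are assumptions for illustration, and the \' escapes are again dropped. Skipping null groups gives exactly the three-element result the question asks for when the with clause is absent.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OptionalWithDemo {
    // One quoted or angle-bracketed token (hypothetical helper constant).
    static final String TOKEN = "('\\w+'|<\\w+>)";

    // Gaps are explicit whitespace; the trailing "with <token>" part is optional.
    static final Pattern FIXED = Pattern.compile(
        ".+?\\bcalls?\\s+" + TOKEN + "\\s+using\\s+" + TOKEN + "\\s+on\\s+" + TOKEN
        + "(?:\\s+with\\s+" + TOKEN + ")?");

    // Returns the captured tokens in order; an unmatched optional group is skipped.
    // An input that does not match at all yields an empty list.
    static List<String> parse(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = FIXED.matcher(input);
        if (m.matches()) {
            for (int i = 1; i <= m.groupCount(); i++) {
                if (m.group(i) != null) {
                    tokens.add(m.group(i));
                }
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(parse("customer calls 'create' using 'POST' on <uri> with <object>"));
        // ['create', 'POST', <uri>, <object>]
        System.out.println(parse("customer calls 'create' using 'POST' on <uri>"));
        // ['create', 'POST', <uri>]
    }
}
```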

Lucas Trzesniewski answered Jun 22 '26 16:06


Use [ ]* in place of .+ wherever only spaces can appear.

Try this:

.+call(?:s)?.+(\'\w+\'|<\w+>)[ ]*using.+(\'\w+\'|<\w+>)[ ]*on[ ]*(\'\w+\'|<\w+>)[ ]*(?:with)?[ ]*(\'\w+\'|<\w+>)?

You will get

 1. <function_name> <verb> <uri> <object>
 2. <function_name> 'POST' <uri> <object>
 3. 'create' 'POST' <uri> <object>
 4. 'create' 'POST' <uri> null

In the 4th row the last group is null because the final token (i.e. <object>) is missing.
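A small Java sketch of this pattern in use, under the same assumptions as before (SpaceClassDemo, TOKEN and groups are illustrative names, and the \' escapes are dropped). It also shows one consequence of making with and the last token optional independently of each other: a line where with is missing but a fourth token is present still matches.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SpaceClassDemo {
    // One quoted or angle-bracketed token (hypothetical helper constant).
    static final String TOKEN = "('\\w+'|<\\w+>)";

    // The pattern from this answer: "with" and the last token are each optional on their own.
    static final Pattern PATTERN = Pattern.compile(
        ".+call(?:s)?.+" + TOKEN + "[ ]*using.+" + TOKEN + "[ ]*on[ ]*" + TOKEN
        + "[ ]*(?:with)?[ ]*" + TOKEN + "?");

    // Returns all four capture groups of a full match, or null if the input does not match.
    static String[] groups(String input) {
        Matcher m = PATTERN.matcher(input);
        if (!m.matches()) {
            return null;
        }
        String[] out = new String[m.groupCount()];
        for (int i = 1; i <= m.groupCount(); i++) {
            out[i - 1] = m.group(i);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(
            groups("customer calls 'create' using 'POST' on <uri>")));
        // ['create', 'POST', <uri>, null]
        // No "with" keyword, yet the trailing token is still captured as group 4.
        System.out.println(Arrays.toString(
            groups("customer calls 'create' using 'POST' on <uri> <object>")));
        // ['create', 'POST', <uri>, <object>]
    }
}
```

If such input should be rejected, the keyword and the token would have to be optional together, as one group.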

Mahendra answered Jun 22 '26 18:06


