
Why does my regex work on RegexPlanet and regex101 but not in my code?

Tags: java, regex

Given the string #100=SAMPLE('Test','Test', I want to extract 100 and Test. I created the regular expression ^#(\d+)=SAMPLE\('([\w-]+)'.* for this purpose.

I tested the regex on RegexPlanet and regex101. Both tools give me the expected results, but when I try to use it in my code I don't get matches. I used the following snippet for testing the regex:

final String line = "#100=SAMPLE('Test','Test',";
final Pattern pattern = Pattern.compile("^#(\\d+)=SAMPLE\\('([\\w-]+)'.*");
final Matcher matcher = pattern.matcher(line);

System.out.println(matcher.matches());
System.out.println(matcher.find());
System.out.println(matcher.group(1));
System.out.println(matcher.group(2));

The output is

true
false
Exception in thread "main" java.lang.IllegalStateException: No match found
    at java.util.regex.Matcher.group(Matcher.java:536)
    at java.util.regex.Matcher.group(Matcher.java:496)
    at Test.main(Test.java:15)

I used Java 8 for compiling and running the program. Why does the regex work with the online tools but not in my program?

asked Sep 07 '15 07:09 by stevecross




2 Answers

A Matcher object allows you to query it several times, so that you can find the expression, get the groups, find the expression again, get the groups, and so on.

This means that it keeps state after each call - both for the groups that resulted from a successful match, and the position where to continue searching.
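To make that state visible, here is a minimal sketch of querying one Matcher repeatedly. The class name FindLoopDemo, the method collectIds and the sample input are made up for illustration, and the pattern is a shortened, unanchored variant of the one in the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindLoopDemo {
    // Hypothetical helper: collects group 1 of every occurrence in the text.
    static List<String> collectIds(String text) {
        // No ^ anchor, so the pattern may match at any position.
        Matcher matcher = Pattern.compile("#(\\d+)=").matcher(text);
        List<String> ids = new ArrayList<>();
        while (matcher.find()) {        // each call resumes after the previous match
            ids.add(matcher.group(1));  // groups always refer to the most recent match
        }
        return ids;                     // the loop ends when find() returns false
    }

    public static void main(String[] args) {
        System.out.println(collectIds("#100=A;#200=B;#300=C")); // prints [100, 200, 300]
    }
}
```

Once find() returns false the groups are no longer available, which is exactly the situation in the question.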

When you run two matching/finding methods consecutively, what you have is:

  1. matches() - tries to match the entire string and, on success, sets the groups.
  2. find() - tries to find the next occurrence of the pattern after the previously matched/found occurrence and, on success, sets the groups.

But of course, in your case, the text doesn't contain two occurrences of the pattern, only one. So although matches() was successful and set proper groups, the find() then fails to find another match, and the groups are invalid (the groups are not accessible after a failed match/find).

And that's why you get the error message.

Now, if you're just playing around with this to see the difference between matches() and find(), there is nothing wrong with having both of them in the program. But you need to call reset() between them. Without it, find() continues from where matches() stopped, which is the end of the string, so a pattern like yours that needs at least one character has nothing left to match. After reset(), find() starts scanning from the start, as if you had a fresh Matcher, and it will succeed and give you groups.
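A minimal sketch of that fix, using the line and pattern from the question. The class name ResetDemo and the method run are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ResetDemo {
    // Hypothetical helper: runs matches(), resets, then runs find() on the same Matcher.
    static String run() {
        String line = "#100=SAMPLE('Test','Test',";
        Matcher matcher = Pattern.compile("^#(\\d+)=SAMPLE\\('([\\w-]+)'.*").matcher(line);

        boolean matched = matcher.matches(); // consumes the whole string
        matcher.reset();                     // discard the match state, start over at index 0
        boolean found = matcher.find();      // now scans from the beginning again

        // Only read the groups if the most recent operation succeeded.
        String groups = found ? matcher.group(1) + "," + matcher.group(2) : "none";
        return matched + " " + found + " " + groups;
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints "true true 100,Test"
    }
}
```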

But as other answers here hinted, if you are not trying to compare the results of matches() and find(), and only want to match your pattern and get the results, then you should choose just one of them.

  • matches() will try to match the entire string. For this reason, if it succeeds, running find() right after it fails for a pattern like yours - because it starts searching at the end of the string, where nothing is left to match. If you use matches(), you don't need anchors like ^ and $ at the beginning and the end of your pattern.
  • find() will try to match anywhere in the string. It will start scanning from the left, but doesn't require that the actual match start there. It is also possible to use it more than once.
  • lookingAt() will try to match at the beginning of the string, but will not necessarily match the complete string. It's like having an ^ anchor at the beginning of your pattern.
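The difference between the three methods can be seen side by side in a small sketch. The class name MatchMethodsDemo, the method compare and the sample inputs are assumptions chosen for illustration. Each call gets a fresh Matcher, so no state leaks between them.

```java
import java.util.regex.Pattern;

class MatchMethodsDemo {
    // Hypothetical helper: reports "matches lookingAt find" for one regex and input.
    static String compare(String regex, String input) {
        Pattern pattern = Pattern.compile(regex);
        boolean whole = pattern.matcher(input).matches();     // entire string must match
        boolean prefix = pattern.matcher(input).lookingAt();  // match must start at index 0
        boolean anywhere = pattern.matcher(input).find();     // match may start anywhere
        return whole + " " + prefix + " " + anywhere;
    }

    public static void main(String[] args) {
        System.out.println(compare("\\d+", "123"));    // prints "true true true"
        System.out.println(compare("\\d+", "123abc")); // prints "false true true"
        System.out.println(compare("\\d+", "abc123")); // prints "false false true"
    }
}
```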

So you choose which one of these is appropriate for you, and use it, and then you can use the groups. Always test that the match succeeded before attempting to use the groups!
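One way to follow that advice is to guard every group access with the result of the match. The sketch below assumes matches() is the method you picked, so the ^ anchor is dropped. The class name GuardedGroupsDemo and the method describe are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GuardedGroupsDemo {
    // matches() already requires the whole string to match, so no ^ anchor is needed.
    private static final Pattern SAMPLE =
            Pattern.compile("#(\\d+)=SAMPLE\\('([\\w-]+)'.*");

    // Hypothetical helper: returns the extracted values, or "no match" instead of throwing.
    static String describe(String line) {
        Matcher matcher = SAMPLE.matcher(line);
        if (!matcher.matches()) {
            return "no match"; // group() would throw IllegalStateException here
        }
        return "id=" + matcher.group(1) + ", name=" + matcher.group(2);
    }

    public static void main(String[] args) {
        System.out.println(describe("#100=SAMPLE('Test','Test',")); // prints "id=100, name=Test"
        System.out.println(describe("not a sample line"));          // prints "no match"
    }
}
```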

answered Oct 23 '22 20:10 by RealSkeptic


As @RealSkeptic mentioned, you should remove the call to matcher.find() from your code: it tried to advance the matcher past the only match, failed, and thereby discarded the groups before you could print them. The rest of your code remains as is:

final String line = "#100=SAMPLE('Test','Test',";
final Pattern pattern = Pattern.compile("^#(\\d+)=SAMPLE\\('([\\w-]+)'.*");
final Matcher matcher = pattern.matcher(line);

System.out.println(matcher.matches());
System.out.println(matcher.group(1));
System.out.println(matcher.group(2));

Output:

true
100
Test

answered Oct 23 '22 21:10 by Tim Biegeleisen