Is there a way to get only the unique matches, without using a list or a map after the matching? I want the matcher output to be unique right away.
Sample input/output:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String input = "This is a question from [userName] about finding unique regex matches for [inputString] without using any lists or maps. -[userName].";
Pattern pattern = Pattern.compile("\\[[^\\[\\]]*\\]");
Matcher matcher = pattern.matcher(input);
while (matcher.find()) {
    String tokenName = matcher.group(0); // the whole bracketed token
    System.out.println(tokenName);
}
This will output the following:
[userName]
[inputString]
[userName]
But I want it to output the following:
[userName]
[inputString]
Yes, there is. You can combine a negative lookahead and a backreference:
"(\\[[^\\[\\]]*\\])(?!.*\\1)"
That will only match if the text matched by your actual pattern does not occur again later in the string. Effectively, you always get the last occurrence of every match, so you would get them in a different order:
[inputString]
[userName]
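A minimal Java sketch of the lookahead approach applied to the question's sample input. The class and method names (LastOccurrenceDemo, uniqueTokens) are illustrative only; the matches are collected into a list purely so they can be printed and checked, while the de-duplication itself happens inside the regex.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LastOccurrenceDemo {

    // Bracketed token captured as group 1; the negative lookahead rejects it
    // if the same text appears again later in the input.
    private static final Pattern UNIQUE_TOKEN =
            Pattern.compile("(\\[[^\\[\\]]*\\])(?!.*\\1)");

    // The list only gathers results for printing/checking;
    // duplicates are already filtered out by the regex.
    static List<String> uniqueTokens(String input) {
        List<String> result = new ArrayList<>();
        Matcher matcher = UNIQUE_TOKEN.matcher(input);
        while (matcher.find()) {
            result.add(matcher.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "This is a question from [userName] about finding unique regex matches"
                + " for [inputString] without using any lists or maps. -[userName].";
        // Prints [inputString], then [userName] (last occurrences, in input order).
        for (String token : uniqueTokens(input)) {
            System.out.println(token);
        }
    }
}
```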
If the order is a problem for you (i.e. if it's crucial to order them by first occurrence), you won't be able to do this using regex only. You would need an unbounded variable-length lookbehind for that, and Java only supports lookbehinds whose maximum length is bounded.
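If first-occurrence order matters and you can relax the "regex only" requirement, one possible alternative (my suggestion, not a regex technique) is Java 9+'s Matcher.results() combined with Stream.distinct(), which keeps the first occurrence of each element of an ordered stream. Note that distinct() remembers the elements it has already seen internally, so this does not strictly meet the question's "no list or map" constraint. The class and method names are hypothetical.

```java
import java.util.List;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class FirstOccurrenceDemo {

    // The plain token pattern from the question, without any lookaround.
    private static final Pattern TOKEN = Pattern.compile("\\[[^\\[\\]]*\\]");

    // Requires Java 9+ for Matcher.results(). The match stream is ordered,
    // so distinct() keeps the first occurrence of each token, in input order.
    static List<String> uniqueTokensInOrder(String input) {
        return TOKEN.matcher(input)
                .results()
                .map(MatchResult::group)
                .distinct()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String input = "This is a question from [userName] about finding unique regex matches"
                + " for [inputString] without using any lists or maps. -[userName].";
        // Prints [userName], then [inputString].
        uniqueTokensInOrder(input).forEach(System.out::println);
    }
}
```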
Some notes on a general solution
Note that this will work with any pattern whose matches are of non-zero width. The general solution is simply:
(yourPatternHere)(?!.*\1)
(I left out the doubled backslashes, because they are only needed in languages where the pattern is written as an ordinary string literal, such as Java.)
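The general form can be wrapped in a small Java helper. This is a sketch: lastOccurrencesOf is a hypothetical name, and it assumes the wrapped pattern never matches an empty string. Because the wrapper group is opened first, it stays group 1 even when your own pattern contains capturing groups.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GeneralLastOccurrence {

    // Hypothetical helper: builds (userPattern)(?!.*\1) and returns the last
    // occurrence of every distinct match, in input order.
    // Assumes userPattern only produces non-empty matches.
    static List<String> lastOccurrencesOf(String userPattern, String text) {
        Pattern wrapped = Pattern.compile("(" + userPattern + ")(?!.*\\1)");
        List<String> result = new ArrayList<>();
        Matcher matcher = wrapped.matcher(text);
        while (matcher.find()) {
            result.add(matcher.group(1)); // group 1 is always the wrapper group
        }
        return result;
    }

    public static void main(String[] args) {
        // "a" appears twice, so only its last occurrence is reported.
        System.out.println(lastOccurrencesOf("\\w+", "a b a c")); // [b, a, c]
    }
}
```

One caveat: \1 matches the captured text anywhere later in the input, including inside a longer match, so with \w+ on "b ab" the standalone "b" is dropped. For word-like patterns, adding word boundaries, as in (?!.*\b\1\b), avoids that.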
If you want it to work with patterns that have zero-width matches (because you only want to know a position and are using lookarounds only for some reason), you could do this:
(zeroWidthPatternHere)(?!.+\1)
Also note that you might generally have to use the "singleline" or "dotall" option if your input may contain line breaks (otherwise the lookahead only checks the rest of the current line). If you cannot or do not want to activate that (because you have a pattern that includes periods which should not match line breaks, or because your engine has no such option, as was the case in JavaScript before ES2018 added the s flag), this is the general solution:
(yourPatternHere)(?![\s\S]*\1)
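A small Java check of the line-break issue, assuming the bracket pattern from the question and a two-line input. In Java, Pattern.DOTALL (or the inline (?s) flag) makes . match line terminators, while [\s\S] matches any character without any flag. The class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MultilineLookaheadDemo {

    // Collects group 1 of every match so the three variants can be compared.
    static List<String> matchesOf(Pattern pattern, String text) {
        List<String> result = new ArrayList<>();
        Matcher matcher = pattern.matcher(text);
        while (matcher.find()) {
            result.add(matcher.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "[a]\n[a]";

        // '.' stops at the line break, so the duplicate on line 2 is not seen.
        Pattern plain = Pattern.compile("(\\[[^\\[\\]]*\\])(?!.*\\1)");
        // DOTALL lets '.' match line terminators as well.
        Pattern dotAll = Pattern.compile("(\\[[^\\[\\]]*\\])(?!.*\\1)", Pattern.DOTALL);
        // [\s\S] matches any character regardless of flags.
        Pattern anyChar = Pattern.compile("(\\[[^\\[\\]]*\\])(?![\\s\\S]*\\1)");

        System.out.println(matchesOf(plain, text));   // [[a], [a]]
        System.out.println(matchesOf(dotAll, text));  // [[a]]
        System.out.println(matchesOf(anyChar, text)); // [[a]]
    }
}
```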
And to make this answer even more widely applicable, here is how you could match only the first occurrence of every match (in an engine with variable-length lookbehinds, like .NET):
(yourPatternHere)(?<!\1.*\1)
or
(yourPatternHere)(?<!\1[\s\S]*\1)