I am trying to improve the performance of some code. It looks something like this:
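public boolean isImportant(String token) {
    for (Pattern pattern : patterns) {
        // Pattern has no instance matches(); matcher(...).find() is the intended call.
        if (pattern.matcher(token).find()) return true;
    }
    return false;
}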
What I noticed is that many of the Patterns seem to be simple string literals with no regular expression constructs. So I want to store these in a separate list (importantList) and do an equality test instead of performing a more expensive pattern match, as follows:
public boolean isImportant(String token) {
    if (importantList.contains(token)) return true;
    for (Pattern pattern : patterns) {
        if (pattern.matcher(token).find()) return true;
    }
    return false;
}
How do I programmatically determine if a particular string contains no regular expression constructs?
Edit: I should add that the answer doesn't need to be performance-sensitive (i.e. regular expressions can be used). I'm mainly concerned with the performance of isImportant() because it's called millions of times, while the initialization of the patterns is only done once.
Use the test() method to check if a regular expression matches an entire string, e.g. /^hello$/.test(str). The caret ^ and dollar sign $ match the beginning and end of the string. The test method returns true if the regex matches the entire string, and false otherwise.
Definitions. In formal language theory, a regular expression (a.k.a. regex, regexp, or r.e.), is a string that represents a regular (type-3) language. Huh?? Okay, in many programming languages, a regular expression is a pattern that matches strings or pieces of strings.
Regular expressions are just strings themselves. Each character in a regular expression can either be part of a code that makes up a pattern to search for, or it can represent a letter, character or word itself.
A regex (regular expression) consists of a sequence of sub-expressions. In this example, [0-9] and + . The [...] , known as character class (or bracket list), encloses a list of characters. It matches any SINGLE character in the list.
I normally hate answers that say this but...
Don't do that.
It probably won't make the code run faster, in fact it might even cause the program to take more time.
If you really need to optimize your code, there are likely much, much more effective places to look.
It's going to be difficult. You can check for the absence of any regex metacharacters; that should be a good approximation:
// Approximate set of regex metacharacters, including the alternation bar.
Pattern regex = Pattern.compile("[$^()\\[\\]{}.*+?|\\\\]");
Matcher regexMatcher = regex.matcher(subjectString);
boolean regexIsLikely = regexMatcher.find();
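As a rough sketch of how that check could feed the split described in the question, the pattern strings can be partitioned once at initialization. The class name `ImportantTokens`, its constructor, and the helper `looksLikeRegex` are hypothetical names made up for illustration, and the metacharacter set is only an approximation of the regex grammar. It also assumes that literal patterns are meant to match whole tokens: `find()` matches anywhere inside the token, while set membership is an exact equality test, so the two are not interchangeable in general.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical helper: splits pattern strings into plain literals and real regexes.
class ImportantTokens {
    // Approximate metacharacter set; any string containing one is treated as a regex.
    private static final Pattern META = Pattern.compile("[$^()\\[\\]{}.*+?|\\\\]");

    private final Set<String> literals = new HashSet<>();
    private final List<Pattern> patterns = new ArrayList<>();

    // Done once at initialization, so the cost of the check does not matter much.
    ImportantTokens(List<String> sources) {
        for (String source : sources) {
            if (looksLikeRegex(source)) {
                patterns.add(Pattern.compile(source));
            } else {
                literals.add(source);
            }
        }
    }

    static boolean looksLikeRegex(String s) {
        return META.matcher(s).find();
    }

    boolean isImportant(String token) {
        // Exact-equality fast path for literal patterns (assumption: whole-token match).
        if (literals.contains(token)) return true;
        for (Pattern p : patterns) {
            if (p.matcher(token).find()) return true;
        }
        return false;
    }
}
```

A HashSet is used here instead of the list from the question because `contains` on a hash set does not scan every element; whether that actually beats the regex loop is something to measure rather than assume.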
Whether it's worth it is another question. Are you sure a regex match is slower than a list lookup (especially since you'll be doing a regex match after that in many cases anyway)? I'd bet it's much faster to just keep the regex match.
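One way to settle that question is to measure it. The following is a crude timing sketch, not a rigorous benchmark: the class name `LookupTiming`, the token data, and the single literal "important" are all made up for illustration, and a single `System.nanoTime()` pass is easily skewed by JIT warm-up, so a harness such as JMH would give more trustworthy numbers.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical micro-measurement comparing a literal regex against a set lookup.
class LookupTiming {
    // Counts tokens in which the precompiled pattern is found anywhere.
    static int countWithRegex(Pattern p, String[] tokens) {
        int hits = 0;
        for (String token : tokens) {
            if (p.matcher(token).find()) hits++;
        }
        return hits;
    }

    // Counts tokens that are exactly equal to a member of the set.
    static int countWithSet(Set<String> set, String[] tokens) {
        int hits = 0;
        for (String token : tokens) {
            if (set.contains(token)) hits++;
        }
        return hits;
    }

    public static void main(String[] args) {
        // Made-up data: every tenth token is the literal, the rest never contain it.
        String[] tokens = new String[1_000_000];
        for (int i = 0; i < tokens.length; i++) {
            tokens[i] = (i % 10 == 0) ? "important" : "token" + i;
        }
        Pattern pattern = Pattern.compile("important");
        Set<String> set = new HashSet<>();
        set.add("important");

        long t0 = System.nanoTime();
        int regexHits = countWithRegex(pattern, tokens);
        long t1 = System.nanoTime();
        int setHits = countWithSet(set, tokens);
        long t2 = System.nanoTime();

        System.out.printf("regex: %d hits in %d ms%n", regexHits, (t1 - t0) / 1_000_000);
        System.out.printf("set:   %d hits in %d ms%n", setHits, (t2 - t1) / 1_000_000);
    }
}
```

The numbers will depend on the JVM, the number of patterns, and how many of them are literals, so the result for the real pattern list may differ from this toy case.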