How do I determine if a string is not a regular expression?

Tags: java, regex

I am trying to improve the performance of some code. It looks something like this:

public boolean isImportant(String token) {
    for (Pattern pattern : patterns) {
        // Pattern has no instance matches(String); a Matcher comes from matcher()
        if (pattern.matcher(token).find()) return true;
    }
    return false;
}

I noticed that many of the patterns seem to be simple string literals with no regular expression constructs. I want to store these in a separate collection (importantList) and do an equality test instead of a more expensive pattern match, as follows:

public boolean isImportant(String token) {
    if (importantList.contains(token)) return true;

    for (Pattern pattern : patterns) {
        if (pattern.matcher(token).find()) return true;
    }
    return false;
}

How do I programmatically determine if a particular string contains no regular expression constructs?

Edit: I should add that the answer itself doesn't need to be fast (i.e. it can use regular expressions). I'm mainly concerned with the performance of isImportant(), because it's called millions of times, while the patterns are initialized only once.

Jin Kim asked Mar 05 '13 22:03


2 Answers

I normally hate answers that say this but...

Don't do that.

It probably won't make the code run faster; in fact, it might even make the program slower.

If you really need to optimize your code, there are likely much more effective places to look.
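
One way to check before committing to the change is to time both lookups on representative data. The following is only a rough sketch: the class name `LookupTiming`, the sample literals, and the token list are all made up for illustration, and it assumes whole-token matching. Hand-rolled timing loops like this are easily distorted by JIT compilation, so a harness such as JMH would give more trustworthy numbers.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Pattern;

class LookupTiming {
    // Hypothetical sample data; substitute real tokens and patterns.
    static final Set<String> LITERALS =
            new HashSet<>(Arrays.asList("alpha", "beta", "gamma"));
    static final Pattern[] PATTERNS = {
        Pattern.compile("alpha"), Pattern.compile("beta"), Pattern.compile("gamma")
    };

    // Whole-token regex match against each pattern in turn.
    static boolean viaRegex(String token) {
        for (Pattern p : PATTERNS) {
            if (p.matcher(token).matches()) return true;
        }
        return false;
    }

    // Hash lookup for the same literals.
    static boolean viaSet(String token) {
        return LITERALS.contains(token);
    }

    // Returns the hit count so the calls cannot be discarded as dead code.
    static int countHits(String[] tokens, int rounds, boolean useSet) {
        int hits = 0;
        for (int r = 0; r < rounds; r++) {
            for (String t : tokens) {
                if (useSet ? viaSet(t) : viaRegex(t)) hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String[] tokens = {"alpha", "delta", "gamma", "omega"};
        int rounds = 1_000_000;
        countHits(tokens, rounds, true);   // warm-up pass
        countHits(tokens, rounds, false);  // warm-up pass

        long t0 = System.nanoTime();
        int setHits = countHits(tokens, rounds, true);
        long t1 = System.nanoTime();
        int regexHits = countHits(tokens, rounds, false);
        long t2 = System.nanoTime();

        System.out.printf("set:   %d hits in %d ms%n", setHits, (t1 - t0) / 1_000_000);
        System.out.printf("regex: %d hits in %d ms%n", regexHits, (t2 - t1) / 1_000_000);
    }
}
```

The absolute numbers matter less than whether the difference is visible at all next to the rest of the program's work.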

Sam I am says Reinstate Monica answered Sep 30 '22 12:09


It's going to be difficult. You can check for the absence of any regex metacharacters; that should be a good approximation:

// Matches any single regex metacharacter: $ ^ ( ) [ ] { } . * + ? \ |
Pattern regex = Pattern.compile("[$^()\\[\\]{}.*+?\\\\|]");
Matcher regexMatcher = regex.matcher(subjectString);
boolean regexIsLikely = regexMatcher.find();

Whether it's worth it is another question. Are you sure a regex match is slower than a list lookup (especially since you'll be doing a regex match after that in many cases anyway)? I'd bet it's much faster to just keep the regex match.
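
If you do decide to try it, the split could look roughly like the sketch below. It reuses the metacharacter check from this answer (with `|` included in the character class). The class `LiteralAwareMatcher` and its method names are hypothetical, and it assumes a token counts as important only when a pattern matches the whole token. That assumption matters: with `find()`, a literal pattern also matches tokens that merely contain it, so an equality test would not give the same results.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

class LiteralAwareMatcher {
    // Any one of these characters may introduce a regex construct.
    private static final Pattern METACHAR =
            Pattern.compile("[$^()\\[\\]{}.*+?\\\\|]");

    private final Set<String> literals = new HashSet<>();
    private final List<Pattern> patterns = new ArrayList<>();

    // Done once at start-up: sort each pattern source into one of two buckets.
    LiteralAwareMatcher(List<String> sources) {
        for (String source : sources) {
            if (isPlainLiteral(source)) {
                literals.add(source);
            } else {
                patterns.add(Pattern.compile(source));
            }
        }
    }

    // True when the string contains no regex metacharacters.
    static boolean isPlainLiteral(String source) {
        return !METACHAR.matcher(source).find();
    }

    // Assumption: a pattern must match the entire token, not a substring.
    boolean isImportant(String token) {
        if (literals.contains(token)) {
            return true;
        }
        for (Pattern pattern : patterns) {
            if (pattern.matcher(token).matches()) {
                return true;
            }
        }
        return false;
    }
}
```

A HashSet is used rather than a list so that the literal lookup does not scan every entry. If only compiled Pattern objects are available, `pattern.pattern()` returns the source string; a pattern compiled with flags such as `Pattern.CASE_INSENSITIVE` should stay in the regex bucket even when its source looks like a literal.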

Tim Pietzcker answered Sep 30 '22 12:09