Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Java regular expression performance issue

Tags:

java

regex

I am trying to make a function graphing program in Java, and it involves taking a user's input for the function which will be graphed, parsing it, and graphing it. For example, the user might enter in x^2 - y^2, cos(x + y), log(x) - sqrt(y), etc. The program makes use of both infix binary operations (+, -, etc.) and unary operations (cos, sqrt, etc.).

In short, in order to evaluate the unary operations, I must make sure that a given expression follows the format of a single unary operation. For example, cos(x), sqrt(x + y), and log(exp(y) - x) all fit this format, since they are unary operations with some expression as their operand; however, strings such as sin(x)*cos(y) and 1 + log(x) do not follow this format. In order to check, I made a regular expression for this format:

String unaryName = "((productlog)|(zeta)|(log)|(sqrt)|(cos)|(sin)|(tan)|(sec)|(csc)|(csc)|(abs)|(arccos)|(arcsin)|(arctan)|(arcsec)|(arccsc)|(arccot)|(gamma)|(exp))";

(this is just a regex to check if a given string is a name for a predefined unary operation)

String unaryOperation = unaryName + "\\(([^\\(\\)]*(\\(.*\\))*[^\\(\\)]*)+\\)"

I'll give an explanation. This regex is looking for the name of one of the unary operations. After that, it looks for a left-parenthesis. After that, it looks for some sequence of characters which are not parentheses, and then some sequence which begins with a left parenthesis and ends with a right parenthesis. The latter prevents a String such as "sin(x) + cos(y)" from matching.

This regular expression always gives desired results, as far as I can tell. However, in its use, one problem arises. Consider this situation:

String s = "cos(3) + sin(4)";
System.out.println(s.matches(unaryOperation));

Obviously, if the regex works, this should return false, which it does. The same would be true with this example:

String s = "cos(3.000) + sin(4)";
System.out.println(s.matches(unaryOperation));

Nothing really changed, pattern-wise. However, successively adding zeroes to the 3, the match seems to take exponentially longer to evaluate. For me, 12 zeroes takes about 13 seconds. Since my program will be plotting many points on a graph, it will have to calculate thousands of expressions every time it graphs something, so this is a fatal flaw.

I've already found a way around having to use this regular expression and my program works quite nicely, but I would still like to know: why does this regular expression take so long to work for large inputs, and is there any way to change the regex to fix this issue?

like image 625
MikeB Avatar asked Jan 03 '13 03:01

MikeB


1 Answers

You can use this regex

unaryName+"\\([^)]*(\\([^()]*\\))?[^(]*\\)"
                    ------------
                         |->starting from center.

Here I am checking whether the round brackets are properly balanced ..That should solve your problem!

like image 101
Anirudha Avatar answered Oct 04 '22 20:10

Anirudha