 

Is this normal Java regex behavior?

Tags:

java

regex


I found this in some code I wanted to optimize. Here is the snippet:

tempString = bigBuffer.replaceAll("\\n", "");
tempString = tempString.replaceAll("\\t", "");

Then I decided to use the regex wisely and I did this:

tempString = bigBuffer.replaceAll("[\\n\\t]", "");

Then a friend told me to do this instead:

tempString = bigBuffer.replaceAll("\\n|\\t", "");

Since I like to know the result of my changes, I ran a test to verify that it was a good optimization. Here are the results (java version "1.6.0_27"), with the first code as the 100% reference.

With the pipe it is 121% so it took more time to perform the task.

With the square bracket it is 52% so it took less time to perform the task.

Why do these regexes perform differently when they should be equivalent?

Martin

Martin P. asked Nov 25 '11 18:11




1 Answer

The first code snippet makes two passes: the first goes through bigBuffer and removes the newlines, and the second goes through the result and removes the tabs.

The second code snippet searches through bigBuffer only once, checking whether each character is one or the other. That would explain why it finishes in roughly half the time.
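To make the comparison concrete, here is a minimal sketch of the three variants from the question as standalone methods, with a quick check that they produce the same output. The class and method names are made up for illustration and are not from the original code.

```java
public class StripWhitespace {
    // Variant 1: two passes, one replaceAll call per character.
    static String twoPasses(String s) {
        String t = s.replaceAll("\\n", "");
        return t.replaceAll("\\t", "");
    }

    // Variant 2: one pass, testing each character against a character class.
    static String charClass(String s) {
        return s.replaceAll("[\\n\\t]", "");
    }

    // Variant 3: one pass, using alternation between the two characters.
    static String alternation(String s) {
        return s.replaceAll("\\n|\\t", "");
    }

    public static void main(String[] args) {
        String input = "a\tb\nc\n\td";
        System.out.println(twoPasses(input));   // abcd
        System.out.println(charClass(input));   // abcd
        System.out.println(alternation(input)); // abcd
    }
}
```

All three return the same string, so any difference between them is purely a matter of how much work the regex engine does to get there.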

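The third code snippet, the one with the pipe, is probably compiled poorly: with alternation the engine likely tries one alternative and then the other at each position instead of doing a single character test. That would make it a particularly bad version of the first code's algorithm, though I could not say for sure without examining the path through the regex compilation carefully.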
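One way to look at the matching cost separately from the compilation cost is to compile each pattern once and reuse it. String.replaceAll(regex, repl) is documented as equivalent to Pattern.compile(regex).matcher(str).replaceAll(repl), so it compiles the pattern on every call. The sketch below is only an illustration under that assumption; the class, field, and method names are hypothetical.

```java
import java.util.regex.Pattern;

public class PrecompiledStrip {
    // Compiled once and reused; Pattern instances are immutable.
    static final Pattern CHAR_CLASS = Pattern.compile("[\\n\\t]");
    static final Pattern ALTERNATION = Pattern.compile("\\n|\\t");

    // Removes every match of the given pattern from the input.
    static String strip(Pattern p, String s) {
        return p.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        String input = "a\tb\nc";
        System.out.println(strip(CHAR_CLASS, input));  // abc
        System.out.println(strip(ALTERNATION, input)); // abc
    }
}
```

Timing strip with each precompiled pattern would show whether the gap comes from matching rather than from compiling the expression.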

Excellent work on the testing though. Relative timing (percent-based) is useful, absolute timing (millisecond or some such) is not.
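A relative-timing test of the kind described in the question could be sketched roughly as follows. This is not the original poster's test, just a rough harness built on System.nanoTime; the buffer contents, round counts, and all names are assumptions, and the percentages will vary with the JVM version and the input.

```java
public class RelativeTiming {
    // Builds a test buffer with tabs and newlines scattered through it.
    static String buildBuffer(int lines) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < lines; i++) {
            sb.append("field").append(i).append('\t').append("value").append('\n');
        }
        return sb.toString();
    }

    // Runs one of the three variants from the question.
    static String run(int variant, String buffer) {
        switch (variant) {
            case 0:  return buffer.replaceAll("\\n", "").replaceAll("\\t", "");
            case 1:  return buffer.replaceAll("[\\n\\t]", "");
            default: return buffer.replaceAll("\\n|\\t", "");
        }
    }

    // Runs a variant `rounds` times and returns the elapsed nanoseconds.
    static long time(int variant, String buffer, int rounds) {
        long start = System.nanoTime();
        for (int i = 0; i < rounds; i++) {
            // Use the result so the call cannot be treated as dead code.
            if (run(variant, buffer).isEmpty()) {
                throw new IllegalStateException("unexpected empty result");
            }
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        String buffer = buildBuffer(10000);
        String[] names = {"two passes", "character class", "alternation"};
        // Warm-up rounds so the JIT has a chance to compile the hot paths.
        for (int v = 0; v < names.length; v++) {
            time(v, buffer, 20);
        }
        long baseline = time(0, buffer, 100);
        for (int v = 0; v < names.length; v++) {
            long t = (v == 0) ? baseline : time(v, buffer, 100);
            // Report each variant as a percentage of the two-pass baseline.
            System.out.printf("%-16s %d%%%n", names[v], Math.round(100.0 * t / baseline));
        }
    }
}
```

Reporting each variant as a percentage of the baseline keeps the comparison meaningful across machines, which is the point made above about relative versus absolute timing.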

nrobey answered Oct 17 '22 19:10