
Performance comparison between substring and regex when the start index is known

Tags: java, string, regex

I need to extract the first integer found in a java.lang.String and am unsure whether to use a substring approach or a regex approach:

// Want to extract the 510 into an int.
String extract = "PowerFactor510";

// Either:
int num = Integer.valueOf(extract.substring(???));

// Or a regex solution, something like:
String regex = "\\d+";
Matcher matcher = new Matcher(regex);
int num = matcher.find(extract);

So I ask:

  • Which type of solution is more appropriate here, and why?
  • If the substring approach is more appropriate, what could I use to indicate the beginning of the number?
  • Otherwise, if regex is the appropriate solution, what regex/pattern/matcher/method should I use to extract the number?

Note: The string will always begin with the word PowerFactor followed by a non-negative integer. Thanks in advance!

IAmYourFaja asked Mar 25 '13 13:03





2 Answers

The string will always begin with the word "PowerFactor" followed by a non-negative integer

This means you know exactly at which index the number starts. I would say you are better off using substring directly; at least in terms of performance, it should be much faster than searching and matching with a regex.

// Skip the known prefix and parse the remainder as an int.
int num = Integer.parseInt(extract.substring("PowerFactor".length()));
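
For comparison, here is a minimal sketch of both approaches side by side, assuming the input always has the form described in the question. The class name ExtractNumber, the method names, and the constants PREFIX and DIGITS are made up for illustration. The regex variant uses the standard Pattern.compile / Matcher.find / Matcher.group calls rather than constructing a Matcher directly, which java.util.regex does not allow.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ExtractNumber {
    // Hypothetical constant for the fixed prefix named in the question.
    private static final String PREFIX = "PowerFactor";
    // Compiled once and reused, since compiling a pattern is relatively costly.
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Substring approach: skip the known prefix and parse the rest.
    static int bySubstring(String s) {
        return Integer.parseInt(s.substring(PREFIX.length()));
    }

    // Regex approach: find the first run of digits anywhere in the string.
    static int byRegex(String s) {
        Matcher m = DIGITS.matcher(s);
        if (!m.find()) {
            throw new IllegalArgumentException("no digits in: " + s);
        }
        return Integer.parseInt(m.group());
    }

    public static void main(String[] args) {
        System.out.println(bySubstring("PowerFactor510")); // prints 510
        System.out.println(byRegex("PowerFactor510"));     // prints 510
    }
}
```

The substring version relies on the prefix never changing. The regex version tolerates a different prefix, at the cost of a scan and a Matcher allocation.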

I could not find any direct comparison, but you can read about each of the two options:

  • Java substring performance
  • Java Regex performance
CloudyMarble answered Oct 02 '22 23:10



I was a bit curious and tried the following:

String extract = "PowerFactor510";

// Test 1: strip every non-digit character with a regex.
long l = System.currentTimeMillis();
System.out.println(extract.replaceAll("\\D", ""));
System.out.println(System.currentTimeMillis() - l);

System.out.println();

// Test 2: skip the known prefix with substring.
l = System.currentTimeMillis();
System.out.println(extract.substring("PowerFactor".length()));
System.out.println(System.currentTimeMillis() - l);

And it turned out that the second test was much faster, so substring wins.
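
A single timed call like the one above also measures println and one-time costs such as class loading and regex compilation, so the numbers are only a rough hint. Below is a slightly more careful sketch, assuming a simple warm-up pass and many iterations are good enough for a quick check. The class ExtractTiming and its method names are hypothetical, and the pattern is precompiled here, unlike the replaceAll call in the test above. A harness such as JMH would be the more reliable way to measure this.

```java
import java.util.regex.Pattern;

class ExtractTiming {
    private static final String PREFIX = "PowerFactor";
    // Precompiled so that pattern compilation is not part of each call.
    private static final Pattern NON_DIGITS = Pattern.compile("\\D");

    // Regex variant: remove every non-digit character.
    static String viaRegex(String s) {
        return NON_DIGITS.matcher(s).replaceAll("");
    }

    // Substring variant: drop the known prefix.
    static String viaSubstring(String s) {
        return s.substring(PREFIX.length());
    }

    // Runs one variant `iterations` times and returns the elapsed nanoseconds.
    static long time(boolean useRegex, String input, int iterations) {
        long sink = 0; // accumulates result lengths so the results are used
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            String r = useRegex ? viaRegex(input) : viaSubstring(input);
            sink += r.length();
        }
        long elapsed = System.nanoTime() - start;
        if (sink < 0) {
            System.out.println(sink); // never reached; keeps `sink` live
        }
        return elapsed;
    }

    public static void main(String[] args) {
        String extract = "PowerFactor510";
        int n = 1_000_000;
        // Warm-up pass so the JIT has a chance to compile both paths.
        time(true, extract, n);
        time(false, extract, n);
        System.out.println("regex ns:     " + time(true, extract, n));
        System.out.println("substring ns: " + time(false, extract, n));
    }
}
```

The absolute numbers will vary by JVM and machine, so only the relative difference between the two lines is worth reading.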

tmwanik answered Oct 02 '22 22:10
