I need to extract the first integer found in a java.lang.String
and am unsure whether to use a substring
approach or a regex approach:
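// Want to extract the 510 into an int.
String extract = "PowerFactor510";
// Either:
int num = Integer.valueOf(extract.substring(???));
// Or a regex solution, something like
// (uses java.util.regex.Pattern and java.util.regex.Matcher):
String regex = "\\d+";
Matcher matcher = Pattern.compile(regex).matcher(extract);
// find() only reports whether a match exists; group() returns the matched digits.
int num = matcher.find() ? Integer.parseInt(matcher.group()) : -1;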
So I ask: which of the two approaches is more efficient?
Note: The string will always begin with the word PowerFactor
followed by a non-negative integer. Thanks in advance!
I will explain the test below, but here is the bottom line: the REGEXP solution takes 40 times longer than the INSTR/SUBSTR solution. Setup: I created a table with 1.5 million random strings (all exactly eight characters long, all upper-case letters).
At best, a regular expression can only do what the optimal hand-written string manipulation would do. Regular expressions are not used because they are faster than plain string operations; they are used because they can express very complicated operations in little code, with reasonably small overhead.
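As a sketch of what "optimal string manipulation" looks like for the question's task (the first integer in a string), here is a hand-written scan next to the equivalent \d+ regex. The class and method names are hypothetical, and the scan deliberately treats only ASCII '0' to '9' as digits, which is what Java's \d means by default. This is an illustration, not a description of how java.util.regex works internally.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FirstIntegerScan {
    // Compiled once and reused; \d+ matches a run of one or more ASCII digits by default.
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Regex version: find the first run of digits and parse it.
    static int firstIntRegex(String s) {
        Matcher m = DIGITS.matcher(s);
        if (!m.find()) {
            throw new IllegalArgumentException("no digits in: " + s);
        }
        return Integer.parseInt(m.group());
    }

    // Hand-written version: skip to the first digit, then consume consecutive digits.
    static int firstIntScan(String s) {
        int i = 0;
        while (i < s.length() && !isAsciiDigit(s.charAt(i))) {
            i++;
        }
        if (i == s.length()) {
            throw new IllegalArgumentException("no digits in: " + s);
        }
        int start = i;
        while (i < s.length() && isAsciiDigit(s.charAt(i))) {
            i++;
        }
        return Integer.parseInt(s.substring(start, i));
    }

    private static boolean isAsciiDigit(char c) {
        return c >= '0' && c <= '9';
    }

    public static void main(String[] args) {
        String extract = "PowerFactor510";
        System.out.println(firstIntRegex(extract)); // 510
        System.out.println(firstIntScan(extract));  // 510
    }
}
```

Both methods do the same work: walk to the first digit and read digits until a non-digit. The regex version is shorter to write; the scan avoids the matcher machinery.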
A benchmark may suggest that a regex performs well, but testing it on a single matching string is not enough. You also need to vary where the matching part sits inside the test string, and check performance on strings that do not match, especially ones that almost match, since those tend to cause the most backtracking.
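A rough sketch of such an input set, assuming a hypothetical pattern PowerFactor\d+ searched with find(): the match at the start, the match at the end, no match at all, and a near miss that shares the prefix but fails on the digits. For a pattern this simple there is little backtracking to provoke; the point is the shape of the test inputs. The timing uses System.nanoTime with a short warm-up and is only indicative. String.repeat requires Java 11 or later.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

class RegexInputVariety {
    // Hypothetical pattern under test: the prefix followed by one or more digits, anywhere in the input.
    static final Pattern P = Pattern.compile("PowerFactor\\d+");

    // Inputs that move the match around and include non-matching cases.
    static Map<String, String> inputs() {
        String filler = "x".repeat(1000);
        Map<String, String> m = new LinkedHashMap<>();
        m.put("match at start", "PowerFactor510" + filler);
        m.put("match at end", filler + "PowerFactor510");
        m.put("no match", filler + filler);
        m.put("near miss", filler + "PowerFactorX510");
        return m;
    }

    public static void main(String[] args) {
        int iterations = 100_000;
        for (Map.Entry<String, String> e : inputs().entrySet()) {
            String input = e.getValue();
            // Warm-up so the JIT has a chance to compile the hot path before timing.
            for (int i = 0; i < iterations; i++) {
                P.matcher(input).find();
            }
            int hits = 0;
            long start = System.nanoTime();
            for (int i = 0; i < iterations; i++) {
                if (P.matcher(input).find()) {
                    hits++;
                }
            }
            long perCall = (System.nanoTime() - start) / iterations;
            System.out.printf("%-15s matched=%b  ~%d ns/find%n", e.getKey(), hits > 0, perCall);
        }
    }
}
```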
Although string manipulation will usually be somewhat faster, the actual performance depends heavily on a number of factors. As the regex gets more complicated, it takes much more effort to write equivalent string manipulation code that performs well.
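To illustrate how the hand-written side grows, here is a sketch with a slightly richer, hypothetical format: "PowerFactor", one or more digits, and an optional fractional part, validated against the whole string. The regex stays one line; the equivalent scan needs explicit index bookkeeping for every element of the pattern. Names are illustrative only.

```java
import java.util.regex.Pattern;

class ComplexityGrowth {
    // Regex: prefix, digits, optional ".digits"; matches() requires the whole string to match.
    static final Pattern P = Pattern.compile("PowerFactor\\d+(\\.\\d+)?");

    static boolean viaRegex(String s) {
        return P.matcher(s).matches();
    }

    // Hand-written equivalent of the same pattern.
    static boolean viaScan(String s) {
        String prefix = "PowerFactor";
        if (!s.startsWith(prefix)) return false;
        int i = prefix.length();
        int digitsStart = i;
        while (i < s.length() && isAsciiDigit(s.charAt(i))) i++;
        if (i == digitsStart) return false;       // \d+ needs at least one digit
        if (i == s.length()) return true;         // no fractional part
        if (s.charAt(i) != '.') return false;     // only '.' may follow the integer digits
        i++;
        int fracStart = i;
        while (i < s.length() && isAsciiDigit(s.charAt(i))) i++;
        return i > fracStart && i == s.length();  // fraction needs digits and must end the string
    }

    private static boolean isAsciiDigit(char c) {
        return c >= '0' && c <= '9';
    }

    public static void main(String[] args) {
        for (String s : new String[] {"PowerFactor510", "PowerFactor0.95", "PowerFactor5.", "PowerFactor"}) {
            System.out.println(s + " -> " + viaRegex(s) + " / " + viaScan(s));
        }
    }
}
```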
The string will always begin with the word "PowerFactor" followed by a non-negative integer
This means you know exactly at which index the number starts, so I would say you are better off using substring directly. At least in terms of performance, it should be much faster than searching and matching with a regex.
int num = Integer.parseInt(extract.substring("PowerFactor".length()));
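A self-contained sketch of the substring approach, assuming the input format stated in the question. The class name and the prefix check are additions for illustration; the check only guards against inputs that break the stated guarantee.

```java
class PowerFactorParser {
    private static final String PREFIX = "PowerFactor";

    // Substring approach: the prefix is known, so the digits start at PREFIX.length().
    static int parse(String extract) {
        if (!extract.startsWith(PREFIX)) {
            throw new IllegalArgumentException("expected prefix " + PREFIX + ": " + extract);
        }
        // Integer.parseInt throws NumberFormatException for an empty, non-numeric, or
        // out-of-range suffix. It also accepts a leading '+' or '-' sign.
        return Integer.parseInt(extract.substring(PREFIX.length()));
    }

    public static void main(String[] args) {
        System.out.println(parse("PowerFactor510")); // 510
    }
}
```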
I could not find any direct comparison, but you can read about each of the two options in the documentation.
I was a bit curious and tried the following:
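public class SubstringVsRegex {
    public static void main(String[] args) {
        String extract = "PowerFactor510";
        // Regex approach: remove every non-digit character.
        long l = System.currentTimeMillis();
        System.out.println(extract.replaceAll("\\D", ""));
        System.out.println(System.currentTimeMillis() - l);
        System.out.println();
        // Substring approach: skip the known "PowerFactor" prefix.
        l = System.currentTimeMillis();
        System.out.println(extract.substring("PowerFactor".length()));
        System.out.println(System.currentTimeMillis() - l);
    }
}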
And it turned out that the second test was much faster, so substring
wins.
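A single timed call like the one above mostly measures cold-start effects: String.replaceAll compiles its pattern on every call, the first call pays class-loading and interpretation costs, and currentTimeMillis has coarse resolution. A somewhat fairer sketch, still far less rigorous than a harness such as JMH, precompiles the pattern, warms both paths up, and averages over many iterations with System.nanoTime. Class and method names are hypothetical; no particular numbers are claimed.

```java
import java.util.function.ToIntFunction;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SubstringVsRegexBench {
    private static final String PREFIX = "PowerFactor";
    private static final Pattern DIGITS = Pattern.compile("\\d+"); // compiled once, not per call

    static int viaSubstring(String s) {
        return Integer.parseInt(s.substring(PREFIX.length()));
    }

    static int viaRegex(String s) {
        Matcher m = DIGITS.matcher(s);
        if (!m.find()) throw new IllegalArgumentException("no digits in: " + s);
        return Integer.parseInt(m.group());
    }

    // Publishing results to a volatile field keeps them observable,
    // which discourages the JIT from eliminating the timed loop.
    static volatile long sink;

    // Returns total elapsed nanoseconds for the given number of calls.
    static long time(ToIntFunction<String> parser, String input, int iterations) {
        long sum = 0;
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sum += parser.applyAsInt(input);
        }
        long elapsed = System.nanoTime() - start;
        sink = sum;
        return elapsed;
    }

    public static void main(String[] args) {
        String extract = "PowerFactor510";
        int iterations = 1_000_000;
        // Warm-up rounds give the JIT a chance to compile both paths before measuring.
        for (int round = 0; round < 5; round++) {
            time(SubstringVsRegexBench::viaSubstring, extract, iterations);
            time(SubstringVsRegexBench::viaRegex, extract, iterations);
        }
        long sub = time(SubstringVsRegexBench::viaSubstring, extract, iterations);
        long rx = time(SubstringVsRegexBench::viaRegex, extract, iterations);
        System.out.printf("substring: %.1f ns/op%n", (double) sub / iterations);
        System.out.printf("regex:     %.1f ns/op%n", (double) rx / iterations);
    }
}
```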