There are two methods:
// requires: import java.util.regex.Pattern;
private static void normalSplit(String base) {
    base.split("\\.");
}

private static final Pattern p = Pattern.compile("\\.");

private static void patternSplit(String base) {
    // use the static field above
    p.split(base);
}
And I test them like this in the main method:
public static void main(String[] args) throws Exception {
    long start = System.currentTimeMillis();
    String longstr = "a.b.c.d.e.f.g.h.i.j"; // use any long string you like
    for (int i = 0; i < 300000; i++) {
        normalSplit(longstr); // switch to patternSplit to see the difference
    }
    System.out.println((System.currentTimeMillis() - start) / 1000.0);
}
Intuitively, I thought that since String.split will eventually call Pattern.compile(regex).split (after a lot of extra work) to do the real work, I could construct the Pattern object in advance (it is thread-safe) and speed up the splitting.
But in fact, using the pre-constructed Pattern is much slower than calling String.split directly. I tried both on a 50-character string (using MyEclipse), and the direct call took only half the time of the pre-constructed Pattern object.
Can someone please explain why this happens?
A compiled regex executes quickly, but compiling it and setting it up costs time when the instance is created. If you build the regex object once at the beginning and reuse it, splitting with it will be faster than recompiling the pattern on every call.
String.split(String) won't create a regex at all if your pattern is a single literal character (including an escaped one such as "\\."). When splitting by a single character, it uses specialized code that is quite efficient.
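For a pattern that is more than one literal character, reusing a compiled Pattern avoids recompiling on every call. The following is a minimal sketch of that idea, not a benchmark; the class name PrecompiledSplit, the field COMMA, and the sample regex are hypothetical and chosen only for illustration.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class PrecompiledSplit {
    // Hypothetical example regex: a comma with optional surrounding whitespace.
    // It is longer than one literal character, so String.split has to go
    // through Pattern.compile for it on each call.
    private static final Pattern COMMA = Pattern.compile("\\s*,\\s*");

    // Reuses the Pattern compiled once in the static field.
    static String[] splitPrecompiled(String s) {
        return COMMA.split(s);
    }

    // Passes the regex string to String.split on every call.
    static String[] splitDirect(String s) {
        return s.split("\\s*,\\s*");
    }

    public static void main(String[] args) {
        String input = "a , b,c ,  d";
        System.out.println(Arrays.toString(splitPrecompiled(input))); // [a, b, c, d]
        System.out.println(Arrays.toString(splitDirect(input)));      // [a, b, c, d]
    }
}
```

Both methods return the same tokens; the difference is only in where the compilation cost is paid.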
This may depend on the actual implementation of Java. I'm using OpenJDK 7, and here String.split does indeed invoke Pattern.compile(regex).split(this, limit), but only if the string to split by, regex, is more than a single character.
See here for the source code, line 2312.
public String[] split(String regex, int limit) {
    /* fastpath if the regex is a
       (1)one-char String and this character is not one of the
          RegEx's meta characters ".$|()[{^?*+\\", or
       (2)two-char String and the first char is the backslash and
          the second is not the ascii digit or ascii letter.
     */
    char ch = 0;
    if (((regex.count == 1 &&
        // a bunch of other checks and lots of low-level code
        return list.subList(0, resultSize).toArray(result);
    }
    return Pattern.compile(regex).split(this, limit);
}
As you are splitting by "\\.", it is using the "fast path". That is, if you are using OpenJDK.
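To check whether a particular regex would qualify, the rules in the source comment can be restated as a small predicate. This is only a sketch that assumes the OpenJDK 7 rules quoted in this thread; isFastPath and FastPathCheck are hypothetical names, not part of the JDK, and other Java versions may apply different rules.

```java
class FastPathCheck {
    // Hypothetical helper restating the fastpath rules from the source comment.
    static boolean isFastPath(String regex) {
        char ch;
        if (regex.length() == 1) {
            // Rule (1): one char that is not a regex metacharacter.
            ch = regex.charAt(0);
            if (".$|()[{^?*+\\".indexOf(ch) != -1) {
                return false;
            }
        } else if (regex.length() == 2 && regex.charAt(0) == '\\') {
            // Rule (2): backslash followed by a char that is neither
            // an ASCII digit nor an ASCII letter.
            ch = regex.charAt(1);
            boolean asciiDigit = ch >= '0' && ch <= '9';
            boolean asciiLetter = (ch >= 'a' && ch <= 'z') || (ch >= 'A' && ch <= 'Z');
            if (asciiDigit || asciiLetter) {
                return false;
            }
        } else {
            return false;
        }
        // The delimiter char must not be a surrogate.
        return ch < Character.MIN_HIGH_SURROGATE || ch > Character.MAX_LOW_SURROGATE;
    }

    public static void main(String[] args) {
        System.out.println(isFastPath("\\.")); // true: escaped dot
        System.out.println(isFastPath("."));   // false: bare metacharacter
        System.out.println(isFastPath(","));   // true: plain single char
        System.out.println(isFastPath("\\d")); // false: backslash plus letter
        System.out.println(isFastPath("ab"));  // false: two plain chars
    }
}
```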
This is a change in String.split behaviour that was made in Java 7. This is what we have in 7u40:
public String[] split(String regex, int limit) {
    /* fastpath if the regex is a
       (1)one-char String and this character is not one of the
          RegEx's meta characters ".$|()[{^?*+\\", or
       (2)two-char String and the first char is the backslash and
          the second is not the ascii digit or ascii letter.
     */
    char ch = 0;
    if (((regex.value.length == 1 &&
         ".$|()[{^?*+\\".indexOf(ch = regex.charAt(0)) == -1) ||
         (regex.length() == 2 &&
          regex.charAt(0) == '\\' &&
          (((ch = regex.charAt(1))-'0')|('9'-ch)) < 0 &&
          ((ch-'a')|('z'-ch)) < 0 &&
          ((ch-'A')|('Z'-ch)) < 0)) &&
        (ch < Character.MIN_HIGH_SURROGATE ||
         ch > Character.MAX_LOW_SURROGATE))
    {
        //do stuff
        return list.subList(0, resultSize).toArray(result);
    }
    return Pattern.compile(regex).split(this, limit);
}
And this is what we had in 6-b14:
public String[] split(String regex, int limit) {
    return Pattern.compile(regex).split(this, limit);
}
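Since the behaviour differs between versions, it may be worth measuring on your own JVM. The following is a rough timing sketch that assumes a short warm-up is enough for the JIT to settle; SplitTiming and its method names are hypothetical, and a hand-rolled loop like this gives only an indication, so the numbers it prints should not be treated as precise.

```java
import java.util.regex.Pattern;

class SplitTiming {
    private static final Pattern DOT = Pattern.compile("\\.");

    // Times `iterations` calls of String.split and returns elapsed nanoseconds.
    // Summing the array lengths keeps the results from being discarded.
    static long timeDirect(String s, int iterations) {
        long sink = 0;
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink += s.split("\\.").length;
        }
        long elapsed = System.nanoTime() - start;
        if (sink < 0) throw new IllegalStateException("unexpected sink value");
        return elapsed;
    }

    // Same loop, but using the Pattern compiled once in the static field.
    static long timePrecompiled(String s, int iterations) {
        long sink = 0;
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink += DOT.split(s).length;
        }
        long elapsed = System.nanoTime() - start;
        if (sink < 0) throw new IllegalStateException("unexpected sink value");
        return elapsed;
    }

    public static void main(String[] args) {
        String input = "a.b.c.d.e.f.g.h.i.j";
        // Warm-up runs so both code paths are compiled before measuring.
        timeDirect(input, 100000);
        timePrecompiled(input, 100000);
        System.out.println("direct:      " + timeDirect(input, 300000) / 1e6 + " ms");
        System.out.println("precompiled: " + timePrecompiled(input, 300000) / 1e6 + " ms");
    }
}
```

Unlike the loop in the question, this version measures both variants in the same run, after a warm-up, and consumes the split results.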