
Java - Parsing strings - String.split() versus Pattern & Matcher

Given a String containing a comma-delimited list in which each entry is a proper noun & category/description pair, what are the pros & cons of using String.split() versus the Pattern & Matcher approach to find a particular proper noun and extract its associated category/description?

The haystack String format will not change. It will always contain comma-delimited data in the form PROPER_NOUN|CATEGORY/DESCRIPTION.

Common variables for both approaches:

String haystack="EARTH|PLANET/COMFORTABLE,MARS|PLANET/HARDTOBREATHE,PLUTO|DWARF_PLANET/FARAWAY";
String needle="PLUTO";
String result=null;

Using String.split():

for (String current : haystack.split(","))
    if (current.contains(needle))
    {
        result=current.split("\\|")[1];
        break; // *edit* Not part of original code - added in response to comment from Pshemo
    }

Using Pattern & Matcher:

// Requires: import java.util.regex.Matcher; import java.util.regex.Pattern;
Pattern pattern = Pattern.compile("(" + needle + "\\|)(\\w+/\\w+)");
Matcher matches = pattern.matcher(haystack);

if (matches.find())
    result=matches.group(2);

Both approaches provide the information I require.
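
For reference, here is a self-contained sketch that runs both approaches side by side. It is an illustration rather than the exact code above: the class and method names (ParseDemo, viaSplit, viaRegex) are hypothetical, and it assumes the intent is to match the needle only as a whole proper noun, so it compares the text before the '|' exactly and passes the needle through Pattern.quote().

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; names are illustrative, not from the original post.
class ParseDemo {

    // split-based lookup: returns the text after '|' for the first entry
    // whose proper noun equals the needle, or null if none matches.
    static String viaSplit(String haystack, String needle) {
        for (String current : haystack.split(",")) {
            String[] parts = current.split("\\|", 2);
            if (parts.length == 2 && parts[0].equals(needle)) {
                return parts[1];
            }
        }
        return null;
    }

    // regex-based lookup: the needle must start the string or follow a comma,
    // and is quoted so any regex metacharacters in it are taken literally.
    static String viaRegex(String haystack, String needle) {
        Pattern pattern = Pattern.compile("(?:^|,)" + Pattern.quote(needle) + "\\|([^,]+)");
        Matcher matcher = pattern.matcher(haystack);
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        String haystack = "EARTH|PLANET/COMFORTABLE,MARS|PLANET/HARDTOBREATHE,PLUTO|DWARF_PLANET/FARAWAY";
        System.out.println(viaSplit(haystack, "PLUTO")); // prints DWARF_PLANET/FARAWAY
        System.out.println(viaRegex(haystack, "PLUTO")); // prints DWARF_PLANET/FARAWAY
    }
}
```

Both methods return null when the needle is absent, which mirrors the initial value of result in the snippets above.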

I'm wondering if any reason exists to choose one over the other. I am not currently using Pattern & Matcher within my project, so this approach will require imports from java.util.regex.

And, of course, if there is an objectively 'better' way to parse the information I will welcome your input.

Thank you for your time!

Conclusion

I've opted for the Pattern/Matcher approach. While a little tricky to read w/the regex, it is faster than .split()/.contains()/.split() and, more importantly to me, captures the first match only.

For what it is worth, here are the results of my imperfect benchmark tests, in nanoseconds, after 100,000 iterations:

  • .split()/.contains()/.split(): 304,212,973
  • Pattern/Matcher w/ Pattern.compile() invoked for each iteration: 230,511,000
  • Pattern/Matcher w/ Pattern.compile() invoked prior to iteration: 111,545,646
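
The benchmark code itself was not posted, so the following is only a guess at how such a comparison might look. The class name BenchSketch, the time() helper and the use of Pattern.quote() are assumptions. A plain System.nanoTime() loop like this ignores JIT warm-up and other effects, so the numbers it prints should be treated as rough; a harness such as JMH would give more trustworthy figures.

```java
import java.util.function.Supplier;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical timing sketch; not the original benchmark.
class BenchSketch {
    static final String HAYSTACK =
        "EARTH|PLANET/COMFORTABLE,MARS|PLANET/HARDTOBREATHE,PLUTO|DWARF_PLANET/FARAWAY";
    static final String NEEDLE = "PLUTO";
    static final String REGEX = "(" + Pattern.quote(NEEDLE) + "\\|)(\\w+/\\w+)";
    static final Pattern PRECOMPILED = Pattern.compile(REGEX);

    // Variant 1: .split()/.contains()/.split()
    static String splitContainsSplit() {
        for (String current : HAYSTACK.split(",")) {
            if (current.contains(NEEDLE)) {
                return current.split("\\|")[1];
            }
        }
        return null;
    }

    // Variant 2: Pattern.compile() invoked on every call
    static String compileEachTime() {
        Matcher m = Pattern.compile(REGEX).matcher(HAYSTACK);
        return m.find() ? m.group(2) : null;
    }

    // Variant 3: Pattern compiled once, up front
    static String compileOnce() {
        Matcher m = PRECOMPILED.matcher(HAYSTACK);
        return m.find() ? m.group(2) : null;
    }

    // Runs the task the given number of times and returns elapsed nanoseconds.
    static long time(Supplier<String> task, int iterations) {
        String sink = null;
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink = task.get();
        }
        long elapsed = System.nanoTime() - start;
        if (sink == null) {
            throw new IllegalStateException("expected a match");
        }
        return elapsed;
    }

    public static void main(String[] args) {
        int iterations = 100_000;
        System.out.println("split/contains/split: " + time(BenchSketch::splitContainsSplit, iterations));
        System.out.println("compile each time:    " + time(BenchSketch::compileEachTime, iterations));
        System.out.println("compile once:         " + time(BenchSketch::compileOnce, iterations));
    }
}
```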

IdusOrtus asked Jul 17 '14 21:07


3 Answers

In a small case such as this, it won't matter that much. However, if you have extremely large strings, it may be beneficial to use Pattern/Matcher directly.

Most String methods that take regular expressions (such as matches(), split(), replaceAll(), etc.) make use of Pattern/Matcher internally. Thus each call generally compiles the pattern and creates a new Matcher object, which is inefficient when it happens inside a large loop.

Thus if you really want speed, you can use Matcher/Pattern directly and ideally only create a single Matcher object.
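
A minimal sketch of that idea, assuming several haystack strings are searched for the same needle: the Pattern is compiled once into a constant, and one Matcher is re-pointed at each input with Matcher.reset(CharSequence). The class name ReuseMatcher, the method lookupAll and the hard-coded PLUTO pattern are illustrative only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical example of reusing one Pattern and one Matcher.
class ReuseMatcher {
    // Compiled once; Pattern instances are immutable and safe to share.
    private static final Pattern ENTRY = Pattern.compile("(?:^|,)PLUTO\\|(\\w+/\\w+)");

    // Returns the category/description for PLUTO in each haystack, or null where absent.
    static List<String> lookupAll(List<String> haystacks) {
        // A single Matcher, created against an empty input and reset for each string.
        // Matcher is not thread-safe, so it stays local to this method.
        Matcher matcher = ENTRY.matcher("");
        List<String> results = new ArrayList<>();
        for (String haystack : haystacks) {
            matcher.reset(haystack);
            results.add(matcher.find() ? matcher.group(1) : null);
        }
        return results;
    }

    public static void main(String[] args) {
        List<String> inputs = Arrays.asList(
            "EARTH|PLANET/COMFORTABLE,PLUTO|DWARF_PLANET/FARAWAY",
            "MARS|PLANET/HARDTOBREATHE");
        System.out.println(lookupAll(inputs)); // prints [DWARF_PLANET/FARAWAY, null]
    }
}
```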

Xinzz answered Nov 15 '22 21:11


There are no advantages to using pattern/matcher in cases where the manipulation to be done is as simple as this.

You can look at String.split() as a convenience method that leverages many of the same functionalities you use when you use a pattern/matcher directly.

When you need to do more complex matching/manipulation, use a pattern/matcher, but when String.split() meets your needs, the obvious advantage to using it is that it reduces code complexity considerably - and I can think of no good reason to pass this advantage up.
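
To make the "convenience method" point concrete, here is a small sketch showing that splitting through String.split() and through an explicitly compiled Pattern gives the same array for the sample data. The class and method names are made up for the illustration.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical comparison of the two ways to split on a comma.
class SplitEquivalence {
    // The convenience route.
    static String[] viaString(String s) {
        return s.split(",");
    }

    // The explicit route through java.util.regex.
    static String[] viaPattern(String s) {
        return Pattern.compile(",").split(s);
    }

    public static void main(String[] args) {
        String haystack = "EARTH|PLANET/COMFORTABLE,MARS|PLANET/HARDTOBREATHE,PLUTO|DWARF_PLANET/FARAWAY";
        // Both calls yield the same three entries.
        System.out.println(Arrays.equals(viaString(haystack), viaPattern(haystack))); // prints true
    }
}
```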

drew moore answered Nov 15 '22 20:11


I would say that the split() version is much better here due to the following reasons:

  • The split() code is very clear, and it is easy to see what it does. The regex version demands much more analysis.
  • Regular expressions are more complex, and therefore the code becomes more error-prone.
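
One concrete way the regex version can go wrong, sketched under the assumption that a needle might contain a regex metacharacter: concatenating it into the pattern unescaped changes what gets matched, while Pattern.quote() keeps it literal. The haystack and the needle "X." below are invented for the illustration, as are the class and method names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demonstration of an unescaped needle in a regex.
class QuoteDemo {
    // Needle concatenated as-is: a '.' in it matches any character.
    static String unquoted(String haystack, String needle) {
        Matcher m = Pattern.compile("(" + needle + "\\|)(\\w+/\\w+)").matcher(haystack);
        return m.find() ? m.group(2) : null;
    }

    // Needle wrapped with Pattern.quote(): matched literally.
    static String quoted(String haystack, String needle) {
        Matcher m = Pattern.compile("(" + Pattern.quote(needle) + "\\|)(\\w+/\\w+)").matcher(haystack);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        String haystack = "X1|A/B,X.|C/D"; // made-up data
        String needle = "X.";              // made-up needle containing a metacharacter
        System.out.println(unquoted(haystack, needle)); // prints A/B (matched the wrong entry)
        System.out.println(quoted(haystack, needle));   // prints C/D
    }
}
```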

Keppil answered Nov 15 '22 21:11