Given a String containing a comma delimited list representing a proper noun & category/description pair, what are the pros & cons of using String.split() versus Pattern & Matcher approach to find a particular proper noun and extract the associated category/description pair?
The haystack String format will not change. It will always contain comma delimited data in the form of PROPER_NOUN|CATEGORY/DESCRIPTION
Common variables for both approaches:
String haystack="EARTH|PLANET/COMFORTABLE,MARS|PLANET/HARDTOBREATHE,PLUTO|DWARF_PLANET/FARAWAY";
String needle="PLUTO";
String result=null;
Using String.split():
for (String current : haystack.split(","))
if (current.contains(needle))
{
result=current.split("\\|")[1]);
break; // *edit* Not part of original code - added in response to comment from Pshemo
{
Using Pattern & Matcher:
Pattern pattern = pattern.compile("(" +needle+ "\|)(\w+/\w+)");
Matcher matches = pattern.matcher(haystack);
if (matches.find())
result=matches.group(2);
Both approaches provide the information I require.
I'm wondering if any reason exists to choose one over the other. I am not currently using Pattern & Matcher within my project so this approach will require imports from java.util.regex
And, of course, if there is an objectively 'better' way to parse the information I will welcome your input.
Thank you for your time!
Conclusion
I've opted for the Pattern/Matcher approach. While a little tricky to read w/the regex, it is faster than .split()/.contains()/.split() and, more importantly to me, captures the first match only.
For what it is worth, here are the results of my imperfect benchmark tests, in nanoseconds, after 100,000 iterations:
.split()/.contains()/.split
304,212,973
Pattern/Matcher w/ Pattern.compile() invoked for each iteration
230,511,000
Pattern/Matcher w/Pattern.compile() invoked prior to iteration
111,545,646
In a small case such as this, it won't matter that much. However, if you have extremely large strings, it may be beneficial to use Pattern/Matcher directly.
Most string functions that use regular expressions (such as matches(), split(), replaceAll(), etc.) makes use of Matcher/Pattern directly. Thus it will create a Matcher object every time, causing inefficiency when used in a large loop.
Thus if you really want speed, you can use Matcher/Pattern directly and ideally only create a single Matcher object.
There are no advantages to using pattern/matcher in cases where the manipulation to be done is as simple as this.
You can look at String.split()
as a convenience method that leverages many of the same functionalities you use when you use a pattern/matcher directly.
When you need to do more complex matching/manipulation, use a pattern/matcher, but when String.split()
meets your needs, the obvious advantage to using it is that it reduces code complexity considerably - and I can think of no good reason to pass this advantage up.
I would say that the split()
version is much better here due to the following reasons:
split()
code is very clear, and it is easy to see what it does. The regex version demands much more analysis.If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With