Pattern/Matcher vs. String.split(): which should I use?

First time posting.

I know how to use both Pattern/Matcher and String.split(). My question is: which is better for my example, and why? Suggestions for better alternatives are also welcome.

Task: I need to extract an unknown NOUN that sits between two known regexes in an otherwise unknown string.

My solution: find where the noun starts and ends (using regexes 1 and 2), then use substring to extract it.

    String line = "unknownXoooXNOUNXccccccXunknown";
    int goal = 12; // expected index just after the regexp1 match
    String regexp1 = "Xo+X";
    String regexp2 = "Xc+X";

  1. I need to locate the index position AFTER the first regex.
  2. I need to locate the index position BEFORE the second regex.

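A) I can use Pattern and Matcher:

    Pattern p = Pattern.compile(regexp1);
    Matcher m = p.matcher(line);
    int afterRegex1; // declared outside the if so it stays in scope afterwards
    if (m.find()) {
        afterRegex1 = m.end(); // index just past the first match of regexp1
    } else {
        // TODO: exception management
        throw new IllegalArgumentException();
    }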

B) I can use String.split():

    String[] split = line.split(regexp1, 2);
    if (split.length != 2) {
        // TODO: exception management
        throw new UnsupportedOperationException();
    }
    int afterRegex1 = line.indexOf(split[1]);

Which approach should I use, and why? I don't know which is more efficient in time and memory. Both are about equally readable to me.

asked Oct 16 '13 at 17:10 by Another Compiler Error





2 Answers

I'd do it like this:

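    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    String line = "unknownXoooXNOUNXccccccXunknown";
    String regex = "Xo+X(.*?)Xc+X";

    Pattern p = Pattern.compile(regex);
    Matcher m = p.matcher(line);
    if (m.find()) {
        String noun = m.group(1); // text captured between the two delimiters
    }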

The (.*?) is used to make the inner match on the NOUN reluctant. This protects us from a case where our ending pattern appears again in the unknown portion of the string.
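
To make the difference concrete, here is a minimal sketch comparing the reluctant and greedy forms. The class name, helper method, and the input with a repeated end delimiter are all made up for illustration; they are not from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReluctantDemo {
    // Returns capture group 1 of the first match, or null if nothing matches.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Hypothetical input: the end delimiter (Xc+X) shows up twice.
        String line = "unknownXoooXNOUNXccXmoreXcccXunknown";

        // Reluctant: stops at the first end delimiter.
        System.out.println(firstGroup("Xo+X(.*?)Xc+X", line)); // prints NOUN

        // Greedy: runs on to the last end delimiter.
        System.out.println(firstGroup("Xo+X(.*)Xc+X", line));  // prints NOUNXccXmore
    }
}
```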

EDIT

This works because the (.*?) defines a capture group. There's only one such group defined in the pattern, so it gets index 1 (the parameter to m.group(1)). These groups are indexed from left to right starting at 1. If the pattern were defined like this

    String regex = "(Xo+X)(.*?)(Xc+X)";

Then there would be three capture groups, such that

    m.group(1); // yields "XoooX"
    m.group(2); // yields "NOUN"
    m.group(3); // yields "XccccccX"

There is also a group 0, which holds the entire match; m.group(0) is equivalent to this

    m.group(); // yields "XoooXNOUNXccccccX"

For more information about what you can do with the Matcher, including ways to get the start and end positions of your pattern within the source string, see the Matcher JavaDocs
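
If you do want the index positions rather than the text, something along these lines should work. It is only a sketch: the class and method names are invented for the example, and it relies on Matcher.start(int) and Matcher.end(int), which report the bounds of a given capture group.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupPositions {
    private static final Pattern NOUN = Pattern.compile("Xo+X(.*?)Xc+X");

    // Returns {start, end} of capture group 1: start is the index just after
    // the first delimiter, end is the index where the second delimiter begins.
    static int[] nounSpan(String line) {
        Matcher m = NOUN.matcher(line);
        if (!m.find()) {
            throw new IllegalArgumentException("delimiters not found");
        }
        return new int[] { m.start(1), m.end(1) };
    }

    public static void main(String[] args) {
        String line = "unknownXoooXNOUNXccccccXunknown";
        int[] span = nounSpan(line);
        System.out.println(span[0]);                          // prints 12
        System.out.println(span[1]);                          // prints 16
        System.out.println(line.substring(span[0], span[1])); // prints NOUN
    }
}
```

The start index here is the same value as the `goal` of 12 in the question.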

answered Sep 19 '22 at 19:09 by Ian McLaird



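You should use String.split() for readability unless you're in a tight loop.

Per its Javadoc, str.split(regex, n) yields the same result as Pattern.compile(regex).split(str, n), so in general the regex is compiled again on every call. In a tight loop you can optimize that away by compiling the Pattern once and reusing it.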
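
A rough sketch of the tight-loop variant, assuming the start delimiter from the question; the class and method names are hypothetical. Pattern.split(CharSequence, int) takes the same limit argument as String.split.

```java
import java.util.regex.Pattern;

class PrecompiledSplit {
    // Compiled once and reused on every call, instead of once per split().
    private static final Pattern START = Pattern.compile("Xo+X");

    // Returns everything after the first match of the start delimiter.
    static String afterStart(String line) {
        String[] parts = START.split(line, 2);
        if (parts.length != 2) {
            throw new IllegalArgumentException("start delimiter not found");
        }
        return parts[1];
    }

    public static void main(String[] args) {
        String line = "unknownXoooXNOUNXccccccXunknown";
        System.out.println(afterStart(line)); // prints NOUNXccccccXunknown
    }
}
```

Pattern instances are immutable and safe to share, so a static final field is a reasonable place to keep one.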

answered Sep 20 '22 at 19:09 by willkil
