 

When would it be worth using RegEx in Java?

Tags:

java

regex

I'm writing a small app that reads some input and does something based on that input.

Currently, when I'm looking for a line that ends with, say, "magic", I use String's endsWith method. It's pretty clear to whoever reads my code what's going on.

Another way to do it is to create a Pattern and try to match a line that ends with "magic". This is also clear, but I personally think it is overkill because the pattern I'm looking for is not complex at all.

When do you think it's worth using RegEx in Java? If it comes down to complexity, how would you personally define what's complex enough?

Also, are there times when using a Pattern is actually faster than string manipulation?

EDIT: I'm using Java 6.

Russell asked Dec 05 '22 00:12


1 Answer

Basically: if there is a non-regex operation that does what you want in one step, always go for that.

This is not so much about performance as about a) readability and b) compile-time safety. Specialized non-regex versions are usually a lot easier to read than regex versions. And a typo in the name of one of these specialized methods will not compile, while a typo in a regex only shows up at runtime, either as an exception or as a pattern that silently fails to match.

Comparing regex-based solutions to non-regex-based solutions:
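To illustrate the compile-time-safety point, here is a minimal sketch. The class name RegexTypoDemo and the helper isValidRegex are made up for this example. It assumes nothing beyond java.util.regex: a malformed pattern is only rejected when the line actually runs, and a typo that still happens to be a valid pattern just stops matching.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class RegexTypoDemo {

    // Hypothetical helper: true if the regex compiles, false if it is malformed.
    static boolean isValidRegex(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String s = "Magic_Carpet_Ride";

        // s.endWith("Ride");  // typo in the method name: rejected by the compiler

        System.out.println(isValidRegex(".*Ride"));   // well-formed pattern
        System.out.println(isValidRegex(".*(Ride"));  // unclosed group, detected only at runtime

        // A typo that is still a valid pattern fails silently: no error, just no match.
        System.out.println(s.matches(".*Ried"));
    }
}
```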

String s = "Magic_Carpet_Ride";

s.startsWith("Magic");   // non-regex
s.matches("Magic.*");    // regex

s.contains("Carpet");    // non-regex
s.matches(".*Carpet.*"); // regex

s.endsWith("Ride");      // non-regex
s.matches(".*Ride");     // regex

In all these cases it's a no-brainer: use the non-regex version.
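Two details make the regex versions above easier to get wrong, sketched here under the assumption that the search text may contain regex metacharacters. String.matches has to cover the whole string, which is why the patterns need the surrounding .* parts, and a literal such as "1.50" has to be escaped (for instance with Pattern.quote) before it behaves like endsWith. The class LiteralMatchDemo and the helper endsWithRegex are illustrative names only.

```java
import java.util.regex.Pattern;

public class LiteralMatchDemo {

    // Hypothetical regex counterpart of s.endsWith(suffix).
    // Pattern.quote turns the suffix into a literal, so '.' and friends lose their special meaning.
    static boolean endsWithRegex(String s, String suffix) {
        return s.matches(".*" + Pattern.quote(suffix));
    }

    public static void main(String[] args) {
        String s = "Magic_Carpet_Ride";

        // matches() must consume the entire string, so a bare "Ride" is not enough.
        System.out.println(s.matches("Ride"));
        System.out.println(s.matches(".*Ride"));

        // An unescaped '.' matches any character, so this regex accepts "1x50" as well.
        System.out.println("price: 1x50".endsWith("1.50"));
        System.out.println("price: 1x50".matches(".*1.50"));
        System.out.println(endsWithRegex("price: 1x50", "1.50"));
    }
}
```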

But when things get a bit more complicated, it depends. I guess I'd still stick with non-regex in the following case, but many wouldn't:

// Test whether a string ends with "magic" in any case,
// followed by optional white space
s.toLowerCase().trim().endsWith("magic"); // non-regex, 3 calls
s.matches(".*(?i:magic)\\s*");            // regex, 1 call, but ugly
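If the regex route is taken for a check like this, one possible shape is a Pattern compiled once and reused, since String.matches compiles its pattern on every call. The following is only a sketch: the class EndsWithMagicDemo, the constant ENDS_WITH_MAGIC and both method names are invented here, and no timing claim is made. It simply runs the two variants side by side on a few sample inputs.

```java
import java.util.regex.Pattern;

public class EndsWithMagicDemo {

    // Compiled once and reused; equivalent in intent to ".*(?i:magic)\\s*".
    private static final Pattern ENDS_WITH_MAGIC =
            Pattern.compile(".*magic\\s*", Pattern.CASE_INSENSITIVE);

    // The non-regex variant from the answer (toLowerCase() uses the default locale).
    static boolean viaStringMethods(String s) {
        return s.toLowerCase().trim().endsWith("magic");
    }

    // The regex variant, using the precompiled pattern.
    static boolean viaRegex(String s) {
        return ENDS_WITH_MAGIC.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = { "black Magic  ", "MAGIC", "magical", "" };
        for (String sample : samples) {
            System.out.println("[" + sample + "] "
                    + viaStringMethods(sample) + " " + viaRegex(sample));
        }
    }
}
```

The two variants agree on simple single-line input like the samples above, but they are not identical in general: trim() and \s define whitespace slightly differently, and . does not match line terminators unless DOTALL is set.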

And in response to RegexesCanCertainlyBeEasierToReadThanMultipleFunctionCallsToDoTheSameThing:

I still think the non-regex version is more readable, but I would write it like this:

s.toLowerCase()
 .trim()
 .endsWith("magic");

Makes all the difference, doesn't it?

Sean Patrick Floyd answered Dec 09 '22 15:12