
Java Regex lookahead takes too much time

I'm trying to create a proper regex for my problem and apparently ran into a weird issue.

Let me describe what I'm trying to do.

My goal is to remove commas from both ends of the string. E.g., the string , ,, ,,, , , Hello, my lovely, world, ,, , should become just Hello, my lovely, world.

I have prepared the following regex to accomplish this: (\w+,*? *?)+(?=(,?\W+$))

It works like a charm in regex validators, but when I run it on an Android device, matcher.find() hangs for about a minute before finding a match. I assume the problem is in the positive lookahead I'm using, but I couldn't find any better solution than trimming the commas separately at the beginning and at the end:

output = input.replaceAll("^(,?\\W?)+", ""); //replace commas at the beginning
output = output.replaceAll("(,?\\W?)+$", ""); //replace commas at the end

Is there something I am missing about positive lookahead in Java regex? How can I retrieve the section of the string between the commas at the beginning and at the end?

Pavel Dudka asked Oct 09 '12 00:10


1 Answer

You don't have to use a lookahead if you use a capturing group. Try the regex ^[\s,]*(.+?)[\s,]*$

EDIT: To break it apart, ^ matches the beginning of the input, which is technically redundant if using matches() but may be useful elsewhere. [\s,]* matches zero or more whitespace characters or commas, and it does so greedily: it will accept as many characters as possible. (.+?) matches one or more characters, but the trailing question mark instructs it to match as few characters as possible (non-greedy); because it is the first set of parentheses, its contents are captured as "group 1". The non-greedy match leaves the trailing commas and whitespace to the final [\s,]*, which again matches zero or more of them. Like the ^, the final $ matches the end of the input, which is useful for find() but redundant for matches().
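To make the point about the anchors concrete, here is a small sketch comparing matches() and find(). The class and method names (AnchorDemo, viaMatches, viaFind, viaFindUnanchored) are made up for illustration and are not part of the original answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AnchorDemo {
    // No anchors: matches() already requires the whole input to match.
    static final Pattern UNANCHORED = Pattern.compile("[\\s,]*(.+?)[\\s,]*");
    // With anchors: needed for find(), which is allowed to match a substring.
    static final Pattern ANCHORED = Pattern.compile("^[\\s,]*(.+?)[\\s,]*$");

    static String viaMatches(String input) {
        Matcher m = UNANCHORED.matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    static String viaFind(String input) {
        Matcher m = ANCHORED.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    // find() without anchors: the lazy group stops as early as it can.
    static String viaFindUnanchored(String input) {
        Matcher m = UNANCHORED.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String input = ", ,, ,,, , , Hello, my lovely, world, ,, ,";
        System.out.println(viaMatches(input));        // Hello, my lovely, world
        System.out.println(viaFind(input));           // Hello, my lovely, world
        System.out.println(viaFindUnanchored(input)); // H
    }
}
```

The last call shows why the $ matters with find(): without it, nothing forces the non-greedy group to grow past its first character.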

If you need it to match spaces only, replace [\s,] with [ ,].
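As a quick illustration of the difference between the two character classes, the sketch below runs both on an input that contains tabs. The helper name trim and the sample input are assumptions chosen for the demo.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SpaceOnlyDemo {
    // Builds the same pattern shape around the given character class
    // (hypothetical helper; charClass is e.g. "[\\s,]" or "[ ,]").
    static String trim(String input, String charClass) {
        Pattern p = Pattern.compile("^" + charClass + "*(.+?)" + charClass + "*$");
        Matcher m = p.matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String input = ", \tHello\t ,";
        // [\s,] treats the tabs as trimmable whitespace.
        System.out.println("[" + trim(input, "[\\s,]") + "]");
        // [ ,] strips only spaces and commas, so the tabs stay in group 1.
        System.out.println("[" + trim(input, "[ ,]") + "]");
    }
}
```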

This should work:

Pattern pattern = Pattern.compile("^[\\s,]*(.+?)[\\s,]*$");
Matcher matcher = pattern.matcher(", ,, ,,, , , Hello, my lovely, world, ,, ,");
if (!matcher.matches())
    return null;
return matcher.group(1); // "Hello, my lovely, world"
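For comparison, the two-step trimming from the question can also be written with the same character class. This is only a sketch, not part of the original answer: the names TrimCommas, viaGroup and viaReplace are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TrimCommas {
    private static final Pattern WHOLE = Pattern.compile("^[\\s,]*(.+?)[\\s,]*$");

    // Capturing-group approach from the answer; returns null when nothing matches.
    static String viaGroup(String input) {
        Matcher m = WHOLE.matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    // Two-step approach: strip the leading run, then the trailing run.
    static String viaReplace(String input) {
        return input.replaceAll("^[\\s,]+", "").replaceAll("[\\s,]+$", "");
    }

    public static void main(String[] args) {
        String input = ", ,, ,,, , , Hello, my lovely, world, ,, ,";
        System.out.println(viaGroup(input));   // Hello, my lovely, world
        System.out.println(viaReplace(input)); // Hello, my lovely, world
    }
}
```

One difference to be aware of: because (.+?) needs at least one character, an input made only of commas such as ",,," yields "," from viaGroup, while viaReplace returns an empty string. An empty input gives null from viaGroup and an empty string from viaReplace.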
Jeff Bowman answered Sep 27 '22 20:09