I need to run a lot of searches for certain patterns in source files while the user is editing them, so the regexp matching has to be efficient in both time and memory. The same pattern is reused, so it should be compiled once, but I also need to retrieve subparts of each match (rather than just confirm that a match exists).
I'm considering java.util.regex or the Jakarta perl5util (if it still exists; it's been a few years since I used it), or perhaps the Eclipse search engine, though I doubt that it's any smarter.
Is there any significant performance difference between the two?
I am not sure there is a huge performance gap between the different Java regexp engines.
But how you construct the regexp certainly affects performance (at least once the data is large enough, as noted by Jeff Atwood).
The main thing to avoid is catastrophic backtracking, which is best prevented by using atomic grouping.
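To illustrate the point about atomic grouping, here is a minimal sketch. The class name, patterns, and inputs are made up for the example. The nested quantifier in `(?:a+)+b` is the classic shape that can backtrack catastrophically on input that almost matches; wrapping the inner part in `(?>...)` tells the engine not to revisit what the group has already consumed.

```java
import java.util.regex.Pattern;

final class AtomicGroupingDemo {
    // Nested quantifiers: on a near-miss input the engine may retry
    // many different ways of splitting the run of 'a' characters.
    static final Pattern NAIVE = Pattern.compile("(?:a+)+b");

    // Accepts the same strings, but the atomic group (?>...) gives up its
    // backtracking positions as soon as it has matched.
    static final Pattern ATOMIC = Pattern.compile("(?>a+)+b");

    static boolean matchesAtomic(CharSequence input) {
        return ATOMIC.matcher(input).matches();
    }

    public static void main(String[] args) {
        String good = "aaab";
        // Both forms agree on input that matches.
        System.out.println(NAIVE.matcher(good).matches());
        System.out.println(matchesAtomic(good));

        // A long run of 'a' with no trailing 'b': the atomic form fails
        // without exploring alternative splits.
        StringBuilder nearMiss = new StringBuilder();
        for (int i = 0; i < 1000; i++) {
            nearMiss.append('a');
        }
        nearMiss.append('!');
        System.out.println(matchesAtomic(nearMiss));
    }
}
```

The naive pattern is deliberately not run against the long input here, since how badly it degrades depends on the engine and the JDK version.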
So, by default I would use the java.util.regex engine, unless you have specific Perl-compliant regexps that you need to reuse in your program.
Then I would carefully construct the regexp I intend to use.
But in terms of choosing one engine or another... as has been said in many other questions...:
As VonC says, you need to know your regexps. It doesn't hurt to compile the regexes beforehand; otherwise, the cost of compiling the regex on every use can hurt performance badly.
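As a rough sketch of compiling once and still retrieving subparts with java.util.regex: the pattern below (a simplistic `receiver.method(` matcher), the class name, and the `#` output format are all invented for illustration, not taken from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class MethodCallScanner {
    // Compiled once and shared; a Pattern is immutable and safe to reuse.
    private static final Pattern CALL = Pattern.compile("(\\w+)\\.(\\w+)\\(");

    // Returns "receiver#method" for every call-like occurrence in the source.
    static List<String> findCalls(CharSequence source) {
        List<String> calls = new ArrayList<>();
        // A Matcher is cheap to create and is not thread-safe, so make one per scan.
        Matcher m = CALL.matcher(source);
        while (m.find()) {
            // group(1) and group(2) are the captured subparts of this match.
            calls.add(m.group(1) + "#" + m.group(2));
        }
        return calls;
    }

    public static void main(String[] args) {
        // prints [foo#bar, list#add]
        System.out.println(findCalls("foo.bar(1); list.add(x);"));
    }
}
```

Because `matcher` accepts any `CharSequence`, the same compiled pattern can be run against an editor buffer without first copying it into a `String`.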
For some categories of pattern there are alternative libraries, such as http://jint.sourceforge.net/jint.html, which might perform better. Then again, it depends on which version of Java you're using.
In JDK 1.6 the regex engine is mature, combining good features with good performance.