I am working on a file of about 1 GB that grows incrementally, and I want to search it for a particular pattern. Currently I am using Java regular expressions. Do you have any idea how I can do this faster?
Summary: indexOf/contains-style searches depend on the string length and on where the pattern sits in the string, while regex matching is largely string-length independent and faster for very long strings with the pattern near the end.

For a large string, a regex can be faster than an if check inside a for loop that tests whether anything matches your requirement.
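To see the effect concretely, here is a rough timing sketch (not a proper JMH benchmark, so take the numbers with a grain of salt) comparing `String.indexOf` against a pre-compiled `Pattern` on a long string with the needle at the very end. The string size and contents are made-up placeholders, and it assumes Java 11+ for `String.repeat`:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SearchTiming {
    public static void main(String[] args) {
        // Put the needle at the very end: the worst case for a left-to-right scan.
        String needle = "needle";
        String haystack = "x".repeat(50_000_000) + needle;

        long t0 = System.nanoTime();
        boolean foundIndexOf = haystack.indexOf(needle) >= 0;
        long t1 = System.nanoTime();

        Pattern p = Pattern.compile(needle);   // compile once, reuse
        Matcher m = p.matcher(haystack);
        boolean foundRegex = m.find();
        long t2 = System.nanoTime();

        System.out.printf("indexOf: %b in %d ms%n", foundIndexOf, (t1 - t0) / 1_000_000);
        System.out.printf("regex:   %b in %d ms%n", foundRegex, (t2 - t1) / 1_000_000);
    }
}
```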
Regular expressions are efficient in that one line of code can save you writing hundreds of lines. But they are normally slower (even pre-compiled) than thoughtful hand-written code, simply due to the overhead. Generally, the simpler the objective, the worse regular expressions fare; they are better suited to complex operations.
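That said, for a file-sized input the biggest win is usually not loading the whole 1 GB into memory at all. Here is a minimal sketch of a streaming search with a pre-compiled pattern, assuming the pattern never spans a line break; the file name `big.log` and the pattern `ERROR\s+\d+` are placeholders for your real ones:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BigFileGrep {
    public static void main(String[] args) throws IOException {
        Pattern pattern = Pattern.compile("ERROR\\s+\\d+");  // compile once, outside the loop

        try (BufferedReader reader = Files.newBufferedReader(Paths.get("big.log"))) {
            Matcher m = pattern.matcher("");   // reuse one Matcher to avoid per-line allocation
            String line;
            long lineNo = 0;
            while ((line = reader.readLine()) != null) {
                lineNo++;
                m.reset(line);                 // point the Matcher at the current line
                if (m.find()) {
                    System.out.println(lineNo + ": " + line);
                }
            }
        }
    }
}
```

Since the file grows incrementally, you can also remember the byte offset you last scanned to and only search the newly appended tail on each run, rather than re-scanning the whole file.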
Sounds like a job for Apache Lucene.
You will probably have to rethink your searching strategy, but this library is made for exactly this kind of thing, including adding to the index incrementally.

It works by building inverted indexes of your data (documents, in Lucene parlance) and then quickly consulting those inverted indexes to find which documents contain parts of your pattern.

You can store metadata alongside the indexed documents, so in most use cases you may be able to avoid consulting the big file at all.
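A minimal indexing-and-search sketch follows. Lucene's API shifts between major versions, so treat this as the 8.x/9.x style; the field names, the chunk-per-document scheme, and the `index-dir` path are illustrative assumptions, not a prescription:

```java
import java.nio.file.Paths;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.store.FSDirectory;

public class LuceneSketch {
    public static void main(String[] args) throws Exception {
        FSDirectory dir = FSDirectory.open(Paths.get("index-dir"));
        StandardAnalyzer analyzer = new StandardAnalyzer();

        // Index one chunk of the big file as a "document"; repeat per new chunk
        // as the file grows (the writer appends to an existing index by default).
        try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(analyzer))) {
            Document doc = new Document();
            doc.add(new StringField("offset", "0", Field.Store.YES));  // metadata: where the chunk lives in the file
            doc.add(new TextField("body", "chunk of the big file goes here", Field.Store.NO));
            writer.addDocument(doc);
        }

        // Query the index instead of scanning the 1 GB file.
        try (DirectoryReader reader = DirectoryReader.open(dir)) {
            IndexSearcher searcher = new IndexSearcher(reader);
            TopDocs hits = searcher.search(new QueryParser("body", analyzer).parse("yourpattern"), 10);
            for (ScoreDoc sd : hits.scoreDocs) {
                System.out.println("match in chunk at offset " + searcher.doc(sd.doc).get("offset"));
            }
        }
    }
}
```

Note that Lucene does token-based searching, not arbitrary regex matching over raw bytes, which is why the answer suggests rethinking the search strategy.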