
What is a good Java Library to use for searching through several files for a list of search terms? [closed]

Basically, what I would like to do is search through a folder and its subfolders for a list of search terms. It does not have to be highly optimized or anything like that. I would like the library to support options such as "Match Case" and "Whole Words Only."

I think I could write something like this myself, opening each file in a folder and checking each word, but I really want a shortcut. Is there a library that already does most of this?

My dream code would be something like:

ArrayList occurrences = SomeLibrary.parse("directoryPath","searchTerm");

Is there anything close to this high level?

Thanks, Grae

GC_ asked Feb 11 '11 14:02




1 Answer

I would not recommend using Lucene (or Solr) for these requirements.

  1. First of all, there is no need for a full-featured text search library that (to put it simply) does all kinds of magic to provide very robust text search, drawing on linguistic knowledge such as stemming, grammar, and syntax tricks.

  2. While Lucene is powerful, you cannot get everything from it out of the box. For example, it is relatively easy to configure it to find "apples" with an "apple" term. But with the same configuration it will not find "123" in the string "12345". And forget about "non-readable" text like application logs. Lucene is a Google-like engine: it is built for people searching human-readable, well-formed text. To cover all sorts of "basic" string matches, you will need to write custom processing code that integrates with Lucene, and then it is not simple any more.

With Java, it is much simpler and quicker to write a BufferedReader-based scanner that recursively processes the files and folders and searches for exact or partial matches using String.matches and String.contains.
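
As a rough illustration of that suggestion, here is a minimal sketch using only the JDK. The class name FileSearch, the Occurrence type, and the search method are hypothetical names made up for this example, not an existing library. It uses java.util.regex instead of String.contains so that the "Match Case" and "Whole Words Only" options from the question can be toggled, and it assumes the files are UTF-8 text.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of any library.
public class FileSearch {

    /** One hit: the file, the 1-based line number, and the term that matched. */
    public static final class Occurrence {
        public final Path file;
        public final int line;
        public final String term;

        Occurrence(Path file, int line, String term) {
            this.file = file;
            this.line = line;
            this.term = term;
        }

        @Override
        public String toString() {
            return file + ":" + line + ": " + term;
        }
    }

    /** Searches every regular file under root, line by line, for each term. */
    public static List<Occurrence> search(Path root, List<String> terms,
                                          boolean matchCase, boolean wholeWord)
            throws IOException {
        // Compile each term once; Pattern.quote treats the term as literal text.
        List<Pattern> patterns = new ArrayList<>();
        for (String term : terms) {
            String regex = Pattern.quote(term);
            if (wholeWord) {
                // \b is a word boundary, so "123" will not match inside "12345".
                regex = "\\b" + regex + "\\b";
            }
            patterns.add(Pattern.compile(regex, matchCase ? 0 : Pattern.CASE_INSENSITIVE));
        }

        // Files.walk descends into subfolders recursively.
        List<Path> files;
        try (Stream<Path> walk = Files.walk(root)) {
            files = walk.filter(Files::isRegularFile).collect(Collectors.toList());
        }

        List<Occurrence> hits = new ArrayList<>();
        for (Path file : files) {
            try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
                String text;
                int lineNo = 0;
                while ((text = reader.readLine()) != null) {
                    lineNo++;
                    for (int i = 0; i < patterns.size(); i++) {
                        if (patterns.get(i).matcher(text).find()) {
                            hits.add(new Occurrence(file, lineNo, terms.get(i)));
                        }
                    }
                }
            } catch (IOException e) {
                // Assumption: stop reading files that fail mid-read (for example
                // binary or non-UTF-8 content); hits found before the failure are kept.
            }
        }
        return hits;
    }
}
```

A call close to the "dream code" in the question would then be something like FileSearch.search(Paths.get("directoryPath"), Arrays.asList("searchTerm"), true, false). Note that the whole-word option relies on \b, so it behaves as expected only for terms that begin and end with letters, digits, or underscores.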

serg.nechaev answered Sep 21 '22 22:09