
How can I index a lot of txt files? (Java/C/C++)

I need to index a large amount of text. The search results must give me the names of the files containing the query and all of the positions where the query matched in each file, so that I don't have to load the whole file to find the matching portion. What libraries can you recommend for doing this?

Update: Lucene has been suggested. Can you give me some information on how to use Lucene to achieve this? (The examples I have seen return only the matching files.)

George asked Dec 03 '22 08:12


2 Answers

For Java, try Lucene.

Jared answered Dec 04 '22 22:12


I believe the Lucene term for what you are looking for is highlighting. Here is a very recent report on Lucene highlighting. You will probably need to store word position information in the index in order to get the snippets you are looking for. The Token API may help.
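To make the idea of storing word position information concrete, here is a minimal sketch in plain Java. It is not Lucene and does not use Lucene's API; the class name `PositionalIndex` and its methods are hypothetical, chosen only for illustration. It assumes a simple `\w+` tokenizer and records, for each term, the character offsets at which it starts in each file, so a search can report file names and match positions without rereading the files.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical minimal positional index (illustration only, not Lucene's API). */
public class PositionalIndex {
    // Assumption: a "word" is a run of letters, digits, or underscores.
    private static final Pattern WORD = Pattern.compile("\\w+");

    // term -> (file name -> character offsets where the term starts)
    private final Map<String, Map<String, List<Integer>>> postings = new HashMap<>();

    /** Tokenizes the text and records the start offset of every term. */
    public void addDocument(String fileName, String text) {
        Matcher m = WORD.matcher(text);
        while (m.find()) {
            String term = m.group().toLowerCase(Locale.ROOT);
            postings.computeIfAbsent(term, t -> new TreeMap<>())
                    .computeIfAbsent(fileName, f -> new ArrayList<>())
                    .add(m.start());
        }
    }

    /** Returns file name -> match offsets for a single term (empty if absent). */
    public Map<String, List<Integer>> search(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT), Map.of());
    }

    public static void main(String[] args) {
        PositionalIndex index = new PositionalIndex();
        index.addDocument("a.txt", "Lucene stores positions. Lucene can highlight.");
        index.addDocument("b.txt", "No match here.");
        System.out.println(index.search("lucene")); // prints {a.txt=[0, 25]}
    }
}
```

This sketch keeps everything in memory and handles single-term queries only. A real index for many files would persist the postings to disk, which is the part a library such as Lucene takes care of.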

Yuval F answered Dec 04 '22 20:12