Search Engine in Java?

I am trying to create a search engine, just to learn and get more experience in Java.

My intention is to store about 100 files on a server, a mixture of html, xml, doc and txt files, with metadata for each file.

So when I search for a keyword, it should display the matching file with its meta description, like Google does.

  1. My question is: apart from HTML, can you add metadata to any other file formats, so that the meta description is shown?

  2. Would you be able to point me towards a Java search engine that can search within file formats (txt, html) and display the results?

    I am working on my own code for this, but would like to look at other people's code for some help.

lana asked Oct 28 '11 at 14:10


2 Answers

Lucene is the canonical Java search engine.

For ingesting documents from a variety of sources, take a look at Apache Tika; for a full-blown system with service/web interfaces, look at Solr.

Lucene allows arbitrary metadata to be associated with its documents. Tika will automatically extract metadata from a variety of formats.
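To make the idea concrete, here is a toy, in-memory inverted index in plain Java. It is not Lucene's API (Lucene's real classes are different and far more capable); it only sketches the concept that each indexed document carries metadata, here a description, which is returned alongside every hit. The class and method names (ToySearchEngine, add, search) are hypothetical, and the record requires Java 16 or later.

```java
import java.util.*;

/**
 * Hypothetical toy inverted index illustrating the idea Lucene implements at scale.
 * This is NOT Lucene's API.
 */
class ToySearchEngine {

    /** A stored document: its path, its metadata description, and its text. */
    record Doc(String path, String description, String body) {}

    private final List<Doc> docs = new ArrayList<>();
    // term -> ids (positions in docs) of the documents containing that term
    private final Map<String, Set<Integer>> index = new HashMap<>();

    /** Lower-cases the text and splits it on anything that is not a letter or digit. */
    static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!t.isEmpty()) terms.add(t);
        }
        return terms;
    }

    public void add(String path, String description, String body) {
        int id = docs.size();
        docs.add(new Doc(path, description, body));
        // index both the body and the metadata, so either can match a query
        for (String term : tokenize(body + " " + description)) {
            index.computeIfAbsent(term, k -> new TreeSet<>()).add(id);
        }
    }

    /** Returns "path: description" for every document containing all query terms. */
    public List<String> search(String query) {
        Set<Integer> hits = null;
        for (String term : tokenize(query)) {
            Set<Integer> ids = index.getOrDefault(term, Set.of());
            if (hits == null) hits = new TreeSet<>(ids); // first term: start with its documents
            else hits.retainAll(ids);                    // later terms: intersect (AND semantics)
        }
        List<String> results = new ArrayList<>();
        if (hits == null) return results; // empty query
        for (int id : hits) {
            Doc d = docs.get(id);
            results.add(d.path() + ": " + d.description());
        }
        return results;
    }

    public static void main(String[] args) {
        ToySearchEngine engine = new ToySearchEngine();
        engine.add("docs/intro.html", "Introduction to Java search",
                   "Lucene is a Java library for full-text search.");
        engine.add("notes/todo.txt", "Personal to-do list",
                   "Buy milk. Read about search engines.");
        System.out.println(engine.search("search"));
        System.out.println(engine.search("java lucene"));
    }
}
```

Running main, the query "search" matches both files, while "java lucene" matches only docs/intro.html; each hit is shown with its stored description, which is the Google-style result the question asks for. A real engine adds ranking, stemming and on-disk storage, which is what Lucene provides.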

Dave Newton answered Nov 09 '22 at 05:11


1) My question is: apart from HTML, can you add metadata to any other file formats, so that the meta description is shown?

In general you would use a database and store the metadata along with the document there. You'd then do a keyword search using a database query (possibly using SQL LIKE, or ILIKE, which is PostgreSQL-specific, for case-insensitive matching).
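As a sketch of that approach, the JDBC code below runs a case-insensitive keyword search against a hypothetical documents table with path and description columns (the table, column and class names are assumptions, not from the answer). It uses LOWER(...) LIKE LOWER(?) instead of ILIKE so it is not tied to PostgreSQL, and it escapes % and _ in the user's keyword so they match literally instead of acting as wildcards.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

/**
 * Keyword search over document metadata with SQL LIKE.
 * Hypothetical schema: documents(path VARCHAR, description VARCHAR)
 */
class MetadataSearch {

    /** Escapes the escape char '!' and the LIKE wildcards % and _, then wraps in %...%. */
    static String containsPattern(String keyword) {
        String escaped = keyword
                .replace("!", "!!")  // escape the escape character first
                .replace("%", "!%")
                .replace("_", "!_");
        return "%" + escaped + "%";
    }

    /** Returns "path: description" for rows whose description contains the keyword, ignoring case. */
    static List<String> search(Connection conn, String keyword) throws SQLException {
        String sql = "SELECT path, description FROM documents "
                   + "WHERE LOWER(description) LIKE LOWER(?) ESCAPE '!'";
        List<String> results = new ArrayList<>();
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            // bind the pattern as a parameter instead of concatenating it (avoids SQL injection)
            ps.setString(1, containsPattern(keyword));
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    results.add(rs.getString("path") + ": " + rs.getString("description"));
                }
            }
        }
        return results;
    }
}
```

The escape character ! is used instead of a backslash because some databases (MySQL, for example) also treat backslashes specially inside string literals. A pattern with a leading % generally cannot use an ordinary index, so this is a full scan; that is fine for about 100 files, but it is the reason a full-text engine like Lucene scales better.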

The files might either be stored on the hard drive with just their paths in the DB, or put into the database as either a CLOB or a BLOB, depending on whether you have text or binary documents.
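A sketch of the second option (content inside the database) might look like the following, again against a hypothetical documents table, here with a text_content CLOB column and a binary_content BLOB column. Which extensions count as text, and reading text files as UTF-8, are assumptions; .doc files are binary, so they end up in the BLOB column.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Types;
import java.util.Locale;
import java.util.Set;

/**
 * Stores a file's content in the database: text formats as a CLOB, everything else as a BLOB.
 * Hypothetical schema: documents(path VARCHAR, description VARCHAR, text_content CLOB, binary_content BLOB)
 */
class DocumentStore {

    // assumption: which extensions count as "text" is a design choice, not a rule
    private static final Set<String> TEXT_EXTENSIONS = Set.of("txt", "html", "htm", "xml");

    /** True if the file name's extension (case-insensitive) is one of TEXT_EXTENSIONS. */
    static boolean isTextFormat(Path file) {
        String name = file.getFileName().toString().toLowerCase(Locale.ROOT);
        int dot = name.lastIndexOf('.');
        return dot >= 0 && TEXT_EXTENSIONS.contains(name.substring(dot + 1));
    }

    static void store(Connection conn, Path file, String description)
            throws SQLException, IOException {
        String sql = "INSERT INTO documents (path, description, text_content, binary_content) "
                   + "VALUES (?, ?, ?, ?)";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setString(1, file.toString());
            ps.setString(2, description);
            if (isTextFormat(file)) {
                // assumption: text files are UTF-8 encoded
                try (Reader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
                    ps.setCharacterStream(3, reader); // CLOB column
                    ps.setNull(4, Types.BLOB);
                    ps.executeUpdate();               // run while the stream is still open
                }
            } else {
                try (InputStream in = Files.newInputStream(file)) {
                    ps.setNull(3, Types.CLOB);
                    ps.setBinaryStream(4, in);        // BLOB column
                    ps.executeUpdate();
                }
            }
        }
    }
}
```

Storing only the path instead would mean dropping the two content columns and reading the file from disk when a result is displayed.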

2) Would you be able to point me towards a Java search engine that can search within file formats (txt, html) and display the results?

Try Apache Lucene.

Thomas answered Nov 09 '22 at 07:11