I am trying to create a search engine just to learn and get more experience in Java.
My plan is to store about 100 files on a server, a mixture of HTML, XML, DOC and TXT, each with its own metadata.
So when I search for a keyword, it should display the matching file with its meta description, like Google does.
My question is: apart from HTML, can you add metadata to other file formats so that the meta description is shown?
Could you also point me towards a Java search engine that can search within file formats (TXT, HTML) and display the results?
I am working on my own code for this, but would like to look at other people's code for help.
Lucene is the canonical Java search engine.
For adding documents from a variety of sources, take a look at Apache Tika; for a full-blown system with service/web interfaces, look at Solr.
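Tika's own API is not shown here. As a rough, stdlib-only illustration of the kind of work an extractor like Tika does, the sketch below handles just two cases, .html and .txt, with regular expressions: it strips tags to get the body text and reads the <meta name="description"> tag, falling back to the first 60 characters of text when there is none (the 60-character cutoff and the class name SimpleExtractor are arbitrary choices for this example). Regexes are not a real HTML parser, so this only copes with simple pages where the name attribute comes before content.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SimpleExtractor {

    // Matches <meta name="description" content="..."> only when the attributes appear in this order.
    private static final Pattern META_DESCRIPTION = Pattern.compile(
            "<meta\\s+name=[\"']description[\"']\\s+content=[\"']([^\"']*)[\"']",
            Pattern.CASE_INSENSITIVE);

    // Removes anything that looks like a tag; good enough for a toy, not for real-world HTML.
    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    /** Text and description extracted from one file. */
    public record Extracted(String text, String description) {}

    /** Extracts text and a description; the file extension decides how the content is treated. */
    public static Extracted extract(String fileName, String content) {
        String lower = fileName.toLowerCase(Locale.ROOT);
        if (lower.endsWith(".html") || lower.endsWith(".htm")) {
            Matcher m = META_DESCRIPTION.matcher(content);
            String text = normalize(TAG.matcher(content).replaceAll(" "));
            String description = m.find() ? m.group(1) : preview(text);
            return new Extracted(text, description);
        }
        // Plain text (and, in this sketch, anything else) has no embedded metadata,
        // so the start of the file serves as the description.
        String text = normalize(content);
        return new Extracted(text, preview(text));
    }

    // Collapses runs of whitespace into single spaces.
    private static String normalize(String s) {
        return s.replaceAll("\\s+", " ").trim();
    }

    // First 60 characters of the text, with "..." appended if it was cut.
    private static String preview(String text) {
        return text.length() <= 60 ? text : text.substring(0, 60) + "...";
    }

    public static void main(String[] args) {
        String html = "<html><head><meta name=\"description\" content=\"Notes on Java I/O\">"
                + "<title>IO</title></head><body><p>Streams and readers</p></body></html>";
        System.out.println(extract("io.html", html).description());
        System.out.println(extract("todo.txt", "  buy   milk\n").description());
    }
}
```

Running main should print "Notes on Java I/O" and then "buy milk". For DOC files there is no such shortcut in the standard library, which is exactly where a library like Tika earns its keep.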
Lucene allows arbitrary metadata to be associated with its documents. Tika will automatically cull metadata from a variety of formats.
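To make the idea concrete without depending on any particular Lucene version, here is a toy inverted index in plain Java. It is not Lucene's API, just the underlying concept: each term maps to the documents that contain it, and every document carries an arbitrary metadata map (the "description" key here is a hypothetical choice) that is returned with the hit. In Lucene you would typically keep such a value as a stored field on the document instead.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

public class TinyIndex {

    /** A search hit: the file path plus the metadata stored with it. */
    public record Hit(String path, Map<String, String> metadata) {}

    // Inverted index: term -> ids of the documents containing it (ids kept in insertion order).
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    private final List<String> paths = new ArrayList<>();
    private final List<Map<String, String>> metadata = new ArrayList<>();

    /** Indexes the body text and stores the metadata alongside the document. */
    public void add(String path, String text, Map<String, String> meta) {
        int id = paths.size();
        paths.add(path);
        metadata.add(Map.copyOf(meta));
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, t -> new LinkedHashSet<>()).add(id);
        }
    }

    /** Returns every document whose text contains the (single, case-insensitive) keyword. */
    public List<Hit> search(String keyword) {
        List<Hit> hits = new ArrayList<>();
        Set<Integer> ids = postings.getOrDefault(keyword.toLowerCase(Locale.ROOT), Set.of());
        for (int id : ids) {
            hits.add(new Hit(paths.get(id), metadata.get(id)));
        }
        return hits;
    }

    // Lower-cases the text and splits it on runs of non-word characters.
    private static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        TinyIndex index = new TinyIndex();
        index.add("docs/io.html", "Streams and readers in Java",
                Map.of("description", "Notes on Java I/O"));
        index.add("docs/todo.txt", "Buy milk, learn Java",
                Map.of("description", "Shopping and study list"));
        for (Hit hit : index.search("java")) {
            System.out.println(hit.path() + " - " + hit.metadata().get("description"));
        }
    }
}
```

Searching for "java" prints both files, each followed by its stored description, which is the Google-style result list the question asks for. Real engines add ranking, stemming and on-disk storage on top of this.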
1) My question is: apart from HTML, can you add metadata to other file formats so that the meta description is shown?
In general you would use a database and store the metadata there alongside the document. You would then do a keyword search with a database query (for example using SQL LIKE, or PostgreSQL's case-insensitive ILIKE).
The files themselves could either stay on the hard drive with only their paths stored in the DB, or be put into the database as a CLOB or BLOB, depending on whether the documents are text or binary.
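As a sketch of the "paths in the DB" variant, assuming a hypothetical documents table with path and description columns, the code below runs the kind of query the answer describes against an in-memory list of rows instead of a real database, so it needs no JDBC driver. The ilike helper emulates SQL ILIKE semantics ('%' for any run of characters, '_' for exactly one, case ignored, no escape character); whether plain LIKE is case-sensitive depends on the database.

```java
import java.util.List;
import java.util.regex.Pattern;

public class MetadataSearch {

    /** One row of a hypothetical "documents" table: the file stays on disk, only its path is stored. */
    public record DocumentRow(String path, String description) {}

    /** Emulates SQL "value ILIKE pattern": '%' = any run, '_' = one character, case-insensitive. */
    public static boolean ilike(String value, String pattern) {
        StringBuilder regex = new StringBuilder();
        for (char c : pattern.toCharArray()) {
            if (c == '%') {
                regex.append(".*");
            } else if (c == '_') {
                regex.append('.');
            } else {
                // Quote every other character so regex metacharacters such as '.' match literally.
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(regex.toString(), Pattern.CASE_INSENSITIVE | Pattern.DOTALL)
                .matcher(value).matches();
    }

    /** In-memory stand-in for: SELECT path, description FROM documents WHERE description ILIKE '%keyword%' */
    public static List<DocumentRow> search(List<DocumentRow> rows, String keyword) {
        String pattern = "%" + keyword + "%";
        return rows.stream().filter(r -> ilike(r.description(), pattern)).toList();
    }

    public static void main(String[] args) {
        List<DocumentRow> rows = List.of(
                new DocumentRow("/data/io.html", "Notes on Java I/O"),
                new DocumentRow("/data/report.doc", "Quarterly sales report"));
        for (DocumentRow row : search(rows, "java")) {
            System.out.println(row.path() + " - " + row.description());
        }
    }
}
```

With a real database you would send the same query through JDBC with the keyword bound as a parameter rather than concatenated into the SQL string. Note that a substring match like this cannot use an ordinary index, which is one reason a dedicated engine such as Lucene scales better.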
2) Would you be able to point me towards a Java search engine that can search within file formats (TXT, HTML) and display the results?
Try Apache Lucene.