I was trying to build a 'site search' for a simple HTTP site.
I have a site, let's call it www.mycompany.com, that is pure HTML.
Is there an easy way to use Solr to index the entire site and build a full-text search with Solr as the engine?
I googled for a bit and could not find anything specific along the lines of: Do A, Do B ... profit!
Let me also know if I am a bit off about what Solr is for :P
Thanks in advance.
A Solr index can accept data from many different sources, including XML files, comma-separated value (CSV) files, data extracted from tables in a database, and files in common file formats such as Microsoft Word or PDF.
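For a pure-HTML site, one hand-rolled option is to pull the title and visible text out of each page yourself and send the result to the core's update handler as a JSON array of documents. The sketch below is only an illustration: the core name `site`, the local URL `http://localhost:8983`, the `id`/`title`/`content` field names and the helpers `html_to_doc` and `build_update_request` are all hypothetical and must match your own Solr setup and schema. It builds the request without sending it, so you can inspect it first.

```python
import json
import urllib.request
from html.parser import HTMLParser


class PageText(HTMLParser):
    """Collects the <title> and the visible body text of one HTML page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.chunks = []
        self._in_title = False
        self._skip = 0  # depth inside <script>/<style>, whose text is not indexed

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        text = data.strip()
        if not text or self._skip:
            return
        if self._in_title:
            self.title += text
        else:
            self.chunks.append(text)


def html_to_doc(url, html):
    """Turn one page into a Solr document (hypothetical field names)."""
    parser = PageText()
    parser.feed(html)
    parser.close()
    return {"id": url, "title": parser.title, "content": " ".join(parser.chunks)}


def build_update_request(docs, solr="http://localhost:8983/solr/site"):
    """Prepare a POST of a JSON array of documents to the core's /update handler."""
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        solr + "/update?commit=true",  # commit so the documents become searchable
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Each page becomes one `html_to_doc(url, html)` call; passing the finished request to `urllib.request.urlopen(req)` actually sends it to a running Solr.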
Solr is a search server built on top of Apache Lucene, an open-source, Java-based information retrieval library. It is designed for document retrieval applications: wherever you need to return data to users based on their queries, Solr can do the job.
Solr stores and indexes documents that are sent to it from different sources and makes them searchable in near real time, even with large volumes of data. Using it involves three steps: indexing the documents, querying the index, and ranking the matching results by relevance.
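On the query side, the site's search box only needs to call Solr's `/select` handler and read the ranked documents back. A minimal sketch, assuming a hypothetical core named `site` on a local Solr with `id` and `title` fields; `q`, `fl`, `rows`, `wt`, `defType` and `qf` are standard Solr query parameters, but the `title^2 content` boost (which weights title matches twice as heavily) is an illustrative choice, and the helper names are hypothetical.

```python
import json
import urllib.parse


def build_search_url(user_query, rows=10, solr="http://localhost:8983/solr/site"):
    """Build a /select URL; Solr sorts matches by relevance score by default."""
    params = {
        "q": user_query,           # what the visitor typed in the search box
        "defType": "edismax",      # query parser suited to free-text user input
        "qf": "title^2 content",   # hypothetical fields; title matches count double
        "fl": "id,title,score",    # return the page URL, its title and its score
        "rows": rows,              # size of one results page
        "wt": "json",              # ask for a JSON response
    }
    return solr + "/select?" + urllib.parse.urlencode(params)


def parse_results(raw_json):
    """Return (numFound, [(url, title, score), ...]) from a /select JSON response."""
    body = json.loads(raw_json)["response"]
    hits = [(d["id"], d.get("title"), d["score"]) for d in body["docs"]]
    return body["numFound"], hits
```

Fetching the URL with `urllib.request.urlopen` and passing the response body to `parse_results` gives the hits already ranked, best match first.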
Solr is only for indexing and searching text; it does not include a crawler, since crawling is out of the project's scope.
However, take a look at Nutch, which is a crawler and not too hard to set up initially.
Nutch and Solr can be integrated if you need Solr-specific features to search the index.
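To show what the crawler's job is on a small static site, here is a hand-rolled sketch. It is not Nutch and not how Nutch works internally. It starts at the home page, follows same-host links breadth-first, and collects each page's HTML for indexing. `crawl`, `http_fetch` and `max_pages` are hypothetical names, and the sketch deliberately ignores robots.txt, politeness delays and non-HTML documents, all of which a real crawler such as Nutch handles for you.

```python
import urllib.request
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlsplit


class LinkParser(HTMLParser):
    """Collects href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def http_fetch(url):
    """Return the page's HTML, or None for errors and non-HTML responses."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            if "text/html" not in resp.headers.get("Content-Type", ""):
                return None
            return resp.read().decode("utf-8", errors="replace")
    except OSError:  # URLError and HTTPError are subclasses of OSError
        return None


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl restricted to start_url's host.

    Returns a dict mapping each visited URL to its HTML.
    """
    host = urlsplit(start_url).netloc
    queue, seen, pages = deque([start_url]), {start_url}, {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        pages[url] = html
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop #fragments so each page is seen once.
            link, _fragment = urldefrag(urljoin(url, href))
            if urlsplit(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return pages
```

With this sketch, `crawl("http://www.mycompany.com/", http_fetch)` would return the site's pages, ready to be turned into Solr documents.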