Error while indexing data crawled by Nutch in Solr

I have started working with Nutch and Solr, and I have a problem integrating Solr with Nutch. I followed this tutorial: http://wiki.apache.org/nutch/NutchTutorial. After running

bin/nutch crawl urls -solr http://localhost:8983/solr/ -depth 3 -topN 5

Nutch shows this message:

java.io.IOException: Job failed!

and Solr shows:

SEVERE: org.apache.solr.common.SolrException: ERROR: [doc=http://nutch.apache.org/] unknown field 'host'

I thought the cause might be a missing 'host' field in $SOLR_HOME/example/solr/conf/schema.xml, but the field is there. I would be very grateful for your help.
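The check described above can be made mechanical. The following Python sketch is an illustration only (it is not part of Nutch or Solr, and the function names `declared_fields` and `is_known_field` are hypothetical). It parses a schema.xml file and reports whether a field name is declared, either as an explicit `<field>` or through a `<dynamicField>` glob such as `*_s`. Note that it only inspects the file on disk; it cannot tell whether the running Solr instance actually loaded that file.

```python
import xml.etree.ElementTree as ET


def declared_fields(schema_xml: str) -> tuple[set[str], list[str]]:
    """Return (explicit field names, dynamicField patterns) from a schema.xml string."""
    root = ET.fromstring(schema_xml)
    # iter() walks the whole tree, so <field> elements are found whether they
    # sit inside a <fields> wrapper or directly under <schema>.
    fields = {f.get("name") for f in root.iter("field") if f.get("name")}
    patterns = [d.get("name") for d in root.iter("dynamicField") if d.get("name")]
    return fields, patterns


def is_known_field(schema_xml: str, name: str) -> bool:
    """True if `name` matches an explicit <field> or a <dynamicField> glob.

    Solr dynamicField names begin or end with '*', e.g. "*_s" or "attr_*".
    """
    fields, patterns = declared_fields(schema_xml)
    if name in fields:
        return True
    for pat in patterns:
        # Leading wildcard: match on suffix; trailing wildcard: match on prefix.
        if pat.startswith("*") and name.endswith(pat[1:]):
            return True
        if pat.endswith("*") and name.startswith(pat[:-1]):
            return True
    return False
```

For example, `is_known_field(open("schema.xml", encoding="utf-8").read(), "host")` returning True confirms that the file declares the field, which points the investigation at which file Solr is really loading.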

asked Nov 17 '12 at 09:11 by user1831647



1 Answer

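Changing the configuration on the Nutch side does not affect Solr's schema. You have to define that field in Solr's schema.xml. Solr reads schema.xml when the core is loaded, so restart Solr (or reload the core) after editing it, and make sure the file you edited belongs to the core that Nutch is posting to.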
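Because editing schema.xml only helps if the core Nutch posts to actually loads that file, it can be worth asking the running Solr instance which fields it knows about. Below is a minimal Python sketch, assuming Solr's Luke request handler is available at `/admin/luke` for the default core at the URL used in the question, and that its `show=schema` response lists fields by name under `schema.fields`. The helper names are hypothetical, and only explicit fields (not dynamic fields) are checked.

```python
import json
import urllib.request
from urllib.parse import urlencode


def luke_schema_fields(solr_base: str = "http://localhost:8983/solr") -> dict:
    """Fetch the live schema through the Luke handler (assumed at /admin/luke)."""
    # json.nl=map asks Solr to render named lists as JSON objects instead of
    # flat [name, value, ...] arrays, so fields come back keyed by name.
    query = urlencode({"show": "schema", "wt": "json", "json.nl": "map"})
    url = f"{solr_base.rstrip('/')}/admin/luke?{query}"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def field_in_live_schema(luke_response: dict, name: str) -> bool:
    """True if `name` is an explicit field in the schema Solr actually loaded."""
    fields = luke_response.get("schema", {}).get("fields", {})
    return name in fields
```

For example, if `field_in_live_schema(luke_schema_fields(), "host")` returns False even though your schema.xml declares `host`, Solr is most likely reading a different schema.xml or has not been restarted since the edit.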

answered Oct 31 '22 at 03:10 by kamaci