I'm using Linux with Hadoop, Cloudera and HBase.
Could you tell me how to correct this error?
Error: Could not find or load main class org.apache.nutch.crawl.InjectorJob
The following command gave me the error:
src/bin/nutch inject crawl/crawldb dmoz/
If you need any other information, ask me.
1.1 What is Nutch? Nutch is an effort to build a Free and Open Source search engine. It uses Lucene for the search and index component. The fetcher (robot) has been written from scratch solely for this project.
Apache Nutch is a highly extensible and scalable open source web crawler software project.
Nutch is an open source crawler that provides Java libraries for crawling, indexing and database storage. Solr is an open source search platform that provides full-text search and integrates with Nutch. The following steps describe setting up Nutch and Solr for crawling and searching.
I think you probably missed a step or two. Please confirm:
JAVA_HOME
(default: $NUTCH_HOME/logs)
(default: hadoop.log)
(default: $NUTCH_HOME/conf). Multiple paths must be separated by a colon ':'.

If you build Nutch using "ant", you will get a new folder inside /nutch called /nutch/runtime/local, and that is where you must actually run Nutch from (not from src/bin).
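The two checks in this answer (JAVA_HOME is set, and Nutch is run from runtime/local after an ant build) can be scripted as a quick preflight. This is a minimal sketch: the function name nutch_preflight is hypothetical and not part of Nutch, and it assumes the layout described above, i.e. that a successful ant build leaves the launcher at runtime/local/bin/nutch inside your checkout.

```shell
#!/bin/sh
# nutch_preflight is a hypothetical helper (not shipped with Nutch).
# Usage: nutch_preflight NUTCH_DIR
#   NUTCH_DIR is the top-level Nutch checkout (the folder that contains src/).
# Prints one message per failed check to stderr; returns 0 only if all pass.
nutch_preflight() {
    np_dir=$1
    np_status=0

    # Check 1: JAVA_HOME is set and contains an executable bin/java.
    if [ -z "${JAVA_HOME:-}" ]; then
        echo "JAVA_HOME is not set" >&2
        np_status=1
    elif [ ! -x "$JAVA_HOME/bin/java" ]; then
        echo "JAVA_HOME=$JAVA_HOME has no executable bin/java" >&2
        np_status=1
    fi

    # Check 2: the ant build produced the launcher under runtime/local.
    # (Assumption: runtime/local/bin/nutch is the launcher to use instead of src/bin/nutch.)
    if [ ! -x "$np_dir/runtime/local/bin/nutch" ]; then
        echo "$np_dir/runtime/local/bin/nutch not found; run ant in $np_dir first" >&2
        np_status=1
    fi

    return "$np_status"
}
```

For example, after running ant in your checkout, nutch_preflight /path/to/nutch (a placeholder path) should return 0; you would then cd into /path/to/nutch/runtime/local and run the same inject command with bin/nutch in place of src/bin/nutch. Relative arguments such as crawl/crawldb and dmoz/ are then resolved from runtime/local, so the seed directory has to exist there or be given as an absolute path.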
Tip: Try reading this page.