
Could not find or load main class org.apache.nutch.crawl.InjectorJob

Tags:

solr

hadoop

nutch

I'm using Linux with Hadoop, Cloudera and HBase.

Could you tell me how to correct this error?

Error: Could not find or load main class org.apache.nutch.crawl.InjectorJob

The following command gave me the error:

src/bin/nutch inject crawl/crawldb dmoz/

If you need any other information, just ask.

orilion asked Mar 09 '15 09:03



1 Answer

I think you probably missed a step or two. Please confirm:

  1. Did you install Apache ANT and then navigate to the nutch folder and type in "ant"?
  2. Did you set the environment variables:
    • NUTCH_JAVA_HOME: The java implementation to use. Overrides JAVA_HOME.
    • NUTCH_HEAPSIZE: The maximum amount of heap to use, in MB. Default is 1000.
    • NUTCH_OPTS: Extra Java runtime options. Multiple options must be separated by white space.
    • NUTCH_LOG_DIR: Log directory (default: $NUTCH_HOME/logs)
    • NUTCH_LOGFILE: Log file (default: hadoop.log)
    • NUTCH_CONF_DIR: Path(s) to configuration files (default: $NUTCH_HOME/conf). Multiple paths must be separated by a colon ':'.
    • JAVA_HOME
    • NUTCH_HOME
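
As a rough illustration of the second point, the Java-related variables could be set in the shell before running nutch. This is only a sketch: the fallback JDK path below is a hypothetical location and will differ between systems, and the heap size simply repeats the default from the list above.

```shell
# Hypothetical JDK location, used only if JAVA_HOME is not already set.
export JAVA_HOME="${JAVA_HOME:-/usr/lib/jvm/default-java}"

# NUTCH_JAVA_HOME overrides JAVA_HOME for nutch; here both point to the same JDK.
export NUTCH_JAVA_HOME="$JAVA_HOME"

# Maximum heap in MB (1000 is the default listed above).
export NUTCH_HEAPSIZE=1000
```

The remaining variables (NUTCH_LOG_DIR, NUTCH_LOGFILE, NUTCH_CONF_DIR) have defaults, so they usually only need to be set when you want something other than the default.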

If you build using "ant", you will get a new folder inside the nutch folder called runtime/local, and this is where you must actually run nutch from. In other words, run bin/nutch from inside runtime/local instead of src/bin/nutch.
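
A minimal sketch of that workflow as a shell function, assuming the nutch source folder is at the path held in NUTCH_SRC (a hypothetical variable name, not something nutch itself reads) and that "ant" is on your PATH. The function name nutch_local is also just made up for this example.

```shell
# Hypothetical variable: the folder where the nutch source was unpacked.
NUTCH_SRC="${NUTCH_SRC:-$HOME/nutch}"

nutch_local() {
    # Build with ant first if runtime/local has not been generated yet.
    if [ ! -x "$NUTCH_SRC/runtime/local/bin/nutch" ]; then
        (cd "$NUTCH_SRC" && ant) || return 1
    fi
    # Run the launcher from runtime/local, not from src/bin.
    (cd "$NUTCH_SRC/runtime/local" && bin/nutch "$@")
}

# Example call, using the arguments from the question:
# nutch_local inject crawl/crawldb dmoz/
```

Because the function changes into runtime/local before running, relative paths such as crawl/crawldb and dmoz/ are resolved inside that folder.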

Tip: Try reading this page.

coderama answered Oct 25 '22 00:10