
Apache Nutch - Problems with Paths

Tags: java, apache, nutch

I am trying to set up Apache Nutch to crawl URLs, following this guide. The guide is older (it targets 1.x, and I am using 2.3), so I have adjusted the directory structure where necessary. However, when I try to run a crawl, I get this error:

root@IndiStage:~# /usr/local/nutch/framework/apache-nutch-2.3/src/bin/crawl urls FirstCrawl 2
No SOLRURL specified. Skipping indexing.
Injecting seed URLs
/usr/local/nutch/framework/apache-nutch-2.3/src/bin/nutch inject urls -crawlId FirstCrawl
Error: Could not find or load main class org.apache.nutch.crawl.InjectorJob
Error running:
  /usr/local/nutch/framework/apache-nutch-2.3/src/bin/nutch inject urls -crawlId FirstCrawl
Failed with exit value 1.
root@IndiStage:~#

Being new to Ubuntu (14.04), I am finding it hard to manage the directory structure and paths here.

InjectorJob is in /usr/local/nutch/framework/apache-nutch-2.3/src/java/org/apache/nutch/crawl

JAVA_HOME is set to /usr/lib/jvm/java-7-openjdk-amd64

Sainath Krishnan asked Nov 15 '15 08:11


1 Answer

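Make sure that you have already compiled the Nutch source code. The InjectorJob file under src/java is only a .java source file, not a compiled class, which is why the launcher in src/bin cannot load the main class. Then run the crawl command from ${APACHE_NUTCH_HOME}/runtime/local (or ${APACHE_NUTCH_HOME}/runtime/deploy/bin) rather than from src/bin.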
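The steps above can be sketched roughly as follows. This is only an illustration: the NUTCH_HOME default is taken from the paths in the question, the helper names nutch_local_dir and run_crawl are made up for this example, and it assumes ant is installed and that the storage backend for Nutch 2.x has already been configured before building.

```shell
#!/bin/sh
# Assumed install location, copied from the paths shown in the question.
NUTCH_HOME="${NUTCH_HOME:-/usr/local/nutch/framework/apache-nutch-2.3}"

# Hypothetical helper: print the directory that holds the runnable
# scripts once the source tree has been built.
nutch_local_dir() {
    printf '%s/runtime/local\n' "$NUTCH_HOME"
}

# Hypothetical helper: build if runtime/local is missing, then run the
# crawl script from runtime/local instead of src/bin.
run_crawl() {
    local_dir="$(nutch_local_dir)"
    if [ ! -x "$local_dir/bin/crawl" ]; then
        # 'ant runtime' compiles the sources and populates runtime/
        (cd "$NUTCH_HOME" && ant runtime) || return 1
    fi
    (cd "$local_dir" && bin/crawl "$@")
}

# Usage (not executed here): run_crawl urls FirstCrawl 2
```

Calling run_crawl urls FirstCrawl 2 would then repeat the command from the question, but against the compiled runtime. Note that the urls seed directory is resolved relative to runtime/local, so an absolute path may be safer.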

Hope this helps,

Le Quoc Do

Do Do answered Sep 21 '22 06:09