Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Nutch in Windows: Failed to set permissions of path

I'm trying to user Solr with Nutch on a Windows Machine and I'm getting the following error:

Exception in thread "main" java.io.IOException: Failed to set permissions of path: c:\temp\mapred\staging\admin-1654213299\.staging to 0700

From a lot of threads I learned, that hadoop which seems to be used by nutch does some chmod magic that will work on Unix machines, but not on Windows.

This problem exists for more than a year now. I found one thread, where the code line is shown and a fix proposed. Am I really them only one who has this problem? Are all others creating a custom build in order to run nutch on windows? Or is there some option to disable the hadoop stuff or another solution? Maybe another crawler than nutch?

Here's the stack trace of what I'm doing:

    admin@WIN-G1BPD00JH42 /cygdrive/c/solr/apache-nutch-1.6
    $ bin/nutch crawl urls -dir crawl -depth 3 -topN 5 -solr http://localhost:8080/solr-4.1.0
    cygpath: can't convert empty path
    crawl started in: crawl
    rootUrlDir = urls
    threads = 10
    depth = 3
    solrUrl=http://localhost:8080/solr-4.1.0
    topN = 5
    Injector: starting at 2013-03-03 17:43:15
    Injector: crawlDb: crawl/crawldb
    Injector: urlDir: urls
    Injector: Converting injected urls to crawl db entries.
    Exception in thread "main" java.io.IOException: Failed to set permissions of path:         c:\temp\mapred\staging\admin-1654213299\.staging to 0700
        at org.apache.hadoop.fs.FileUtil.checkReturnValue(FileUtil.java:689)
        at org.apache.hadoop.fs.FileUtil.setPermission(FileUtil.java:662)
        at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:509)
        at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:344)
        at org.apache.hadoop.fs.FilterFileSystem.mkdirs(FilterFileSystem.java:189)
        at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:116)
        at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:856)
        at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:850)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Unknown Source)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
        at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:850)
        at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:824)
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1261)
        at org.apache.nutch.crawl.Injector.inject(Injector.java:281)
        at org.apache.nutch.crawl.Crawl.run(Crawl.java:127)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.nutch.crawl.Crawl.main(Crawl.java:55)
like image 370
Boris Crismancich Avatar asked Mar 03 '13 16:03

Boris Crismancich


3 Answers

It took me a while to get this working but here's the solution which works on nutch 1.7.

  1. Download Hadoop Core 0.20.2 from the maven repository
  2. Replace $NUTCH_HOME/lib/hadoop-core-1.2.0.jar with the downloaded file renaming it with the same name.

That should be it.

Explanation

This issue is caused by hadoop since it assumes you're running on unix and abides by the file permission rules. The issue was resolved in 2011 actually but nutch didn't update the hadoop version they use. The relevant fixes are here and here

like image 142
Diaa Avatar answered Nov 12 '22 10:11

Diaa


We are using Nutch too, but it is not supported for running on Windows, on Cygwin our 1.4 version had similar problems as you had, something like mapreduce too.

We solved it by using a vm (Virtual box) with Ubuntu and a shared directory between Windows and Linux, so we can develop and built on Windows and run Nutch (crawling) on Linux.

like image 42
jpee Avatar answered Nov 12 '22 11:11

jpee


I have Nutch running on windows, no custom build. It's a long time since I haven't used it though. But one thing that took me a while to catch, is that you need to run cygwin as a windows admin to get the necessary rights.

like image 36
Mille Bii Avatar answered Nov 12 '22 11:11

Mille Bii