I'm trying to use Solr with Nutch on a Windows machine and I'm getting the following error:
Exception in thread "main" java.io.IOException: Failed to set permissions of path: c:\temp\mapred\staging\admin-1654213299\.staging to 0700
From a lot of threads I learned that Hadoop, which Nutch seems to use under the hood, does some chmod
magic that works on Unix machines but not on Windows.
This problem has existed for more than a year now. I found one thread where the offending code line is shown and a fix is proposed. Am I really the only one who has this problem? Is everyone else creating a custom build in order to run Nutch on Windows? Or is there some option to disable the Hadoop stuff, or another solution? Maybe another crawler than Nutch?
Here's the console output and stack trace of what I'm doing:
admin@WIN-G1BPD00JH42 /cygdrive/c/solr/apache-nutch-1.6
$ bin/nutch crawl urls -dir crawl -depth 3 -topN 5 -solr http://localhost:8080/solr-4.1.0
cygpath: can't convert empty path
crawl started in: crawl
rootUrlDir = urls
threads = 10
depth = 3
solrUrl=http://localhost:8080/solr-4.1.0
topN = 5
Injector: starting at 2013-03-03 17:43:15
Injector: crawlDb: crawl/crawldb
Injector: urlDir: urls
Injector: Converting injected urls to crawl db entries.
Exception in thread "main" java.io.IOException: Failed to set permissions of path: c:\temp\mapred\staging\admin-1654213299\.staging to 0700
at org.apache.hadoop.fs.FileUtil.checkReturnValue(FileUtil.java:689)
at org.apache.hadoop.fs.FileUtil.setPermission(FileUtil.java:662)
at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:509)
at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:344)
at org.apache.hadoop.fs.FilterFileSystem.mkdirs(FilterFileSystem.java:189)
at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:116)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:856)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:850)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Unknown Source)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:850)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:824)
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1261)
at org.apache.nutch.crawl.Injector.inject(Injector.java:281)
at org.apache.nutch.crawl.Crawl.run(Crawl.java:127)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.nutch.crawl.Crawl.main(Crawl.java:55)
It took me a while to get this working, but here's the solution, which works on Nutch 1.7: download a Hadoop core jar that doesn't contain the failing permission check, then replace
$NUTCH_HOME/lib/hadoop-core-1.2.0.jar
with the downloaded file, renaming it to the same name. That should be it.
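As a minimal sketch of that swap (assuming the replacement jar was saved as ~/Downloads/hadoop-core-replacement.jar, a made-up name and location, and that NUTCH_HOME points at your Nutch install):
cd "$NUTCH_HOME"/lib
# keep a backup of the bundled jar before overwriting it
mv hadoop-core-1.2.0.jar hadoop-core-1.2.0.jar.orig
# drop the replacement in under the original name so Nutch's classpath still resolves
cp ~/Downloads/hadoop-core-replacement.jar hadoop-core-1.2.0.jar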
Explanation
This issue is caused by Hadoop, which assumes you're running on Unix and abides by its file permission rules. The issue was actually resolved in 2011, but Nutch didn't update the Hadoop version it uses. The relevant fixes are here and here.
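If you want to confirm which Hadoop build your Nutch distribution actually bundles before swapping anything, a quick check (again assuming NUTCH_HOME points at the install) is to list the jar:
ls "$NUTCH_HOME"/lib/hadoop-core-*.jar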
We are using Nutch too, but it is not supported for running on Windows. On Cygwin, our 1.4 version had problems similar to yours, something with mapreduce too.
We solved it by using a VM (VirtualBox) with Ubuntu and a shared directory between Windows and Linux, so we can develop and build on Windows and run Nutch (crawling) on Linux.
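As a rough sketch of that setup inside the Ubuntu guest (the share name nutch_share and mount point /mnt/nutch are made-up examples, and this assumes the VirtualBox Guest Additions are installed):
# mount the Windows folder that VirtualBox exposes as a shared folder
sudo mkdir -p /mnt/nutch
sudo mount -t vboxsf nutch_share /mnt/nutch
You can then run bin/nutch from the Linux side against the shared checkout while editing the files on Windows.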
I have Nutch running on Windows, no custom build. It's been a long time since I used it, though. But one thing that took me a while to catch is that you need to run Cygwin as a Windows admin to get the necessary rights.
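A quick way to check whether the Cygwin shell is actually elevated is to run a command that needs admin rights, e.g. net session (just a heuristic, not an official check):
net session > /dev/null 2>&1 && echo "running elevated" || echo "not elevated"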