Problem starting tasktracker in hadoop under windows

I am trying to use Hadoop under Windows, and I run into a problem when starting the TaskTracker. For example, when I run:

$ bin/start-all.sh

the log shows:

2011-06-08 16:32:18,157 ERROR org.apache.hadoop.mapred.TaskTracker: Can not start task tracker because java.io.IOException: Failed to set permissions of path: /tmp/hadoop-Administrator/mapred/local/taskTracker to 0755
    at org.apache.hadoop.fs.RawLocalFileSystem.checkReturnValue(RawLocalFileSystem.java:525)
    at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:507)
    at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:318)
    at org.apache.hadoop.fs.FilterFileSystem.mkdirs(FilterFileSystem.java:183)
    at org.apache.hadoop.mapred.TaskTracker.initialize(TaskTracker.java:630)
    at org.apache.hadoop.mapred.TaskTracker.<init>(TaskTracker.java:1328)
    at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:3430)

What's the problem? How can I solve this? Thanks!

Asked Jun 08 '11 by Charlie Epps

People also ask

What is TaskTracker in Hadoop?

The task tracker is the one that actually runs the task on the data node. Job tracker will pass the information to the task tracker and the task tracker will run the job on the data node. Once the job has been assigned to the task tracker, there is a heartbeat associated with each task tracker and job tracker.

What is the role of JobTracker and TaskTracker in MapReduce?

The JobTracker is the service within Hadoop that farms out MapReduce tasks to specific nodes in the cluster, ideally the nodes that have the data, or at least are in the same rack. Client applications submit jobs to the Job tracker. The JobTracker submits the work to the chosen TaskTracker nodes.
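The heartbeat-driven task assignment described above can be sketched as a simplified loop. This is an illustrative model only; the class and method names below are hypothetical and do not reflect Hadoop's actual API:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of JobTracker/TaskTracker interaction.
class MiniJobTracker {
    private final List<String> pendingTasks = new ArrayList<>();

    // Client applications submit work to the JobTracker.
    void submitJob(String task) {
        pendingTasks.add(task);
    }

    // On each heartbeat, the JobTracker hands the TaskTracker the next
    // pending task, if any; an idle heartbeat just reports liveness.
    String heartbeat(String trackerName) {
        return pendingTasks.isEmpty() ? null : pendingTasks.remove(0);
    }
}

public class HeartbeatSketch {
    public static void main(String[] args) {
        MiniJobTracker jt = new MiniJobTracker();
        jt.submitJob("map-0001");

        // The TaskTracker periodically heartbeats and receives work.
        System.out.println("assigned: " + jt.heartbeat("tracker-A"));

        // With nothing queued, the heartbeat returns no task.
        System.out.println("idle: " + jt.heartbeat("tracker-A"));
    }
}
```

In the real system the heartbeat also carries slot availability and task status, and the JobTracker prefers trackers holding the relevant data blocks; this sketch only shows the basic pull-based assignment shape.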


2 Answers

I was running into this issue on a 1.0.3 installation on Windows Server. I changed the default directories in hdfs-site.xml so that the directories Hadoop creates for the DFS are subdirectories of the Cygwin directory, like this:

...

 <property>
    <name>dfs.name.dir</name>
    <value>c:/cygwin/usr/mydir/dfs/logs</value>
 </property>
 <property>
    <name>dfs.data.dir</name>
    <value>c:/cygwin/usr/mydir/dfs/data</value>
 </property>
</configuration>

This seemed to resolve the problem.
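Note that the failing path in the question (/tmp/hadoop-Administrator/mapred/local) is the TaskTracker's local directory rather than a DFS directory, so the analogous change for that path would go in mapred-site.xml. A hedged sketch (the c:/cygwin/usr/mydir prefix is just an example location, matching the answer above):

```xml
<configuration>
  <property>
    <name>mapred.local.dir</name>
    <value>c:/cygwin/usr/mydir/mapred/local</value>
  </property>
</configuration>
```

In Hadoop 1.x, mapred.local.dir defaults to ${hadoop.tmp.dir}/mapred/local, which is why the error surfaces under /tmp; pointing it (or hadoop.tmp.dir itself) at a directory Cygwin can set permissions on follows the same idea as the hdfs-site.xml change.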

The Apache documentation for the configuration files describes these properties.

Answered Sep 28 '22 by BRM


This issue is being tracked at https://issues.apache.org/jira/browse/HADOOP-7682

Answered Sep 28 '22 by Dave L.