 

Write data to local disk in each datanode

I want to store some values from the map task on the local disk of each data node. For example:

public void map(...) {
    // Process the input record
    List<Object> cache = new ArrayList<Object>();
    // Add values to the cache
    // Serialize the cache to a local file on this data node
}

How can I store this cache object on the local disk of each data node? If I write the cache to disk inside the map function as above, the performance will be terrible because of the I/O on every call.

Is there any way to wait until the map task on the data node has run completely and only then write the cache to local disk? Or does Hadoop have a built-in feature for this?

nd07 asked Jan 28 '26 22:01

1 Answer

In the example below, the created file ends up somewhere under the directories the NodeManager uses for containers. That location is controlled by the configuration property yarn.nodemanager.local-dirs in yarn-site.xml, or by the default inherited from yarn-default.xml, which is under /tmp.

Please also see @Chris Nauroth's answer, which says that this is just for debugging purposes and is not recommended as a permanent production configuration; it clearly explains why.

public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
    // do some hadoop stuff, like counting words

    // A relative path resolves under the container's working directory,
    // i.e. one of the yarn.nodemanager.local-dirs on this node.
    String path = "newFile.txt";
    try {
        File f = new File(path);
        f.createNewFile();
    } catch (IOException e) {
        // Both streams end up in the container's stdout/stderr logs.
        System.out.println("Message easy to look up in the logs.");
        System.err.println("Error easy to look up in the logs.");
        e.printStackTrace();
        throw e;
    }
}
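
To avoid writing on every map() call, as asked in the question, one option is to buffer the values in memory and flush them once per task in the mapper's cleanup() method, which Hadoop invokes after the last map() call of the task. The following is only a rough sketch, not part of the answer above; the class name CachingMapper, the key/value types, and the file name mapTaskCache.txt are made up for illustration.

import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Hypothetical mapper: buffers values in memory during map() and writes them
// to the container's local working directory once, in cleanup(), after the
// last map() call of this task has finished.
public class CachingMapper extends Mapper<LongWritable, Text, Text, LongWritable> {

    private final List<String> cache = new ArrayList<String>();

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Normal map-side processing goes here; we only buffer the value.
        cache.add(value.toString());
        context.write(value, key);
    }

    @Override
    protected void cleanup(Context context)
            throws IOException, InterruptedException {
        // Runs once per map task, after all map() calls have completed.
        // A relative path resolves under the container's working directory
        // (under one of the yarn.nodemanager.local-dirs), so the file stays
        // on the local disk of the data node that ran this task.
        try (BufferedWriter writer = new BufferedWriter(new FileWriter("mapTaskCache.txt"))) {
            for (String line : cache) {
                writer.write(line);
                writer.newLine();
            }
        }
    }
}

Note that the buffered data has to fit in the task's heap, and anything written this way stays on whichever node happened to run the task.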
Ram Ghadiyaram answered Jan 31 '26 17:01