
OutOfMemoryError when reading a local file via DistributedCache

UPDATE 11/21/2012:

Problem solved by setting the property mapred.child.java.opts to -Xmx512m. Before this I had set HADOOP_HEAPSIZE to 2000 in core-site.xml, but that didn't help. I still don't understand why the program works locally but not in distributed mode. Thanks for all the answers.

I'm using Hadoop 1.0.3. The cluster is composed of three machines, all of them running Ubuntu Linux 12.04 LTS. Two of the machines have 12 GB of RAM and the third one has 4 GB. I'm reading a local file of about 40 MB via DistributedCache. My program works perfectly in a local environment (local/standalone mode). However, when I execute it in the Hadoop cluster (fully distributed mode), I get an "OutOfMemoryError: Java heap space", with the same 40 MB file. I don't understand why this happens, as the file isn't that large. This is the code:

    // Imports needed at the top of the source file:
    //   java.io.BufferedReader, java.io.FileReader, java.io.IOException,
    //   java.util.HashMap,
    //   org.apache.hadoop.filecache.DistributedCache, org.apache.hadoop.fs.Path,
    //   org.apache.hadoop.io.LongWritable, org.apache.hadoop.io.Text,
    //   org.apache.hadoop.mapred.JobConf, org.apache.hadoop.mapred.MapReduceBase,
    //   org.apache.hadoop.mapred.Mapper, org.apache.hadoop.util.StringUtils

    public static class MapClass extends MapReduceBase implements Mapper<LongWritable, Text, Text, Text> {
        // ...
        private HashMap<String, String> urlTrad = new HashMap<String, String>();
        // ...
        @Override
        public void configure(JobConf job) {
            Path[] urlsFiles = new Path[0];
            BufferedReader fis;

            try {
                // Locate the local copy of the file shipped via the DistributedCache.
                urlsFiles = DistributedCache.getLocalCacheFiles(job);
                fis = new BufferedReader(new FileReader(urlsFiles[0].toString()));

                // Each line is "key<TAB>value"; load the whole file into the in-memory map.
                String pattern;
                while ((pattern = fis.readLine()) != null) {
                    String[] parts = pattern.split("\t");
                    urlTrad.put(parts[0], parts[1]);
                }
                fis.close();

            } catch (IOException ioe) {
                System.err.println("Caught exception while parsing the cached file '"
                        + urlsFiles[0] + "' : "
                        + StringUtils.stringifyException(ioe));
            }
        }
        // ...
    }

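(Not shown in the question: the driver presumably registers the file with the DistributedCache along these lines; the job class name and HDFS path below are placeholders, not the original code.)

    // In the job driver, before submitting the job (class name and path are placeholders):
    JobConf conf = new JobConf(MyJob.class);
    // The file must already exist in HDFS; Hadoop copies it to every task node.
    DistributedCache.addCacheFile(new Path("/user/hadoop/cache/urls.txt").toUri(), conf);
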
Any help would be appreciated, thanks in advance.


1 Answer

Problem solved by setting the property mapred.child.java.opts to -Xmx512m. Before this I had set HADOOP_HEAPSIZE to 2000 in core-site.xml, but that didn't help. I still don't understand why the program works locally but not in distributed mode.
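
For reference, a minimal sketch of setting that property programmatically on the JobConf (the job class name is a placeholder); the same value can also be set as the mapred.child.java.opts property in mapred-site.xml:

    JobConf conf = new JobConf(MyJob.class);
    // Give each child task JVM (the per-task map/reduce process) a 512 MB heap.
    conf.set("mapred.child.java.opts", "-Xmx512m");

(HADOOP_HEAPSIZE only sizes the JVMs of the Hadoop daemons themselves, not the child JVMs that run the map and reduce tasks, which would explain why changing it had no effect here.)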
