UPDATE 11/21/2012:
Problem solved by setting the property mapred.child.java.opts to -Xmx512m. Before this I had set HADOOP_HEAPSIZE to 2000 in core-site.xml, but that didn't help. I still don't understand why the program works locally but not in fully distributed mode. Thanks for all the answers.
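For reference, the same heap setting can also be applied per job from the driver instead of mapred-site.xml. This is only a minimal sketch; the driver class name below is hypothetical and not part of the original question:

import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class UrlTradDriver { // hypothetical driver class
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(UrlTradDriver.class);
        // Per-job equivalent of setting mapred.child.java.opts in mapred-site.xml:
        // each child task JVM gets a 512 MB heap.
        conf.set("mapred.child.java.opts", "-Xmx512m");
        // ... remaining job setup (mapper class, input/output paths) ...
        JobClient.runJob(conf);
    }
}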
I'm using Hadoop 1.0.3. The cluster is composed of three machines, all of them running Ubuntu Linux 12.04 LTS. Two of the machines have 12 GB of RAM and the third one has 4 GB. I'm reading a local file of about 40 MB via DistributedCache. My program works perfectly in a local environment (local/standalone mode). However, when I execute it in the Hadoop cluster (fully distributed mode), I get an "OutOfMemoryError: Java heap space", with the same 40 MB file. I don't understand why this happens, as the file isn't that large. This is the code:
public static class MapClass extends MapReduceBase implements Mapper<LongWritable, Text, Text, Text> {
    // ...
    // In-memory lookup table built from the cached file: URL -> translation.
    private HashMap<String, String> urlTrad = new HashMap<String, String>();
    // ...
    @Override
    public void configure(JobConf job) {
        Path[] urlsFiles = new Path[0];
        BufferedReader fis;
        try {
            // Locate the local copy of the file shipped via the DistributedCache.
            urlsFiles = DistributedCache.getLocalCacheFiles(job);
            fis = new BufferedReader(new FileReader(urlsFiles[0].toString()));
            // Each line is tab-separated: key<TAB>value.
            String pattern;
            while ((pattern = fis.readLine()) != null) {
                String[] parts = pattern.split("\t");
                urlTrad.put(parts[0], parts[1]);
            }
            fis.close();
        } catch (IOException ioe) {
            System.err.println("Caught exception while parsing the cached file '"
                    + urlsFiles[0] + "' : " + StringUtils.stringifyException(ioe));
        }
    }
    // ...
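For context, the driver code is not shown in the question; presumably the file was registered with the DistributedCache roughly like this (the HDFS path and the conf variable are assumptions, not taken from the original):

    // Hypothetical driver-side registration of the ~40 MB file; the path is illustrative only.
    DistributedCache.addCacheFile(new java.net.URI("/user/hduser/urlTrad.txt"), conf);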
Any help would be appreciated, thanks in advance.