UPDATE 11/21/2012:
Problem solved by setting the property mapred.child.java.opts to -Xmx512m. Before this I had set HADOOP_HEAPSIZE to 2000 in core-site.xml, but that didn't help. I still don't understand why the program works locally but not in fully distributed mode. Thanks for all the answers.
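For reference, the same heap setting can also be applied per job from the driver instead of mapred-site.xml. This is only a minimal sketch; the driver class name below is hypothetical and not part of the original question:

import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class UrlTradDriver { // hypothetical driver class
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(UrlTradDriver.class);
        // Per-job equivalent of setting mapred.child.java.opts in mapred-site.xml:
        // each child task JVM gets a 512 MB heap.
        conf.set("mapred.child.java.opts", "-Xmx512m");
        // ... remaining job setup (mapper class, input/output paths) ...
        JobClient.runJob(conf);
    }
}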
I'm using Hadoop 1.0.3. The cluster is composed of three machines, all of them running Ubuntu Linux 12.04 LTS. Two of the machines have 12 GB of RAM and the third one has 4 GB. I'm reading a local file of about 40 MB via DistributedCache. My program works perfectly in a local environment (local/standalone mode). However, when I execute it in the Hadoop cluster (fully distributed mode), I get an "OutOfMemoryError: Java heap space", with the same 40 MB file. I don't understand why this happens, as the file isn't that large. This is the code:
public static class MapClass extends MapReduceBase implements Mapper<LongWritable, Text, Text, Text> {
    // ...
    // In-memory lookup table built from the cached file: URL -> translation.
    private HashMap<String, String> urlTrad = new HashMap<String, String>();
    // ...
    @Override
    public void configure(JobConf job) {
        Path[] urlsFiles = new Path[0];
        BufferedReader fis;
        try {
            // Locate the local copy of the file shipped via the DistributedCache.
            urlsFiles = DistributedCache.getLocalCacheFiles(job);
            fis = new BufferedReader(new FileReader(urlsFiles[0].toString()));
            // Each line is tab-separated: key<TAB>value.
            String pattern;
            while ((pattern = fis.readLine()) != null) {
                String[] parts = pattern.split("\t");
                urlTrad.put(parts[0], parts[1]);
            }
            fis.close();
        } catch (IOException ioe) {
            System.err.println("Caught exception while parsing the cached file '"
                    + urlsFiles[0] + "' : " + StringUtils.stringifyException(ioe));
        }
    }
    // ...
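For context, the driver code is not shown in the question; presumably the file was registered with the DistributedCache roughly like this (the HDFS path and the conf variable are assumptions, not taken from the original):

    // Hypothetical driver-side registration of the ~40 MB file; the path is illustrative only.
    DistributedCache.addCacheFile(new java.net.URI("/user/hduser/urlTrad.txt"), conf);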
Any help would be appreciated, thanks in advance.