Pig local mode, group, or join = java.lang.OutOfMemoryError: Java heap space

Tags: apache-pig

Using Apache Pig version 0.10.1.21 (reported), CentOS release 6.3 (Final), jdk1.6.0_31 (The Hortonworks Sandbox v1.2 on Virtualbox, with 3.5 GB RAM)

$ cat data.txt
11,11,22
33,34,35
47,0,21
33,6,51
56,6,11
11,25,67

$ cat GrpTest.pig
A = LOAD 'data.txt' USING PigStorage(',') AS (f1:int,f2:int,f3:int);
B = GROUP A BY f1;
DESCRIBE B;
DUMP B;

$ pig -x local GrpTest.pig

[Thread-12] WARN  org.apache.hadoop.mapred.JobClient - No job jar file set.  User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
[Thread-12] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
[Thread-13] INFO  org.apache.hadoop.mapred.Task -  Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@19a9bea3
[Thread-13] INFO  org.apache.hadoop.mapred.MapTask - io.sort.mb = 100
[Thread-13] WARN  org.apache.hadoop.mapred.LocalJobRunner - job_local_0002
java.lang.OutOfMemoryError: Java heap space
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.<init>(MapTask.java:949)
    at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:674)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:756)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
[main] ERROR org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
[main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias B

The java.lang.OutOfMemoryError: Java heap space error occurs every time I use GROUP or JOIN in a Pig script executed in local mode. There is no error when the same script is executed in mapreduce mode on HDFS.

Question 1: Why is there an OutOfMemoryError when the data sample is minuscule and local mode is supposed to use fewer resources than mapreduce mode on HDFS?

Question 2: Is there a way to successfully run small Pig scripts with GROUP or JOIN in local mode?

Polymerase asked May 11 '13 16:05


1 Answer

Solution: force Pig to allocate less memory by lowering the Java property io.sort.mb. I set it to 10 MB here and the error disappears. I am not sure what the best value would be, but at least this makes it possible to practice Pig syntax in local mode.

$ cat GrpTest.pig
--avoid java.lang.OutOfMemoryError: Java heap space (execmode: -x local)
set io.sort.mb 10;

A = LOAD 'data.txt' USING PigStorage(',') AS (f1:int,f2:int,f3:int);
B = GROUP A BY f1;
DESCRIBE B;
DUMP B;
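
On Question 1, a likely explanation, going by the log line "io.sort.mb = 100" and the stack trace: the map task seems to allocate its whole sort buffer up front in MapTask$MapOutputBuffer.<init>, whatever the input size, and as far as I understand a local-mode task runs inside the same JVM as the Pig client. If that JVM's heap is small, the allocation fails even for six rows. Below is a minimal Java sketch of the arithmetic only. The class name, method names and the 64 MB heap figure are hypothetical, chosen for illustration; this is not Hadoop's code.

```java
// Illustrative sketch only: compares the buffer size implied by io.sort.mb
// with a JVM heap limit. Names and the 64 MB figure are assumptions.
public class SortBufferCheck {

    // Converts an io.sort.mb value (megabytes) to bytes.
    static long sortBufferBytes(int ioSortMb) {
        return ((long) ioSortMb) << 20;
    }

    // True if a buffer of that size is strictly smaller than the given heap limit.
    // A real JVM needs additional headroom for everything else it holds.
    static boolean fitsInHeap(int ioSortMb, long maxHeapBytes) {
        return sortBufferBytes(ioSortMb) < maxHeapBytes;
    }

    public static void main(String[] args) {
        // Heap limit of the JVM running this sketch, not of the Pig client.
        long actualMaxHeap = Runtime.getRuntime().maxMemory();
        // Hypothetical small heap, to show the failing case.
        long smallHeap = 64L << 20;

        for (int mb : new int[] {100, 10}) {
            System.out.println("io.sort.mb=" + mb
                    + " -> " + sortBufferBytes(mb) + " bytes"
                    + ", fits in 64 MB heap: " + fitsInHeap(mb, smallHeap)
                    + ", fits in this JVM's heap: " + fitsInHeap(mb, actualMaxHeap));
        }
    }
}
```

With the hypothetical 64 MB heap, the 100 MB buffer cannot fit while the 10 MB one can, which would be consistent with the error disappearing after set io.sort.mb 10. I have not checked the actual heap size of the sandbox's Pig client.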
Polymerase answered Sep 16 '22 20:09