I have the following code, which fires hiveContext.sql() most of the time. My task is to create a few tables and insert values into them after processing, for every Hive table partition.

So I first fire show partitions and, using its output in a for-loop, I call a few methods that create the tables (if they don't exist) and insert into them using hiveContext.sql.
Now, hiveContext can't be executed inside an executor, so I have to run this for-loop in the driver program, serially, one partition at a time. When I submit this Spark job on a YARN cluster, my executors almost always get lost because of a shuffle-not-found exception.

This is happening because YARN kills my executors for exceeding their memory limit. I don't understand why, since I have a very small data set for each Hive partition, yet it still causes YARN to kill an executor.

Will the following code do everything in parallel and try to hold the data of all Hive partitions in memory at the same time?
import java.io.IOException;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.spark.SparkConf;
import org.apache.spark.SparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.hive.HiveContext;

public static void main(String[] args) throws IOException {
    SparkConf conf = new SparkConf();
    SparkContext sc = new SparkContext(conf);
    HiveContext hc = new HiveContext(sc);
    FileSystem fs = FileSystem.get(sc.hadoopConfiguration());

    // Collect the partition specs (e.g. "server=xyz/date=2015-08-05") to the driver.
    DataFrame partitionFrame = hc.sql("show partitions dbdata partition(date='2015-08-05')");
    Row[] rowArr = partitionFrame.collect();

    for (Row row : rowArr) {
        String[] splitArr = row.getString(0).split("/");
        String server = splitArr[0].split("=")[1];
        String date = splitArr[1].split("=")[1];

        String csvPath = "hdfs:///user/db/ext/" + server + ".csv";
        if (fs.exists(new Path(csvPath))) {
            hc.sql("ADD FILE " + csvPath);
        }

        // Create each target table if it doesn't exist and insert this partition's data.
        createInsertIntoTableABC(hc, server, date);
        createInsertIntoTableDEF(hc, server, date);
        createInsertIntoTableGHI(hc, server, date);
        createInsertIntoTableJKL(hc, server, date);
        createInsertIntoTableMNO(hc, server, date);
    }
}
You can resolve it by adjusting the shuffle partitioning: increase the value of spark.sql.shuffle.partitions so each shuffle partition holds less data.
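For example, reusing the hc HiveContext from the question's code, a minimal sketch (the value 400 is only an illustration; tune it to your data volume):

// Raise the number of shuffle partitions so each shuffle task handles less data.
hc.setConf("spark.sql.shuffle.partitions", "400");

// Equivalently, set it once on the SparkConf before the contexts are created:
// new SparkConf().set("spark.sql.shuffle.partitions", "400");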
With, for example, --executor-memory 8g and spark.yarn.executor.memoryOverhead set to 800, you have three limits: your executor JVM cannot use more than 8 GB of memory, the non-JVM processes in the container cannot use more than 800 MB, and the container has a maximum physical limit of 8.8 GB (the sum of the two).
According to the recommendations discussed above (this example assumes 10 nodes with 64 GB of memory each and 150 usable cores in total, running 5 cores per executor):
- Number of available executors = (total cores / num-cores-per-executor) = 150 / 5 = 30
- Leaving 1 executor for the YARN ApplicationMaster => --num-executors = 29
- Number of executors per node = 30 / 10 = 3
- Memory per executor = 64 GB / 3 = 21 GB
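Translated into configuration, that sizing would look roughly like the sketch below. The numbers are illustrative and assume the hypothetical cluster above; the 21 GB per executor still has to leave room for the YARN memory overhead, so the heap is set a little lower.

// Illustrative sizing derived from the arithmetic above; adjust to your own hardware.
SparkConf conf = new SparkConf()
        .set("spark.executor.instances", "29")                // 30 executors minus 1 for the ApplicationMaster
        .set("spark.executor.cores", "5")
        .set("spark.executor.memory", "19g")                  // ~21 GB minus the YARN overhead share
        .set("spark.yarn.executor.memoryOverhead", "2048");   // MB reserved for non-JVM memory in the container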
The most likely cause of this exception is that not enough heap memory is allocated to the Java virtual machines (JVMs). These JVMs are launched as executors or drivers as part of the Apache Spark application.
Generally, you should always dig into logs to get the real exception out (at least in Spark 1.3.1).
tl;dr: a safe config for Spark under YARN:
- spark.shuffle.memoryFraction=0.5 - this allows the shuffle to use more of the allocated memory
- spark.yarn.executor.memoryOverhead=1024 - this is set in MB. YARN kills executors when their memory usage is larger than (executor-memory + executor.memoryOverhead)
Little more info
From reading your question, you mention that you get a shuffle-not-found exception.

In the case of
org.apache.spark.shuffle.MetadataFetchFailedException: Missing an output location for shuffle
you should increase spark.shuffle.memoryFraction, for example to 0.5.
The most common reason for YARN killing off my executors was memory usage beyond what it expected. To avoid that, increase spark.yarn.executor.memoryOverhead; I've set it to 1024, even though my executors use only 2-3 GB of memory.
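Put together, a sketch of how the tl;dr settings above could be applied in the driver code (same imports as in the question's code; both options can also be passed as --conf flags to spark-submit instead):

// Apply the suggested "safe" settings before creating the contexts.
SparkConf conf = new SparkConf()
        .set("spark.shuffle.memoryFraction", "0.5")           // give the shuffle a larger share of the executor heap
        .set("spark.yarn.executor.memoryOverhead", "1024");   // extra MB YARN grants on top of executor-memory
SparkContext sc = new SparkContext(conf);
HiveContext hc = new HiveContext(sc);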