Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Steps to reduce time delay due to GC allocation failure in azure databricks

Tags:

I'm executing a print "Hello World" Job in azure databricks python notebook on a spark cluster. Every time the Job is run it takes more than 12 seconds to execute which is expected to take less than 12 secs as it's the simplest python code anyone can think of. When I verify the logs it shows GC Allocation failure as follows:

2019-02-15T15:47:27.551+0000: [GC (Allocation Failure) [PSYoungGen: 312512K->57563K(390144K)] 498744K->243803K(1409024K), 0.0153696 secs] [Times: user=0.05 sys=0.00, real=0.02 secs] 
2019-02-15T15:47:28.703+0000: [GC (Metadata GC Threshold) [PSYoungGen: 206668K->65267K(385024K)] 392909K->251515K(1403904K), 0.0187692 secs] [Times: user=0.06 sys=0.00, real=0.02 secs] 
2019-02-15T15:47:28.722+0000: [Full GC (Metadata GC Threshold) [PSYoungGen: 65267K->0K(385024K)] [ParOldGen: 186248K->244119K(1018880K)] 251515K->244119K(1403904K), [Metaspace: 110436K->110307K(1144832K)], 0.3198827 secs] [Times: user=0.64 sys=0.04, real=0.32 secs] 

Wanted to know is the Job delay > 12 secs due to GC allocation failure? If yes, how can I reduce it? If not, what can be the other reason for the delay and how to correct it?

like image 661
user39602 Avatar asked Mar 06 '19 05:03

user39602


1 Answers

There is an overhead of starting a Spark Job on a cluster. If processing petabytes then the overhead is small but here it is noticable. The GC is not an issue here.

like image 107
thebluephantom Avatar answered Nov 15 '22 04:11

thebluephantom