
Spark 2.3 Memory Leak on Executor

I am getting a managed memory leak warning. As far as I know, this was a Spark bug up to version 1.6 and has since been resolved.

Mode: Standalone, IDE: PyCharm, Spark version: 2.3, Python version: 3.6

Below is the log output:

2018-05-25 15:00:05 WARN  Executor:66 - Managed memory leak detected; size = 262144 bytes, TID = 3148
2018-05-25 15:00:05 WARN  Executor:66 - Managed memory leak detected; size = 262144 bytes, TID = 3152
2018-05-25 15:00:05 WARN  Executor:66 - Managed memory leak detected; size = 262144 bytes, TID = 3151
2018-05-25 15:00:05 WARN  Executor:66 - Managed memory leak detected; size = 262144 bytes, TID = 3150
2018-05-25 15:00:05 WARN  Executor:66 - Managed memory leak detected; size = 262144 bytes, TID = 3149
2018-05-25 15:00:05 WARN  Executor:66 - Managed memory leak detected; size = 262144 bytes, TID = 3153
2018-05-25 15:00:05 WARN  Executor:66 - Managed memory leak detected; size = 262144 bytes, TID = 3154
2018-05-25 15:00:05 WARN  Executor:66 - Managed memory leak detected; size = 262144 bytes, TID = 3158
2018-05-25 15:00:05 WARN  Executor:66 - Managed memory leak detected; size = 262144 bytes, TID = 3155
2018-05-25 15:00:05 WARN  Executor:66 - Managed memory leak detected; size = 262144 bytes, TID = 3157
2018-05-25 15:00:05 WARN  Executor:66 - Managed memory leak detected; size = 262144 bytes, TID = 3160
2018-05-25 15:00:05 WARN  Executor:66 - Managed memory leak detected; size = 262144 bytes, TID = 3161
2018-05-25 15:00:05 WARN  Executor:66 - Managed memory leak detected; size = 262144 bytes, TID = 3156
2018-05-25 15:00:05 WARN  Executor:66 - Managed memory leak detected; size = 262144 bytes, TID = 3159
2018-05-25 15:00:05 WARN  Executor:66 - Managed memory leak detected; size = 262144 bytes, TID = 3165
2018-05-25 15:00:05 WARN  Executor:66 - Managed memory leak detected; size = 262144 bytes, TID = 3163
2018-05-25 15:00:05 WARN  Executor:66 - Managed memory leak detected; size = 262144 bytes, TID = 3162
2018-05-25 15:00:05 WARN  Executor:66 - Managed memory leak detected; size = 262144 bytes, TID = 3166

Any insight into why this happens? The job itself still completes successfully.

Edit: Several people flagged this as a duplicate of a two-year-old question. The answer there says it was a Spark bug, but Spark's Jira shows that bug as resolved.

So my question is: so many versions later, why am I still seeing the same warning in Spark 2.3? I will gladly remove the question if a valid, logical answer shows that it really is redundant.

Aakash Basu asked May 25 '18 09:05




1 Answer

According to SPARK-14168, the warning stems from not consuming an entire iterator. I have encountered the same warning when taking n elements from an RDD in the Spark shell.
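To illustrate why a partially consumed iterator could trigger this kind of warning, here is a plain-Python analogy. It is not Spark code: `ToyTaskMemory`, `buffered_rows` and `run_task` are hypothetical names made up for this sketch, and the 262144-byte page size is simply borrowed from the log lines in the question. The assumption is that a buffer is released only once its iterator is exhausted, so stopping early leaves it for an end-of-task cleanup to find.

```python
from itertools import islice


class ToyTaskMemory:
    """Hypothetical per-task memory bookkeeping (not a Spark class)."""

    def __init__(self):
        self.allocated = 0

    def acquire(self, size):
        self.allocated += size

    def release(self, size):
        self.allocated -= size

    def clean_up(self):
        # Free whatever is still held and report how much that was.
        leaked = self.allocated
        self.allocated = 0
        return leaked


def buffered_rows(memory, rows, page_size=262144):
    """Yield rows while holding one page; release it only on exhaustion."""
    memory.acquire(page_size)
    for row in rows:
        yield row
    # Reached only if the caller consumes the iterator to the end.
    memory.release(page_size)


def run_task(rows, limit=None):
    """Consume all rows, or only the first `limit`, then run the cleanup."""
    memory = ToyTaskMemory()
    iterator = buffered_rows(memory, rows)
    if limit is None:
        taken = list(iterator)
    else:
        taken = list(islice(iterator, limit))
    leaked = memory.clean_up()
    return taken, leaked


if __name__ == "__main__":
    for limit in (None, 3):
        taken, leaked = run_task(range(10), limit=limit)
        print(f"limit={limit}: took {len(taken)} rows, cleanup freed {leaked} bytes")
```

In this toy model, the cleanup still releases the memory when the task ends, so the job finishes normally and only a message is produced, which matches the behaviour described in the question.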

Josh answered Nov 15 '22 12:11