I am getting a managed memory leak warning that, as far as I can tell, was a Spark bug up to version 1.6 and was reported as resolved.
Mode: Standalone
IDE: PyCharm
Spark version: 2.3
Python version: 3.6
Below is the relevant log output:
2018-05-25 15:00:05 WARN Executor:66 - Managed memory leak detected; size = 262144 bytes, TID = 3148
2018-05-25 15:00:05 WARN Executor:66 - Managed memory leak detected; size = 262144 bytes, TID = 3152
2018-05-25 15:00:05 WARN Executor:66 - Managed memory leak detected; size = 262144 bytes, TID = 3151
2018-05-25 15:00:05 WARN Executor:66 - Managed memory leak detected; size = 262144 bytes, TID = 3150
2018-05-25 15:00:05 WARN Executor:66 - Managed memory leak detected; size = 262144 bytes, TID = 3149
2018-05-25 15:00:05 WARN Executor:66 - Managed memory leak detected; size = 262144 bytes, TID = 3153
2018-05-25 15:00:05 WARN Executor:66 - Managed memory leak detected; size = 262144 bytes, TID = 3154
2018-05-25 15:00:05 WARN Executor:66 - Managed memory leak detected; size = 262144 bytes, TID = 3158
2018-05-25 15:00:05 WARN Executor:66 - Managed memory leak detected; size = 262144 bytes, TID = 3155
2018-05-25 15:00:05 WARN Executor:66 - Managed memory leak detected; size = 262144 bytes, TID = 3157
2018-05-25 15:00:05 WARN Executor:66 - Managed memory leak detected; size = 262144 bytes, TID = 3160
2018-05-25 15:00:05 WARN Executor:66 - Managed memory leak detected; size = 262144 bytes, TID = 3161
2018-05-25 15:00:05 WARN Executor:66 - Managed memory leak detected; size = 262144 bytes, TID = 3156
2018-05-25 15:00:05 WARN Executor:66 - Managed memory leak detected; size = 262144 bytes, TID = 3159
2018-05-25 15:00:05 WARN Executor:66 - Managed memory leak detected; size = 262144 bytes, TID = 3165
2018-05-25 15:00:05 WARN Executor:66 - Managed memory leak detected; size = 262144 bytes, TID = 3163
2018-05-25 15:00:05 WARN Executor:66 - Managed memory leak detected; size = 262144 bytes, TID = 3162
2018-05-25 15:00:05 WARN Executor:66 - Managed memory leak detected; size = 262144 bytes, TID = 3166
Any insight into why this happens? The job itself completes successfully.
Edit: Several people have marked this as a duplicate of a two-year-old question, but the answer there says the warning was caused by a Spark bug, and Spark's JIRA shows that bug as resolved.
So the question here is: so many versions later, why am I still seeing the same warning in Spark 2.3? I'll gladly remove the question if a valid, logical answer shows it really is redundant.
You can often resolve it by adjusting the partitioning: increase the value of spark.sql.shuffle.partitions.
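For reference, here is a minimal sketch of how that setting might be raised in PySpark. The value 400 is purely an assumption to illustrate the idea; tune it for your own data and cluster:

    from pyspark.sql import SparkSession

    # Build (or fetch) a session with a higher shuffle partition count.
    # The default in Spark 2.3 is 200; 400 here is only illustrative.
    spark = (
        SparkSession.builder
        .appName("shuffle-partitions-example")          # illustrative app name
        .config("spark.sql.shuffle.partitions", "400")  # assumed value, tune as needed
        .getOrCreate()
    )

    # It can also be changed on an existing session:
    spark.conf.set("spark.sql.shuffle.partitions", "400")

Note that this setting only affects DataFrame/SQL shuffles; plain RDD operations take an explicit numPartitions argument instead.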
By contrast, a genuine memory leak in the application looks different: the application master exhausts its -Xmx heap, is killed, and restarts, and the Spark process's RSS keeps growing slowly until the node manager (NM) eventually kills it.
For context, Spark splits executor memory into several regions:
Execution memory - holds data required while Spark tasks execute;
User memory - left for your own purposes, e.g. custom data structures, UDFs, and UDAFs;
Reserved memory - set aside for Spark itself and hardcoded to 300 MB as of Spark 1.6.
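If you want to see where those regions come from, these are the Spark 2.x settings that control them; the values below are assumptions for illustration, not recommendations:

    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        # Total JVM heap per executor (illustrative value).
        .config("spark.executor.memory", "4g")
        # Fraction of (heap - 300 MB reserved) given to Spark's unified
        # execution/storage pool; the remainder is left as user memory.
        # 0.6 is the default in Spark 2.x.
        .config("spark.memory.fraction", "0.6")
        .getOrCreate()
    )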
According to SPARK-14168, the warning stems from not consuming an entire iterator. I have run into the same warning when taking n elements from an RDD in the Spark shell.
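Below is a minimal sketch of that pattern, assuming an existing SparkContext named sc (as in the pyspark shell). Whether the warning actually shows up depends on data size and configuration, because the task first has to acquire managed memory (typically via a shuffle) before abandoning the iterator early:

    # sortBy forces a shuffle, so tasks acquire Tungsten-managed memory.
    rdd = sc.parallelize(range(1000000), 8).sortBy(lambda x: -x)

    # take() stops reading a partition's iterator as soon as it has enough rows,
    # so the task can finish without fully consuming it -- the situation
    # SPARK-14168 describes. The executor may then log
    # "Managed memory leak detected", but the memory is still reclaimed when
    # the task completes, which is why the job finishes successfully.
    print(rdd.take(5))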