I am getting a managed memory leak warning that, as far as I can tell, was a Spark bug up to version 1.6 and was reported as resolved.
Mode: Standalone
IDE: PyCharm
Spark version: 2.3
Python version: 3.6
Below is the relevant log output:
2018-05-25 15:00:05 WARN Executor:66 - Managed memory leak detected; size = 262144 bytes, TID = 3148
2018-05-25 15:00:05 WARN Executor:66 - Managed memory leak detected; size = 262144 bytes, TID = 3152
2018-05-25 15:00:05 WARN Executor:66 - Managed memory leak detected; size = 262144 bytes, TID = 3151
2018-05-25 15:00:05 WARN Executor:66 - Managed memory leak detected; size = 262144 bytes, TID = 3150
2018-05-25 15:00:05 WARN Executor:66 - Managed memory leak detected; size = 262144 bytes, TID = 3149
2018-05-25 15:00:05 WARN Executor:66 - Managed memory leak detected; size = 262144 bytes, TID = 3153
2018-05-25 15:00:05 WARN Executor:66 - Managed memory leak detected; size = 262144 bytes, TID = 3154
2018-05-25 15:00:05 WARN Executor:66 - Managed memory leak detected; size = 262144 bytes, TID = 3158
2018-05-25 15:00:05 WARN Executor:66 - Managed memory leak detected; size = 262144 bytes, TID = 3155
2018-05-25 15:00:05 WARN Executor:66 - Managed memory leak detected; size = 262144 bytes, TID = 3157
2018-05-25 15:00:05 WARN Executor:66 - Managed memory leak detected; size = 262144 bytes, TID = 3160
2018-05-25 15:00:05 WARN Executor:66 - Managed memory leak detected; size = 262144 bytes, TID = 3161
2018-05-25 15:00:05 WARN Executor:66 - Managed memory leak detected; size = 262144 bytes, TID = 3156
2018-05-25 15:00:05 WARN Executor:66 - Managed memory leak detected; size = 262144 bytes, TID = 3159
2018-05-25 15:00:05 WARN Executor:66 - Managed memory leak detected; size = 262144 bytes, TID = 3165
2018-05-25 15:00:05 WARN Executor:66 - Managed memory leak detected; size = 262144 bytes, TID = 3163
2018-05-25 15:00:05 WARN Executor:66 - Managed memory leak detected; size = 262144 bytes, TID = 3162
2018-05-25 15:00:05 WARN Executor:66 - Managed memory leak detected; size = 262144 bytes, TID = 3166
Any insight into why this happens? The job itself completes successfully.
Edit: Several people have marked this as a duplicate of a two-year-old question, but the answer there says the warning was caused by a Spark bug, and Spark's JIRA shows that bug as resolved.
So the question here is: so many versions later, why am I still seeing the same warning in Spark 2.3? I'll gladly remove the question if a valid, logical answer shows it really is redundant.
You can often resolve it by adjusting the partitioning: increase the value of spark.sql.shuffle.partitions.
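For reference, here is a minimal sketch of how that setting might be raised in PySpark. The value 400 is purely an assumption to illustrate the idea; tune it for your own data and cluster:

    from pyspark.sql import SparkSession

    # Build (or fetch) a session with a higher shuffle partition count.
    # The default in Spark 2.3 is 200; 400 here is only illustrative.
    spark = (
        SparkSession.builder
        .appName("shuffle-partitions-example")          # illustrative app name
        .config("spark.sql.shuffle.partitions", "400")  # assumed value, tune as needed
        .getOrCreate()
    )

    # It can also be changed on an existing session:
    spark.conf.set("spark.sql.shuffle.partitions", "400")

Note that this setting only affects DataFrame/SQL shuffles; plain RDD operations take an explicit numPartitions argument instead.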
By contrast, a genuine memory leak in the application looks different: the application master exhausts its -Xmx heap, is killed, and restarts, and the Spark process's RSS keeps growing slowly until the node manager (NM) eventually kills it.
For context, Spark splits executor memory into several regions:
Execution memory - holds data required while Spark tasks execute;
User memory - left for your own purposes, e.g. custom data structures, UDFs, and UDAFs;
Reserved memory - set aside for Spark itself and hardcoded to 300 MB as of Spark 1.6.
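If you want to see where those regions come from, these are the Spark 2.x settings that control them; the values below are assumptions for illustration, not recommendations:

    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        # Total JVM heap per executor (illustrative value).
        .config("spark.executor.memory", "4g")
        # Fraction of (heap - 300 MB reserved) given to Spark's unified
        # execution/storage pool; the remainder is left as user memory.
        # 0.6 is the default in Spark 2.x.
        .config("spark.memory.fraction", "0.6")
        .getOrCreate()
    )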
According to SPARK-14168, the warning stems from not consuming an entire iterator. I have run into the same warning when taking n elements from an RDD in the Spark shell.
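Below is a minimal sketch of that pattern, assuming an existing SparkContext named sc (as in the pyspark shell). Whether the warning actually shows up depends on data size and configuration, because the task first has to acquire managed memory (typically via a shuffle) before abandoning the iterator early:

    # sortBy forces a shuffle, so tasks acquire Tungsten-managed memory.
    rdd = sc.parallelize(range(1000000), 8).sortBy(lambda x: -x)

    # take() stops reading a partition's iterator as soon as it has enough rows,
    # so the task can finish without fully consuming it -- the situation
    # SPARK-14168 describes. The executor may then log
    # "Managed memory leak detected", but the memory is still reclaimed when
    # the task completes, which is why the job finishes successfully.
    print(rdd.take(5))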