I've tried upgrading to Apache Spark 1.6.0 RC3. My application now spams these errors for nearly every task:
Managed memory leak detected; size = 15735058 bytes, TID = 830
I've set the logging level for org.apache.spark.memory.TaskMemoryManager to DEBUG and see the following in the logs:
I2015-12-18 16:54:41,125 TaskSetManager: Starting task 0.0 in stage 7.0 (TID 6, localhost, partition 0,NODE_LOCAL, 3026 bytes)
I2015-12-18 16:54:41,125 Executor: Running task 0.0 in stage 7.0 (TID 6)
I2015-12-18 16:54:41,130 ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks
I2015-12-18 16:54:41,130 ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
D2015-12-18 16:54:41,188 TaskMemoryManager: Task 6 acquire 5.0 MB for null
I2015-12-18 16:54:41,199 ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks
I2015-12-18 16:54:41,199 ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
D2015-12-18 16:54:41,262 TaskMemoryManager: Task 6 acquire 5.0 MB for null
D2015-12-18 16:54:41,397 TaskMemoryManager: Task 6 release 5.0 MB from null
E2015-12-18 16:54:41,398 Executor: Managed memory leak detected; size = 5245464 bytes, TID = 6
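(With Spark's default log4j setup, the DEBUG level above can be enabled with a line like this in conf/log4j.properties:

log4j.logger.org.apache.spark.memory.TaskMemoryManager=DEBUG
)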
How do you debug these errors? Is there a way to log stack traces for allocations and deallocations, so I can find what leaks?
I don't know much about the new unified memory manager (SPARK-10000). Is the leak likely my fault or is it likely a Spark bug?
The short answer is that users are not supposed to see this message, and they are not supposed to be able to create memory leaks in the unified memory manager.
That such leaks happen anyway is a Spark bug: SPARK-11293
But if you want to understand the cause of a memory leak, this is how I did it: in TaskMemoryManager.java, add extra logging to acquireExecutionMemory and releaseExecutionMemory:

logger.error("stack trace:", new Exception());
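A minimal sketch of where those calls go (the method signatures here are abbreviated and may differ slightly between Spark versions; only the logger.error lines are the addition):

public long acquireExecutionMemory(long required, MemoryConsumer consumer) {
  logger.error("stack trace:", new Exception());  // added: record the caller's stack trace for each allocation
  // ... original allocation logic unchanged ...
}

public void releaseExecutionMemory(long size, MemoryConsumer consumer) {
  logger.error("stack trace:", new Exception());  // added: record the caller's stack trace for each release
  // ... original release logic unchanged ...
}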
(Logging at error level was easier than figuring out logging configurations.) Now you will see the full stack trace for every allocation and deallocation. Try to match them up and find the allocations without matching deallocations: that gives you the stack trace for the source of the leak.
I found this warning too, but in my case it was caused by df.repartition(rePartNum, df("id")). My df is empty, and the number of warning lines equals rePartNum. Version: Spark 2.4, Windows 10.
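A hypothetical minimal reproduction of that scenario, sketched with the Java API (the session setup and the column name "id" are illustrative; the warning count is as reported in the comment above, not verified here):

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class EmptyRepartitionRepro {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder().master("local[*]").appName("repro").getOrCreate();
    Dataset<Row> df = spark.range(0).toDF("id");  // an empty DataFrame with an "id" column
    int rePartNum = 10;
    // Reportedly logs one "Managed memory leak detected" warning per target partition on Spark 2.4.
    df.repartition(rePartNum, df.col("id")).count();
    spark.stop();
  }
}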