I get the following error from the Spark driver program after running for 5-6 hours. I am using Ubuntu 16.04 LTS and OpenJDK 8.
Exception in thread "ForkJoinPool-50-worker-11" Exception in thread "dag-scheduler-event-loop" Exception in thread "ForkJoinPool-50-worker-13" java.lang.OutOfMemoryError: unable to create new native thread
at java.lang.Thread.start0(Native Method)
at java.lang.Thread.start(Thread.java:714)
at scala.concurrent.forkjoin.ForkJoinPool.tryAddWorker(ForkJoinPool.java:1672)
at scala.concurrent.forkjoin.ForkJoinPool.deregisterWorker(ForkJoinPool.java:1795)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:117)
java.lang.OutOfMemoryError: unable to create new native thread
at java.lang.Thread.start0(Native Method)
at java.lang.Thread.start(Thread.java:714)
at scala.concurrent.forkjoin.ForkJoinPool.tryAddWorker(ForkJoinPool.java:1672)
at scala.concurrent.forkjoin.ForkJoinPool.signalWork(ForkJoinPool.java:1966)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.push(ForkJoinPool.java:1072)
at scala.concurrent.forkjoin.ForkJoinTask.fork(ForkJoinTask.java:654)
at scala.collection.parallel.ForkJoinTasks$WrappedTask$class.start(Tasks.scala:377)
at scala.collection.parallel.AdaptiveWorkStealingForkJoinTasks$WrappedTask.start(Tasks.scala:443)
at scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask$$anonfun$spawnSubtasks$1.apply(Tasks.scala:189)
at scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask$$anonfun$spawnSubtasks$1.apply(Tasks.scala:186)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask$class.spawnSubtasks(Tasks.scala:186)
at scala.collection.parallel.AdaptiveWorkStealingForkJoinTasks$WrappedTask.spawnSubtasks(Tasks.scala:443)
at scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask$class.internal(Tasks.scala:157)
at scala.collection.parallel.AdaptiveWorkStealingForkJoinTasks$WrappedTask.internal(Tasks.scala:443)
at scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask$class.compute(Tasks.scala:149)
at scala.collection.parallel.AdaptiveWorkStealingForkJoinTasks$WrappedTask.compute(Tasks.scala:443)
at scala.concurrent.forkjoin.RecursiveAction.exec(RecursiveAction.java:160)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinTask.doJoin(ForkJoinTask.java:341)
at scala.concurrent.forkjoin.ForkJoinTask.join(ForkJoinTask.java:673)
at scala.collection.parallel.ForkJoinTasks$WrappedTask$class.sync(Tasks.scala:378)
at scala.collection.parallel.AdaptiveWorkStealingForkJoinTasks$WrappedTask.sync(Tasks.scala:443)
at scala.collection.parallel.ForkJoinTasks$class.executeAndWaitResult(Tasks.scala:426)
at scala.collection.parallel.ForkJoinTaskSupport.executeAndWaitResult(TaskSupport.scala:56)
at scala.collection.parallel.ParIterableLike$ResultMapping.leaf(ParIterableLike.scala:958)
at scala.collection.parallel.Task$$anonfun$tryLeaf$1.apply$mcV$sp(Tasks.scala:49)
at scala.collection.parallel.Task$$anonfun$tryLeaf$1.apply(Tasks.scala:48)
at scala.collection.parallel.Task$$anonfun$tryLeaf$1.apply(Tasks.scala:48)
at scala.collection.parallel.Task$class.tryLeaf(Tasks.scala:51)
at scala.collection.parallel.ParIterableLike$ResultMapping.tryLeaf(ParIterableLike.scala:953)
at scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask$class.compute(Tasks.scala:152)
at scala.collection.parallel.AdaptiveWorkStealingForkJoinTasks$WrappedTask.compute(Tasks.scala:443)
at scala.concurrent.forkjoin.RecursiveAction.exec(RecursiveAction.java:160)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
java.lang.OutOfMemoryError: unable to create new native thread
at java.lang.Thread.start0(Native Method)
at java.lang.Thread.start(Thread.java:714)
at scala.concurrent.forkjoin.ForkJoinPool.tryAddWorker(ForkJoinPool.java:1672)
at scala.concurrent.forkjoin.ForkJoinPool.deregisterWorker(ForkJoinPool.java:1795)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:117)
This is the error produced by the Spark driver program, which runs in client mode by default, so some people suggest just increasing the heap size by passing a flag like --driver-memory 3g. However, the message "unable to create new native thread"
really says that the JVM asked the OS to create a new thread and the OS could not allocate one anymore. The number of threads a JVM can create by requesting them from the OS is platform dependent, but it is typically around 32K threads on a 64-bit OS and JVM.
When I run ulimit -a I get the following:
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 120242
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 120242
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
cat /proc/sys/kernel/pid_max
32768
cat /proc/sys/kernel/threads-max
240484
"unable to create new native thread" clearly means it has nothing to do with heap. so I believe this is more of a OS issue.
There seems to be a bug in the usage of ForkJoinPool in Spark 2.0.0 which creates far too many threads, specifically in UnionRDD.scala, which is used when you call a window operation on a DStream.
https://issues.apache.org/jira/browse/SPARK-17396. According to this ticket, I upgraded to 2.0.1 and it fixed the issue.
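For context, a streaming job shaped roughly like the sketch below exercises that code path, since every windowed batch is assembled from the underlying per-batch RDDs via UnionRDD. The socket source, host/port, and batch/window durations here are placeholders, not taken from the original question.

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object WindowedWordCount {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("WindowedWordCount")
    // Batch interval of 10 seconds; the master is expected to come from spark-submit.
    val ssc = new StreamingContext(conf, Seconds(10))

    val lines = ssc.socketTextStream("localhost", 9999)
    val counts = lines
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      // Each windowed batch unions the RDDs of the last 5 minutes; this UnionRDD
      // path is where Spark 2.0.0 spawned excessive ForkJoinPool threads (SPARK-17396).
      .reduceByKeyAndWindow((a: Int, b: Int) => a + b, Seconds(300), Seconds(30))

    counts.print()
    ssc.start()
    ssc.awaitTermination()
  }
}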
In Java you can stumble upon two kinds of OutOfMemoryError:

java.lang.OutOfMemoryError: Java heap space: this exception is triggered when the application attempts to allocate more data in the heap area but there is not enough room for it. Although there might be plenty of memory available on your machine, you have hit the maximum amount of memory allowed for your JVM, which can be set through the -Xmx parameter.

java.lang.OutOfMemoryError: unable to create new native thread: this happens whenever the JVM asks the OS for a new thread. If the underlying OS cannot allocate a new native thread, this OutOfMemoryError is thrown.
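To see the second case in isolation, here is a minimal sketch (my own illustration, not from the original answer) that keeps starting sleeping threads until Thread.start() fails. Only run something like this on a disposable machine, since it deliberately exhausts the per-user thread limit.

object NativeThreadExhaustion {
  def main(args: Array[String]): Unit = {
    var started = 0L
    try {
      while (true) {
        val t = new Thread(new Runnable {
          // Each thread just parks forever so it keeps its native thread alive.
          override def run(): Unit = Thread.sleep(Long.MaxValue)
        })
        t.start()
        started += 1
      }
    } catch {
      case e: OutOfMemoryError =>
        // Typically "unable to create new native thread" once ulimit -u or the
        // kernel limits are hit, long before the heap itself is full.
        println(s"Thread.start() failed after $started threads: ${e.getMessage}")
    }
  }
}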
1) Check the system-wide thread settings
The /proc/sys/kernel/threads-max file provides a system-wide limit for the number of threads. The root user can change that value if they wish to:
$ echo 100000 > /proc/sys/kernel/threads-max
You can check the current number of running threads through /proc/loadavg:
$ cat /proc/loadavg
0.41 0.45 0.57 3/749 28174
Watch the fourth field! This field consists of two numbers separated by a slash (/). The first of these is the number of currently executing kernel scheduling entities (processes, threads); this will be less than or equal to the number of CPUs. The value after the slash is the number of kernel scheduling entities that currently exist on the system. In this example, 749 kernel scheduling entities (processes and threads) currently exist.
2) Check number of processes per user
On a Linux box, threads are essentially just processes with a shared address space. Therefore, you have to check whether your OS allows enough processes per user. This can be checked with:
ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 515005
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 4096
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 10240
cpu time (seconds, -t) unlimited
max user processes (-u) 1024
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
The default number of processes per user is 1024. At this point, let's count the number of running processes, which can be done with a ps output:
$ ps -elf | wc -l
220
This number, however, does not include the threads spawned by each process. If you run ps with -T you will see all of the threads as well:
$ ps -elfT | wc -l
385
As you can see, the count increased significantly once threads were included. Normally this is not a problem, but in Java-based applications it can cause your system to run into system limits. Let's continue the investigation and see how many threads are spawned by your JBoss process. You can do this in at least two ways:
$ ps -p JBOSSPID -lfT | wc -l
The above command returns the number of lightweight processes (threads) created for the process identified by the PID. This should match the thread count in a thread dump generated by jstack:
$ jstack -l JBOSSPID | grep tid | wc -l
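A third option (my addition, not in the original article) is to ask the JVM itself via JMX from inside the application, which should roughly agree with the ps and jstack counts:

import java.lang.management.ManagementFactory

object JvmThreadCount {
  def main(args: Array[String]): Unit = {
    val threads = ManagementFactory.getThreadMXBean
    // Live threads in the current JVM, daemon threads included.
    println(s"live=${threads.getThreadCount} peak=${threads.getPeakThreadCount} daemon=${threads.getDaemonThreadCount}")
  }
}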
Now you should have evidence of whether or not you need to increase the number of processes allowed for the user. This can be done with the following command:
$ ulimit -u 4096
3) Check your threads PID limit
Once you have counted the number of threads, you should verify that you are not hitting the system limit specified by the kernel.pid_max parameter. You can check this value by executing:
$ sysctl -a | grep kernel.pid_max
kernel.pid_max = 32768
4) Reduce the Thread Stack size
Another option, if you are not able to modify the OS settings, is reducing the thread stack size. In the JVM's memory layout, the more memory is allocated for the heap (not necessarily used by the heap), the less memory is available for thread stacks, and since every thread needs its own stack, in practice more "memory" in the heap sense (which is usually what people talk about) results in fewer threads being able to run concurrently.
First of all, check the default thread stack size, which depends on your operating system:
$ java -XX:+PrintFlagsFinal -version | grep ThreadStackSize
intx ThreadStackSize = 1024 {pd product}
As you can see, the default thread stack size is 1024 KB on this machine. In order to reduce the stack size, add the -Xss option to the JVM options. In JBoss EAP 6 / WildFly the minimum thread stack size is 228 KB. In standalone mode you can change it by adjusting JAVA_OPTS as in the following example:
JAVA_OPTS="-Xms128m -Xmx1303m -Xss256k"
In domain mode, you can configure the jvm element at various levels (host, server group, server). There you can set the requested stack size as in the following section:
<jvm name="default">
<heap size="64m" max-size="256m"/>
<jvm-options>
<option value="-server"/>
<option value="-Xss256k"/>
</jvm-options>
</jvm>
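As an aside (my addition, not from the article), a stack size can also be requested per thread via the four-argument Thread constructor; the JVM treats the value only as a hint and may ignore it on some platforms:

object SmallStackWorker {
  def main(args: Array[String]): Unit = {
    val task = new Runnable {
      override def run(): Unit =
        println(s"running in ${Thread.currentThread().getName}")
    }
    // Request roughly a 256 KB stack for this one thread instead of the JVM-wide -Xss default.
    val worker = new Thread(null, task, "small-stack-worker", 256 * 1024)
    worker.start()
    worker.join()
  }
}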