 

Unable to open native connection with spark sometimes

I'm running a Spark job with Spark 1.4 and Cassandra 2.18. I can telnet from the master to the Cassandra machine without problems. Sometimes the job runs fine, and sometimes I get the following exception. Why would this happen only sometimes?

Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 7, 172.28.0.162): java.io.IOException: Failed to open native connection to Cassandra at {172.28.0.164}:9042
    at com.datastax.spark.connector.cql.CassandraConnector$.com$datastax$spark$connector$cql$CassandraConnector$$createSession(CassandraConnector.scala:155)

Sometimes I also get this exception along with the one above:

Caused by: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: /172.28.0.164:9042 (com.datastax.driver.core.TransportException: [/172.28.0.164:9042] Connection has been closed))

Nipun asked Jul 16 '15 15:07


1 Answer

I ran into the second error, NoHostAvailableException, quite a few times this week while porting a Python Spark job to Java Spark.

My driver was nearly out of memory, and the GC was eating almost all of my CPU (98% across all 8 cores), pausing the JVM constantly.
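If you want to check whether your own driver is in the same state, here is a minimal sketch that estimates how much wall-clock time the JVM spends in GC. It uses only the standard `java.lang.management` API; the class name `GcOverhead`, its helper methods, and the one-second sleep are hypothetical stand-ins I made up for illustration, not anything from Spark or the Cassandra connector.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper: estimates the share of wall-clock time spent in GC.
class GcOverhead {

    // Accumulated GC time in ms since JVM start, summed over all collectors.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 when the collector does not report it
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    // GC time divided by elapsed time for one sampling interval.
    static double gcFraction(long gcMillisBefore, long gcMillisAfter, long elapsedMillis) {
        if (elapsedMillis <= 0) {
            return 0.0;
        }
        return (double) (gcMillisAfter - gcMillisBefore) / elapsedMillis;
    }

    public static void main(String[] args) throws InterruptedException {
        long gcBefore = totalGcMillis();
        long start = System.nanoTime();
        Thread.sleep(1000); // stand-in for a slice of the driver's real work
        long elapsed = (System.nanoTime() - start) / 1_000_000;
        double fraction = gcFraction(gcBefore, totalGcMillis(), elapsed);
        System.out.printf("GC took %.1f%% of the last %d ms%n", fraction * 100, elapsed);
    }
}
```

Treat the number as a rough indicator only: how collection time is accounted for differs between collectors. A value that stays high while the job is connecting to Cassandra would be consistent with what I saw.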

In Python it's much more obvious (to me) when this happens, so it took me a while to realize what was going on, and I hit this error quite a few times.

I had two theories about the root cause, but either way the solution was to stop the GC from going crazy.

  1. First theory: the JVM was pausing so often that the driver simply couldn't connect to Cassandra.
  2. Second theory: Cassandra was running on the same machine as Spark, and the JVM was using 100% of the CPU, so Cassandra couldn't answer in time and it looked to the driver as if there were no Cassandra hosts.
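Both theories come down to the Cassandra port not answering in time. A rough way to see that from the machine in question is a timed TCP connect, which is basically a scripted telnet. This is only a sketch under my own assumptions: the class `CassandraPortProbe`, the method name, and the 2000 ms timeout are hypothetical, and the host and port defaults are just the ones from the question's stack trace. It only checks that the port accepts a connection, not that Cassandra can actually serve queries.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical probe: times a plain TCP connect to the Cassandra native port.
class CassandraPortProbe {

    // Returns the connect time in ms, or -1 if the connect failed or timed out.
    static long connectMillis(String host, int port, int timeoutMillis) {
        long start = System.nanoTime();
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return (System.nanoTime() - start) / 1_000_000;
        } catch (IOException e) {
            // Covers refused connections, timeouts, and unresolvable hosts.
            return -1;
        }
    }

    public static void main(String[] args) {
        // Defaults taken from the stack trace in the question.
        String host = args.length > 0 ? args[0] : "172.28.0.164";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 9042;
        long millis = connectMillis(host, port, 2000);
        if (millis < 0) {
            System.out.println("Could not connect to " + host + ":" + port);
        } else {
            System.out.println("Connected to " + host + ":" + port + " in " + millis + " ms");
        }
    }
}
```

Running it in a loop while the job is under load, on the node where the failing task ran, should show whether connects slow down or fail when the GC is busy.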

Hope this helps!

Code Herder answered Nov 18 '22 07:11