
Spark/PySpark: An error occurred while trying to connect to the Java server (127.0.0.1:39543)

Good afternoon,

Over the last two days I have been running into many connection problems with the Java server. It's a bit odd because the error does not occur every time, only sometimes...

I am using PySpark together with a Jupyter Notebook. Everything runs on a VM instance in Google Cloud with this machine type:

custom (8 vCPUs, 200 GB) 

These are the other settings:

import pyspark

conf = pyspark.SparkConf().setAppName("App")
conf = (conf.setMaster('local[*]')
        .set('spark.executor.memory', '180G')
        .set('spark.driver.memory', '180G')
        .set('spark.driver.maxResultSize', '180G'))

sc = pyspark.SparkContext(conf=conf)
sq = pyspark.sql.SQLContext(sc)

I trained a Random Forest model and made predictions:

model = rf.fit(train)
predictions = model.transform(test)
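
For reference, rf, train and test were created in earlier notebook cells, roughly like this (a simplified sketch; the column names and split ratio are assumptions, details may differ):

from pyspark.ml.classification import RandomForestClassifier

# df is the assembled feature DataFrame from an earlier cell (hypothetical name)
rf = RandomForestClassifier(featuresCol="features", labelCol="label")
train, test = df.randomSplit([0.8, 0.2], seed=42)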

Afterwards I created the ROC curve and computed the AUC value.
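
Roughly like this (a simplified sketch; the exact evaluator settings may differ):

from pyspark.ml.evaluation import BinaryClassificationEvaluator

# areaUnderROC is the area under the ROC curve (the AUC value)
evaluator = BinaryClassificationEvaluator(metricName="areaUnderROC")
auc = evaluator.evaluate(predictions)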

Then I wanted to see the confusion matrix:

confusion_mat = metrics.confusionMatrix().toArray()
print(confusion_mat)
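
For completeness, metrics wraps the (prediction, label) pairs of the predictions DataFrame and was built roughly like this (a simplified sketch):

from pyspark.mllib.evaluation import MulticlassMetrics

# MulticlassMetrics expects an RDD of (prediction, label) pairs of floats
prediction_and_labels = predictions.select("prediction", "label") \
    .rdd.map(lambda row: (float(row[0]), float(row[1])))
metrics = MulticlassMetrics(prediction_and_labels)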

And now the error occurs:

Traceback (most recent call last):
  File "/usr/lib/python2.7/SocketServer.py", line 290, in _handle_request_noblock
    self.process_request(request, client_address)
  File "/usr/lib/python2.7/SocketServer.py", line 318, in process_request
    self.finish_request(request, client_address)
  File "/usr/lib/python2.7/SocketServer.py", line 331, in finish_request
    self.RequestHandlerClass(request, client_address, self)
  File "/usr/lib/python2.7/SocketServer.py", line 652, in __init__
    self.handle()
  File "/usr/local/lib/python2.7/dist-packages/pyspark/accumulators.py", line 235, in handle
    num_updates = read_int(self.rfile)
  File "/usr/local/lib/python2.7/dist-packages/pyspark/serializers.py", line 577, in read_int
    raise EOFError
EOFError
ERROR:root:Exception while sending command.
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/py4j/java_gateway.py", line 883, in send_command
    response = connection.send_command(command)
  File "/usr/local/lib/python2.7/dist-packages/py4j/java_gateway.py", line 1040, in send_command
    "Error while receiving", e, proto.ERROR_ON_RECEIVE)
Py4JNetworkError: Error while receiving
ERROR:py4j.java_gateway:An error occurred while trying to connect to the Java server (127.0.0.1:39543)
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/py4j/java_gateway.py", line 963, in start
    self.socket.connect((self.address, self.port))
  File "/usr/lib/python2.7/socket.py", line 228, in meth
    return getattr(self._sock,name)(*args)
error: [Errno 111] Connection refused

Here is the output from the console:

OpenJDK 64-Bit Server VM warning: INFO: os::commit_memory(0x00007f4998300000, 603979776, 0) failed; error='Cannot allocate memory' (errno=12)
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (mmap) failed to map 603979776 bytes for committing reserved memory.

Logfile:

#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (mmap) failed to map 603979776 bytes for committing reserved memory.
# Possible reasons:
#   The system is out of physical RAM or swap space
#   In 32 bit mode, the process size limit was hit
# Possible solutions:
#   Reduce memory load on the system
#   Increase physical memory or swap space
#   Check if swap backing store is full
#   Use 64 bit Java on a 64 bit OS
#   Decrease Java heap size (-Xmx/-Xms)
#   Decrease number of Java threads
#   Decrease Java thread stack sizes (-Xss)
#   Set larger code cache with -XX:ReservedCodeCacheSize=
# This output file may be truncated or incomplete.
#
#  Out of Memory Error (os_linux.cpp:2643), pid=2377, tid=0x00007f1c94fac700
#
# JRE version: OpenJDK Runtime Environment (8.0_151-b12) (build 1.8.0_151-8u151-b12-0ubuntu0.16.04.2-b12)
# Java VM: OpenJDK 64-Bit Server VM (25.151-b12 mixed mode linux-amd64 )
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#

---------------  S Y S T E M  ---------------

OS:DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=16.04
DISTRIB_CODENAME=xenial
DISTRIB_DESCRIPTION="Ubuntu 16.04.3 LTS"

uname:Linux 4.13.0-1008-gcp #11-Ubuntu SMP Thu Jan 25 11:08:44 UTC 2018 x86_64
libc:glibc 2.23 NPTL 2.23 
rlimit: STACK 8192k, CORE 0k, NPROC 805983, NOFILE 1048576, AS infinity
load average:7.69 4.51 3.57

/proc/meminfo:
MemTotal:       206348252 kB
MemFree:         1298460 kB
MemAvailable:     250308 kB
Buffers:            6812 kB
Cached:           438232 kB
SwapCached:            0 kB
Active:         203906416 kB
Inactive:         339540 kB
Active(anon):   203804300 kB
Inactive(anon):     8392 kB
Active(file):     102116 kB
Inactive(file):   331148 kB
Unevictable:        3652 kB
Mlocked:            3652 kB
SwapTotal:             0 kB
SwapFree:              0 kB
Dirty:              4688 kB
Writeback:             0 kB
AnonPages:      203805168 kB
Mapped:            23076 kB
Shmem:              8776 kB
Slab:             114476 kB
SReclaimable:      50640 kB
SUnreclaim:        63836 kB
KernelStack:        4752 kB
PageTables:       404292 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    103174124 kB
Committed_AS:   205956256 kB
VmallocTotal:   34359738367 kB
VmallocUsed:           0 kB
VmallocChunk:          0 kB
HardwareCorrupted:     0 kB
AnonHugePages:         0 kB
ShmemHugePages:        0 kB
ShmemPmdMapped:        0 kB
CmaTotal:              0 kB
CmaFree:               0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:       71628 kB
DirectMap2M:     4122624 kB
DirectMap1G:    207618048 kB


CPU:total 8 (initial active 8) (4 cores per cpu, 2 threads per core) family 6 model 85 stepping 3, cmov, cx8, fxsr, mmx, sse, sse2, sse3, ssse3, sse4.1, sse4.2, popcnt, avx, avx2, aes, clmul, erms, rtm, 3dnowpref, lzcnt, ht, tsc, tscinvbit, bmi1, bmi2, adx

Does anyone have an idea what the problem might be and how I can solve it? I am desperate. :(

// I think the Java Runtime Environment does not have enough memory to continue... But what can I do?

Thank you very much!

asked Jan 30 '18 by qwertz

1 Answer

If you are

using this one in Google Cloud:

custom (8 vCPUs, 200 GB)

then you are significantly oversubscribing memory, and that is ignoring the fact that spark.executor.memory has no effect at all in local mode.

spark.executor.memory accounts only for the JVM heap and doesn't cover:

  • the memory used by the PySpark worker processes.
  • the memory used by the PySpark driver process.

Even within the JVM, only part of the heap can be used for data processing (see the Memory Management Overview in the Spark documentation), so setting spark.driver.maxResultSize equal to the total assigned memory does not make sense.
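
As a rough illustration, a configuration that leaves headroom for the operating system, the Python processes and the JVM overhead could look like this (the numbers are ballpark guesses, not tuned values):

import pyspark

conf = pyspark.SparkConf().setAppName("App")
conf = (conf.setMaster('local[*]')
        # in local mode the executor lives inside the driver JVM,
        # so spark.executor.memory would be ignored anyway
        .set('spark.driver.memory', '120G')
        # cap collected results well below the heap size
        .set('spark.driver.maxResultSize', '20G'))

sc = pyspark.SparkContext(conf=conf)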

answered Sep 24 '22 by Alper t. Turker