 

How does Python interact with the JVM inside Spark?

I am writing Python code to develop some Spark applications. I am really curious how Python interacts with the running JVM, so I started reading the source code of Spark.

I can see that, in the end, all the Spark transformations/actions end up calling certain JVM methods in the following way.

# excerpts from pyspark/context.py -- each call is forwarded to the JVM via self._jvm
self._jvm.java.util.ArrayList(),
self._jvm.PythonAccumulatorParam(host, port))
self._jvm.org.apache.spark.util.Utils.getLocalDir(self._jsc.sc().conf())
self._jvm.org.apache.spark.util.Utils.createTempDir(local_dir, "pyspark") \
            .getAbsolutePath()
...

As a Python programmer, I am really curious about what is going on with this _jvm object. I have skimmed all the source code under pyspark and only found _jvm to be an attribute of the Context class; beyond that, I know nothing about _jvm's attributes or methods.

Can anyone help me understand how PySpark calls translate into JVM operations? Should I read some Scala code to see whether _jvm is defined there?

asked Apr 22 '15 by B.Mr.W.


People also ask

How does Python work with Spark?

Spark comes with an interactive Python shell. The PySpark shell is responsible for linking the Python API to the Spark core and initializing the SparkContext. The bin/pyspark command launches the Python interpreter to run a PySpark application. PySpark can be launched directly from the command line for interactive use.
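
For illustration, a minimal interactive session might look like the sketch below; it assumes a local Spark installation and uses the sc variable that the shell pre-creates.

# Launched with: bin/pyspark
# The shell has already created a SparkContext and bound it to `sc`.

# A trivial job: parallelize a local list and run a couple of actions on it.
rdd = sc.parallelize(range(1, 101))
print(rdd.sum())                                  # 5050
print(rdd.filter(lambda x: x % 2 == 0).count())   # 50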

How does PySpark work with JVM?

PySpark uses Py4J, a framework that facilitates interoperation between the two languages, to exchange data between the Python and JVM processes. When you launch a PySpark job, it starts as a Python process, which then spawns a JVM instance and runs some PySpark-specific code in it.
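
The same mechanism can be seen with Py4J on its own, outside Spark. The sketch below assumes py4j is installed (pip install py4j) and a java executable is on the PATH; it is only meant to show the shape of the Python-to-JVM bridge.

from py4j.java_gateway import JavaGateway, GatewayParameters, launch_gateway

# Start a JVM running Py4J's GatewayServer and get the port it listens on.
port = launch_gateway(die_on_exit=True)
gateway = JavaGateway(gateway_parameters=GatewayParameters(port=port))

# `gateway.jvm` is a dynamic view of the JVM: attribute access becomes class
# lookup, and method calls are forwarded to the JVM over a local socket.
names = gateway.jvm.java.util.ArrayList()
names.add("hello from Python")
print(names.size())                            # 1
print(gateway.jvm.java.lang.Math.max(3, 7))    # 7

gateway.shutdown()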

Does PySpark run on JVM?

PySpark is built on top of Spark's Java API. Data is processed in Python and cached/shuffled in the JVM: in the Python driver program, SparkContext uses Py4J to launch a JVM and create a JavaSparkContext.
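
In a running PySpark program you can inspect the pieces this describes. The attributes used below (_gateway, _jvm, _jsc) are private implementation details and may differ between Spark versions, so treat this as an inspection sketch rather than a supported API.

from pyspark import SparkContext

sc = SparkContext("local[2]", "jvm-inspection")

print(type(sc._gateway))   # py4j.java_gateway.JavaGateway
print(type(sc._jvm))       # py4j.java_gateway.JVMView
print(sc._jsc)             # the JavaSparkContext living inside the JVM

# Any class on Spark's JVM classpath is reachable through _jvm.
print(sc._jvm.java.lang.System.getProperty("java.version"))

sc.stop()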

How do I connect Spark to Python?

Standalone PySpark applications should be run using the bin/pyspark script, which automatically configures the Java and Python environment using the settings in conf/spark-env.sh or .cmd. The script automatically adds the pyspark package to the PYTHONPATH.
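
A minimal standalone application might look like the sketch below (wordcount.py is a hypothetical file name); on recent Spark versions it would typically be submitted with bin/spark-submit wordcount.py <input-file>.

# wordcount.py - a minimal standalone PySpark application
import sys
from pyspark import SparkContext

if __name__ == "__main__":
    sc = SparkContext(appName="WordCount")
    counts = (sc.textFile(sys.argv[1])
                .flatMap(lambda line: line.split())
                .map(lambda word: (word, 1))
                .reduceByKey(lambda a, b: a + b))
    for word, count in counts.take(10):
        print(word, count)
    sc.stop()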


1 Answer

It uses Py4J. There is a special protocol to translate Python calls into JVM calls. You can find all of this in the PySpark code; see java_gateway.py.
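
Roughly, java_gateway.py spawns a JVM (via spark-submit), reads back the port of the Py4J GatewayServer running in it, and connects a JavaGateway to that port; the _jvm attribute you saw is that gateway's JVM view. The snippet below is only a simplified sketch of that shape, not the actual PySpark implementation (connect_to_spark_jvm is a made-up name, and the real code also handles authentication and process management).

from py4j.java_gateway import JavaGateway, GatewayParameters, java_import

def connect_to_spark_jvm(port):
    # 1. Connect to the GatewayServer that the spawned JVM reported back.
    gateway = JavaGateway(gateway_parameters=GatewayParameters(port=port))

    # 2. Import commonly used Spark packages into the default JVM view,
    #    which is why e.g. sc._jvm.PythonAccumulatorParam resolves without
    #    a fully qualified name.
    java_import(gateway.jvm, "org.apache.spark.SparkConf")
    java_import(gateway.jvm, "org.apache.spark.api.java.*")
    java_import(gateway.jvm, "org.apache.spark.api.python.*")

    # 3. gateway.jvm is what SparkContext stores as self._jvm: every
    #    attribute access and call on it is forwarded to the JVM.
    return gateway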

answered Oct 19 '22 by artemdevel