
Py4J has bigger overhead than Jython and JPype

After searching for a way to run Java code from a Django application (Python), I found that Py4J is the best option for me. I tried Jython, JPype and Python subprocess, and each of them has certain limitations:

  • Jython: my app runs in Python.
  • JPype is buggy: you can start the JVM only once; after that it fails to start again.
  • Python subprocess: you cannot pass Java objects between Python and Java, because it is just a regular console call.

The Py4J website says:

In terms of performance, Py4J has a bigger overhead than both of the previous solutions (Jython and JPype) because it relies on sockets, but if performance is critical to your application, accessing Java objects from Python programs might not be the best idea.

In my application performance is critical, because I'm working with the machine learning framework Mahout. My question is: will Mahout itself also run slower because of the Py4J gateway server, or does this overhead just mean that invoking Java methods from Python is slower? (In the latter case, Mahout's performance will not be a problem and I can use Py4J.)

HIP_HOP asked Aug 28 '13 10:08



2 Answers

PySpark uses Py4J quite successfully. If all the heavy lifting is done in Spark (or Mahout in your case) itself, and you just want to return the result to the "driver"/Python code, then Py4J may work very well for you too.
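A minimal sketch of that "driver" pattern, assuming a Java process that has already started a Py4J `GatewayServer` with an entry-point object. `JavaGateway` and `entry_point` are real Py4J names; `trainAndSummarize` and the input path are hypothetical placeholders for whatever your own Java code exposes.

```python
def summarize_on_jvm(gateway, input_path):
    """Make one Python-to-Java call; the expensive work stays inside the JVM."""
    # `trainAndSummarize` is a HYPOTHETICAL method on your own Java entry-point
    # object; it is not part of Py4J or Mahout.
    return gateway.entry_point.trainAndSummarize(input_path)


def main():
    # Imported here so the helper above can be exercised without py4j installed.
    from py4j.java_gateway import JavaGateway

    gateway = JavaGateway()  # connects to a GatewayServer already running in the JVM
    try:
        print(summarize_on_jvm(gateway, "/path/to/input"))  # placeholder path
    finally:
        gateway.close()
```

Call `main()` once the Java side is up. Only the single call and its (small) return value cross the socket; the training itself runs at normal JVM speed.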

Py4J has a slightly bigger overhead for huge results (that's not necessarily the case for Spark workloads, as you only return summaries/aggregates of the dataframes). There is also an improvement discussion for Py4J about switching to binary serialization to remove that overhead for higher-bandwidth requirements: https://github.com/bartdag/py4j/issues/159
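To give a rough feel for where that overhead comes from: as far as I understand, Py4J's protocol is text-based and binary payloads such as byte arrays are Base64-encoded, so treat that as an assumption rather than a statement about Py4J internals. The sketch below only measures the size inflation of Base64 itself, using the standard library.

```python
import base64
import math

payload = bytes(3_000_000)            # 3 MB of raw binary data (zeros, for illustration)
encoded = base64.b64encode(payload)   # what a text-based protocol would have to send

ratio = len(encoded) / len(payload)
print(len(payload), len(encoded), round(ratio, 3))  # 3000000 4000000 1.333
```

That is about one third more bytes on the wire before any parsing cost, which matters for huge results but hardly at all for small summaries.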

Tagar answered Nov 01 '22 19:11



I don't know Mahout, but consider this: at least with JPype and Py4J, you pay a performance cost when converting types from Java to Python and vice versa, so try to minimize calls between the languages. One alternative may be to write a thin wrapper in Java that condenses many Java calls into a single Python-to-Java call.
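The effect of such a wrapper can be sketched without any Java at all. The class below is a pure-Python stand-in (not Py4J or JPype code) that simply counts how often the language boundary would be crossed; `score` and `score_all` are hypothetical method names.

```python
class FakeJavaSide:
    """Stand-in for a Java object behind a bridge; counts cross-language calls."""

    def __init__(self):
        self.round_trips = 0

    def score(self, x):
        # Fine-grained method: one boundary crossing per element.
        self.round_trips += 1
        return x * 2.0

    def score_all(self, xs):
        # Thin wrapper: one boundary crossing, the loop runs on the "Java" side.
        self.round_trips += 1
        return [x * 2.0 for x in xs]


data = [1.0, 2.0, 3.0]

chatty = FakeJavaSide()
chatty_result = [chatty.score(x) for x in data]

batched = FakeJavaSide()
batched_result = batched.score_all(data)

print(chatty.round_trips, batched.round_trips)  # 3 1
```

Both variants return the same values, but the chatty one pays the per-call conversion and transport cost once per element, while the wrapper pays it once per batch.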

bastian answered Nov 01 '22 19:11
