Java is often said to be about 10x faster than Python, and that matches the benchmarks I have seen. What really drags Java down, though, is JVM startup time.
Here is a test I ran:
$time xlsx2csv.py Types\ of\ ESI\ v2.doc-emb-Package-9
...
<output skipped>
real 0m0.085s
user 0m0.072s
sys 0m0.013s
$time java -jar -client /usr/local/bin/tika-app-0.7.jar -m Types\ of\ ESI\ v2.doc-emb-Package-9
real 0m2.055s
user 0m2.433s
sys 0m0.078s
Same file, a 12 KB MS XLSX file embedded inside a DOCX, and Python is about 24x faster! WTH!
Java takes 2.055 seconds.
I know this is all down to startup time, but I need to call Tika from a script to parse some documents, because I do not want to reinvent the wheel in Python.
With 10k+ files to parse, though, that is just not practical.
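To put a number on "not practical", here is a rough back-of-the-envelope sketch. It assumes every file costs the same wall-clock time as the single file measured above and that the files are processed one after another with a fresh process each time. The function name `total_seconds` is only illustrative.

```python
# Rough estimate: wall-clock cost of launching one process per file.
# Assumption: each file takes as long as the single measured run ("real" time).

def total_seconds(per_file_seconds, file_count):
    """Total time when every file pays the full per-process cost."""
    return per_file_seconds * file_count


FILES = 10_000
java_total = total_seconds(2.055, FILES)    # measured with tika-app
python_total = total_seconds(0.085, FILES)  # measured with xlsx2csv.py

print(f"Java:   {java_total / 3600:.1f} hours")
print(f"Python: {python_total / 60:.1f} minutes")
```

Under those assumptions the Java route comes to several hours against minutes for the Python script, nearly all of it spent starting JVMs.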
Is there any way to speed it up? I already tried the -client option, and it only helped a little (about 20%).
Another idea: run it as a long-running daemon and communicate with it locally over UDP or Linux IPC sockets?
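The daemon idea could look roughly like the sketch below. It is only an illustration of the pattern, not a working Tika setup: it uses a Unix domain stream socket rather than UDP, the names `ParseHandler`, `start_daemon` and `parse_many` are made up, and the server here is a Python stand-in that just echoes the file name. In a real setup the long-running process would be the JVM, so its startup cost is paid once instead of once per file.

```python
import os
import socket
import socketserver
import threading


class ParseHandler(socketserver.StreamRequestHandler):
    """One connection: read a file path per line, write one reply line back."""

    def handle(self):
        for raw in self.rfile:
            path = raw.decode("utf-8").rstrip("\n")
            # Placeholder for the real work; a real daemon would run the
            # parser here inside the already-started process.
            reply = "parsed:" + os.path.basename(path)
            self.wfile.write((reply + "\n").encode("utf-8"))
            self.wfile.flush()


def start_daemon(socket_path):
    """Start the stand-in server on a Unix domain socket in a background thread."""
    server = socketserver.UnixStreamServer(socket_path, ParseHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def parse_many(socket_path, paths):
    """Send many paths over a single connection and collect the replies in order."""
    results = []
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(socket_path)
        with sock.makefile("rwb") as stream:
            for path in paths:
                stream.write((path + "\n").encode("utf-8"))
                stream.flush()
                results.append(stream.readline().decode("utf-8").rstrip("\n"))
    return results
```

The calling script would start the daemon once, then pass all 10k+ paths through `parse_many`, so each file costs one round trip on a local socket rather than a process launch.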
2.1 Possible Causes for Slow JVM Startup
An application might seem slow when it starts because the application might be waiting to import files, a large number of methods might have to be compiled, or there might be a problem in code optimization (especially on single-CPU machines).
You can improve performance by increasing your heap size or using a different garbage collector. In general, for long-running server applications, use the Java SE throughput collector on machines with multiple processors ( -XX:+AggressiveHeap ) and as large a heap as you can fit in the free memory of your machine.
JVM warm-up effect
When a JVM-based app is launched, the first requests it receives are generally significantly slower than the average response time. This warm-up effect is usually due to class loading and bytecode interpretation at startup. After 10k iterations, the main code path is compiled and "hot".
A JVM is started by invoking a main method, either by running a jar with java -jar MyJar or by running a main class from an IDE. Yes, multiple JVM instances can run on a single machine, and each has its own allocated memory. There will be as many JVMs as main programs you run.
Try Nailgun.
Note: I don't use it personally.