Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What is the quantitative overhead of making a JNI call?

Based on performance alone, approximately how many "simple" lines of java is the equivalent performance hit of making a JNI call?

Or to try to express the question in a more concrete way, if a simple java operation such as

someIntVar1 = someIntVar2 + someIntVar3; 

was given a "CPU work" index of 1, what would be the typical (ballpark) "CPU work" index of the overhead of making the JNI call?


This question ignores the time taken waiting for the native code to execute. In telephonic parlance, it is strictly about the "flag fall" part of the call, not the "call rate".


The reason for asking this question is to have a "rule of thumb" to know when to bother attempting coding a JNI call when you know the native cost (from direct testing) and the java cost of a given operation. It could help you quickly avoid the hassle to coding the JNI call only to find that the callout overhead consumed any benefit of using native code.

Edit:

Some folks are getting hung up on variations in CPU, RAM etc. These are all virtually irrelevant to the question - I'm asking for the relative cost to lines of java code. If CPU and RAM are poor, they are poor for both java and JNI so environmental considerations should balance out. The JVM version falls into the "irrelevant" category too.

This question isn't asking for an absolute timing in nanoseconds, but rather a ball park "work effort" in units of "lines of simple java code".

like image 972
Bohemian Avatar asked Dec 20 '12 13:12

Bohemian


People also ask

Are JNI calls expensive?

Function calls to JNI methods are expensive, especially when calling a method repeatedly. Native methods are not inlined by the JVM, nor can the method be JIT compiled, as the method is already compiled.

What is a JNI call?

JNI is the Java Native Interface. It defines a way for the bytecode that Android compiles from managed code (written in the Java or Kotlin programming languages) to interact with native code (written in C/C++).

Is JNI fast?

When talking about JNI, there are two directions: java calling C++, and C++ calling java. Java calling C++ (or C) via the "native" keyword is very fast, around 50 clock cycles. However, C++ calling Java is somewhat slow.


1 Answers

Quick profiler test yields:

Java class:

public class Main {     private static native int zero();      private static int testNative() {         return Main.zero();     }      private static int test() {         return 0;     }      public static void main(String[] args) {         testNative();         test();     }      static {          System.loadLibrary("foo");     } } 

C library:

#include <jni.h> #include "Main.h"  JNIEXPORT int JNICALL  Java_Main_zero(JNIEnv *env, jobject obj) {     return 0; } 

Results:

single invocation10 calls in a loop100 calls in a loop

System details:

java version "1.7.0_09" OpenJDK Runtime Environment (IcedTea7 2.3.3) (7u9-2.3.3-1) OpenJDK Server VM (build 23.2-b09, mixed mode) Linux visor 3.2.0-4-686-pae #1 SMP Debian 3.2.32-1 i686 GNU/Linux 

Update: Caliper micro-benchmarks for x86 (32/64 bit) and ARMv6 are as follows:

Java class:

public class Main extends SimpleBenchmark {     private static native int zero();     private Random random;     private int[] primes;      public int timeJniCall(int reps) {         int r = 0;         for (int i = 0; i < reps; i++) r += Main.zero();         return r;     }      public int timeAddIntOperation(int reps) {         int p = primes[random.nextInt(1) + 54];   // >= 257         for (int i = 0; i < reps; i++) p += i;         return p;     }      public long timeAddLongOperation(int reps) {         long p = primes[random.nextInt(3) + 54];  // >= 257         long inc = primes[random.nextInt(3) + 4]; // >= 11         for (int i = 0; i < reps; i++) p += inc;         return p;     }      @Override     protected void setUp() throws Exception {         random = new Random();         primes = getPrimes(1000);     }      public static void main(String[] args) {         Runner.main(Main.class, args);             }      public static int[] getPrimes(int limit) {         // returns array of primes under $limit, off-topic here     }      static {         System.loadLibrary("foo");     } } 

Results (x86/i7500/Hotspot/Linux):

Scenario{benchmark=JniCall} 11.34 ns; σ=0.02 ns @ 3 trials Scenario{benchmark=AddIntOperation} 0.47 ns; σ=0.02 ns @ 10 trials Scenario{benchmark=AddLongOperation} 0.92 ns; σ=0.02 ns @ 10 trials         benchmark     ns linear runtime          JniCall 11.335 ==============================  AddIntOperation  0.466 = AddLongOperation  0.921 == 

Results (amd64/phenom 960T/Hostspot/Linux):

Scenario{benchmark=JniCall} 6.66 ns; σ=0.22 ns @ 10 trials Scenario{benchmark=AddIntOperation} 0.29 ns; σ=0.00 ns @ 3 trials Scenario{benchmark=AddLongOperation} 0.26 ns; σ=0.00 ns @ 3 trials     benchmark    ns linear runtime          JniCall 6.657 ==============================  AddIntOperation 0.291 = AddLongOperation 0.259 = 

Results (armv6/BCM2708/Zero/Linux):

Scenario{benchmark=JniCall} 678.59 ns; σ=1.44 ns @ 3 trials Scenario{benchmark=AddIntOperation} 183.46 ns; σ=0.54 ns @ 3 trials Scenario{benchmark=AddLongOperation} 199.36 ns; σ=0.65 ns @ 3 trials     benchmark  ns linear runtime          JniCall 679 ==============================  AddIntOperation 183 ======== AddLongOperation 199 ======== 

To summarize things a bit, it seems that JNI call is roughly equivalent to 10-25 java ops on typical (x86) hardware and Hotspot VM. At no surprise, under much less optimized Zero VM, the results are quite different (3-4 ops).


Thanks go to @Giovanni Azua and @Marko Topolnik for participation and hints.

like image 61
barti_ddu Avatar answered Oct 09 '22 09:10

barti_ddu