Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Iteration performance Java vs. C++

This one puzzles me for 3 days. I have an application which needs to evaluate a certain set of integer polynomials (of multiple args) which have very few elements. I already have an implementation written in Java and I am currently porting to C++.

During testing, I noticed that the C++ version is orders of magnitudes slower that the Java variant. I know of course about JIT-ing and that this scenario is particulary well-posed for this kind of compilers, but what I see is way off from what I had expected.

The sample code is below, you'll need boost to compile the C++ code (but that dependency is only required for simple time measurement).

choeger@daishi ~/uebb % clang++ -O3 -std=c++11 polytest.cpp -lboost_timer -lboost_system
choeger@daishi ~/uebb % ./a.out                                                         
 0.011694s wall, 0.010000s user + 0.000000s system = 0.010000s CPU (85.5%)
Ideal Result: 1e+07
 0.421986s wall, 0.420000s user + 0.000000s system = 0.420000s CPU (99.5%)
Result: 1e+07
choeger@daishi ~/uebb % javac PolyTest.java                                             
choeger@daishi ~/uebb % java PolyTest                                                   
evals: 10000000 runtime: 17ms
Ideal Result: 1.0E7
evals: 10000000 runtime: 78ms
Result: 1.0E7

Apparently, the C++ version (compiled with clang-3.3) runs slightly faster when it comes to pure computational power, but Java (openjdk 1.7.0.60) does much better when the polynomial is interpreted. My guess so far is, that my C++ code is not quite optimal due to the iteration over small (in the sample 1-element) vectors. I assume the JVM does much better here when it comes to cache-hits misses.

Is there any way to make my C++ version perform better? Is there a different cause I just did not see? And as a side note: is there way to measure cache-coherence for a C++ and a Java process?

The C++ code looks like this:

#include <boost/timer/timer.hpp>

#include <iostream>
#include <vector>

using namespace std;

struct Product {
  int factor;
  vector<int> fields;
};


class SumOfProducts {
public:
  vector<Product> sum;

  /**
   * evaluate the polynomial with arguments separated by width
   */
  inline double eval(const double* arg, const int width) const {
    double res = 0.0;
    for (Product p : sum) {
      double prod = p.factor;
      for (int f : p.fields) {
    prod *= arg[f*width];
      }
      res += prod;
    }
    return res;
  };
};


double idealBenchmark(const double* arg, const int width) {
  boost::timer::auto_cpu_timer t;
  double res = 0.0;
  // run 10M evaluations 
  for (long l = 0; l < 10000000; l++) {
    res = res + arg[width] * arg[width];
  }
  return res;
}

double benchmark(const double* arg, const SumOfProducts& poly) {
  boost::timer::auto_cpu_timer t;
  double res = 0.0;
  // run 10M evaluations 
  for (long l = 0; l < 10000000; l++) {
    res = res + poly.eval(arg, 1);
  }
  return res;
}

int main() {
  //simple polynomial: x_1^2
  Product p;
  p.factor = 1;
  p.fields.push_back(1);
  p.fields.push_back(1);

  SumOfProducts poly;
  poly.sum.push_back(p);

  double arg[] = { 0, 1 };

  double res = idealBenchmark(arg, 1);
  cout << "Ideal Result: " << res << endl;

  res = benchmark(arg, poly);
  cout << "Result: " << res << endl;
}

The Java version like this:

public class PolyTest {

    static class Product {
    public final int factor;
    public final int[] fields;

    public Product(int pFactor, int[] pFields) {
        factor = pFactor;
        fields = pFields;
    }
    }

    static class SumOfProducts {
    final Product[] sum;

    public SumOfProducts(Product[] pSum) {
        sum = pSum;
    }

    /**
     * evaluate the polynomial with arguments separated by width
     */
    double eval(final double[] arg, final int width) {
        double res = 0.0;
        for (Product p : sum) {
        double prod = p.factor;
        for (int f : p.fields) {
            prod *= arg[f*width];
        }
        res += prod;
        }
        return res;
    }
    }

    static double idealBenchmark(final double[] arg, final int width) {
    final long start = System.currentTimeMillis();
    double res = 0.0;
    long evals = 0;
    // run 10M evaluations 
    for (long l = 0; l < 10000000; l++) {
        evals++;
        res = res + arg[width] * arg[width];
    }
    System.out.println("evals: " + evals + " runtime: " + (System.currentTimeMillis() - start) + "ms");
    return res;
    }

    static double benchmark(final double[] arg, final SumOfProducts poly) {
    final long start = System.currentTimeMillis();
    double res = 0.0;
    long evals = 0;
    // run 10M evaluations 
    for (long l = 0; l < 10000000; l++) {
        evals++;
        res = res + poly.eval(arg, 1);
    }
    System.out.println("evals: " + evals + " runtime: " + (System.currentTimeMillis() - start) + "ms");
    return res;
    }

    public static void main(String[] args) {
    //simple polynomial: x_1^2
    Product p = new Product(1, new int[]{1, 1});

    SumOfProducts poly = new SumOfProducts(new Product[]{p});

    double arg[] = { 0, 1 };

    double res = idealBenchmark(arg, 1);
    System.out.println("Ideal Result: " + res);

    res = benchmark(arg, poly);
    System.out.println("Result: " + res);
    }

}
like image 532
choeger Avatar asked Jan 08 '14 21:01

choeger


People also ask

Does Java run faster than C?

C is a low-level, executable-compiled language, that can do low-level hardware operations such as dynamic memory allocation, and accessing hardware drivers at a very low level. C is normally faster than Java, if it is written efficiently, but it always depends on the compiler.

Is Java slower than C?

Java startup time is often much slower than many languages, including C, C++, Perl or Python, because many classes (and first of all classes from the platform Class libraries) must be loaded before being used.

Is Java more performant than C++?

Java vs. C++ performance. In contrast, a program written in C++ gets compiled directly into machine code -- without an intermediary translation required at runtime. This is one reason why C++ programs tend to perform faster than those written in Java.

Which is faster among Java C++ and C and why?

Speed and performance Java is a favorite among developers, but because the code must first be interpreted during run-time, it's also slower. C++ is compiled to binaries, so it runs immediately and therefore faster than Java programs.


1 Answers

You are making expensive copies here:

for (Product p : sum)

Each copy means fully copying the std::vector<int> data member of each element. Use references instead:

for (const Product& p : sum)

Note that I made them const, because you do not need to change the elements of the range.

like image 90
juanchopanza Avatar answered Oct 24 '22 15:10

juanchopanza