Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Java 8 times faster with arrays than std::vector in C++. What did I do wrong?

I have the following Java code with several big arrays which never change their size. It runs in 1100 ms on my computer.

I implemented the same code in C++ and used std::vector.

The time of the C++ implementation which runs the exact same code is 8800 ms on my computer. What did I do wrong, so that it runs this slowly?

Basically the code does the following:

for (int i = 0; i < numberOfCells; ++i) {         h[i] =  h[i] + 1;         floodedCells[i] =  !floodedCells[i];         floodedCellsTimeInterval[i] =  !floodedCellsTimeInterval[i];         qInflow[i] =  qInflow[i] + 1; } 

It iterates through different arrays with a size of around 20000.

You can find both implementations under the following links:

  • Java: https://ideone.com/R8KqjT
  • C++: https://ideone.com/Lu7RpE

(On ideone I could only run the loop 400 times instead of 2000 times because of the time limitation. But even here there is a difference of three times)

like image 350
RobinXSI Avatar asked Apr 15 '15 17:04

RobinXSI


People also ask

Is std :: array faster than vector?

A std::vector can never be faster than an array, as it has (a pointer to the first element of) an array as one of its data members. But the difference in run-time speed is slim and absent in any non-trivial program. One reason for this myth to persist, are examples that compare raw arrays with mis-used std::vectors.

Is it better to use array or vector?

Vector is better for frequent insertion and deletion, whereas Arrays are much better suited for frequent access of elements scenario. Vector occupies much more memory in exchange for managing storage and growing dynamically, whereas Arrays are a memory-efficient data structure.

Is Java faster then C?

C is normally faster than Java, if it is written efficiently, but it always depends on the compiler. Some compilers support optimization on the compile time, to produce more efficient code, by removing redundant code and other unnecessary artefacts.

Which is faster vector or array in CPP?

The conclusion is that arrays of integers are faster than vectors of integers (5 times in my example). However, arrays and vectors are arround the same speed for more complex / not aligned data.


1 Answers

Yep, the cache in the c++ version takes a hammering. It seems the JIT is better equipped to handle this.

If you change the outer for in isUpdateNeeded() to shorter snippets. The difference goes away.

The sample below produces a 4x speedup.

void isUpdateNeeded() {     for (int i = 0; i < numberOfCells; ++i) {         h[i] =  h[i] + 1;         floodedCells[i] =  !floodedCells[i];         floodedCellsTimeInterval[i] =  !floodedCellsTimeInterval[i];         qInflow[i] =  qInflow[i] + 1;         qStartTime[i] =  qStartTime[i] + 1;         qEndTime[i] =  qEndTime[i] + 1;     }      for (int i = 0; i < numberOfCells; ++i) {         lowerFloorCells[i] =  lowerFloorCells[i] + 1;         cellLocationX[i] =  cellLocationX[i] + 1;         cellLocationY[i] =  cellLocationY[i] + 1;         cellLocationZ[i] =  cellLocationZ[i] + 1;         levelOfCell[i] =  levelOfCell[i] + 1;         valueOfCellIds[i] =  valueOfCellIds[i] + 1;         h0[i] =  h0[i] + 1;         vU[i] =  vU[i] + 1;         vV[i] =  vV[i] + 1;         vUh[i] =  vUh[i] + 1;         vVh[i] =  vVh[i] + 1;     }     for (int i = 0; i < numberOfCells; ++i) {         vUh0[i] =  vUh0[i] + 1;         vVh0[i] =  vVh0[i] + 1;         ghh[i] =  ghh[i] + 1;         sfx[i] =  sfx[i] + 1;         sfy[i] =  sfy[i] + 1;         qIn[i] =  qIn[i] + 1;         for(int j = 0; j < nEdges; ++j) {             neighborIds[i * nEdges + j] = neighborIds[i * nEdges + j] + 1;         }         for(int j = 0; j < nEdges; ++j) {             typeInterface[i * nEdges + j] = typeInterface[i * nEdges + j] + 1;         }     }  } 

This shows to a reasonable degree that cache misses are the reason for the slowdown. It is also important to note that the variables are not dependent so a threaded solution is easily created.

Order restored

As per stefans comment I tried grouping them in a struct using the original sizes. This removes the immediate cache pressure in a similar fashion. The result is that the c++ (CCFLAG -O3) version is about 15% faster than the java version.

Varning neither short nor pretty.

#include <vector> #include <cmath> #include <iostream>       class FloodIsolation {     struct item{       char floodedCells;       char floodedCellsTimeInterval;       double valueOfCellIds;       double h;       double h0;       double vU;       double vV;       double vUh;       double vVh;       double vUh0;       double vVh0;       double sfx;       double sfy;       double qInflow;       double qStartTime;       double qEndTime;       double qIn;       double nx;       double ny;       double ghh;       double floorLevels;       int lowerFloorCells;       char flagInterface;       char floorCompletelyFilled;       double cellLocationX;       double cellLocationY;       double cellLocationZ;       int levelOfCell;     };     struct inner_item{       int typeInterface;       int neighborIds;     };      std::vector<inner_item> inner_data;     std::vector<item> data;  public:     FloodIsolation() :             numberOfCells(20000), inner_data(numberOfCells * nEdges), data(numberOfCells)    {      }     ~FloodIsolation(){     }       void isUpdateNeeded() {         for (int i = 0; i < numberOfCells; ++i) {             data[i].h = data[i].h + 1;             data[i].floodedCells = !data[i].floodedCells;             data[i].floodedCellsTimeInterval = !data[i].floodedCellsTimeInterval;             data[i].qInflow = data[i].qInflow + 1;             data[i].qStartTime = data[i].qStartTime + 1;             data[i].qEndTime = data[i].qEndTime + 1;             data[i].lowerFloorCells = data[i].lowerFloorCells + 1;             data[i].cellLocationX = data[i].cellLocationX + 1;             data[i].cellLocationY = data[i].cellLocationY + 1;             data[i].cellLocationZ = data[i].cellLocationZ + 1;             data[i].levelOfCell = data[i].levelOfCell + 1;             data[i].valueOfCellIds = data[i].valueOfCellIds + 1;             data[i].h0 = data[i].h0 + 1;             data[i].vU = data[i].vU + 1;             data[i].vV = data[i].vV + 1;             data[i].vUh = data[i].vUh + 1;             data[i].vVh = data[i].vVh + 1;             data[i].vUh0 = data[i].vUh0 + 1;             data[i].vVh0 = data[i].vVh0 + 1;             data[i].ghh = data[i].ghh + 1;             data[i].sfx = data[i].sfx + 1;             data[i].sfy = data[i].sfy + 1;             data[i].qIn = data[i].qIn + 1;             for(int j = 0; j < nEdges; ++j) {                 inner_data[i * nEdges + j].neighborIds = inner_data[i * nEdges + j].neighborIds + 1;                 inner_data[i * nEdges + j].typeInterface = inner_data[i * nEdges + j].typeInterface + 1;             }         }       }       static const int nEdges; private:       const int numberOfCells;  };   const int FloodIsolation::nEdges = 6;  int main() {     FloodIsolation isolation;     clock_t start = clock();     for (int i = 0; i < 4400; ++i) {         if(i % 100 == 0) {             std::cout << i << "\n";         }         isolation.isUpdateNeeded();     }      clock_t stop = clock();     std::cout << "Time: " << difftime(stop, start) / 1000 << "\n"; }                                                                                

My result differs slightly from Jerry Coffins for the original sizes. For me the differences remains. It might well be my java version, 1.7.0_75.

like image 184
Captain Giraffe Avatar answered Sep 30 '22 13:09

Captain Giraffe