C++ iostream vs. C stdio performance/overhead

I'm trying to understand how to improve the performance of this C++ code to bring it on par with the C code it is based on. The C code looks like this:

#include <stdlib.h>
#include <stdio.h>
#include <string.h>

typedef struct point {
  double x, y;
} point_t;

int read_point(FILE *fp, point_t *p) {
  char buf[1024];
  if (fgets(buf, 1024, fp)) {
    char *s = strtok(buf, " ");
    if (s) p->x = atof(s); else return 0;
    s = strtok(NULL, " ");  /* NULL continues tokenizing the same buffer */
    if (s) p->y = atof(s); else return 0;
  }
  else
    return 0;
  return 1;
}

int main() {
  point_t p;
  FILE *fp = fopen("biginput.txt", "r");

  int i = 0;
  while (read_point(fp, &p))
    i++;

  printf("read %d points\n", i);
  return 0;
}

The C++ code looks like this:

#include <iostream>
#include <fstream>

using namespace std;

struct point {
  double x, y;
};

istream &operator>>(istream &in, point &p) {
  return in >> p.x >> p.y;
}

int main() {
  point p;
  ifstream input("biginput.txt");

  int i = 0;
  while (input >> p)
    i++;

  cout << "read " << i << " points" << endl;
  return 0;
}

I like that the C++ code is shorter and more direct, but when I run both programs on the same machine against the same 138 MB test file, their performance is very different:

$ time ./test-c
read 10523988 points
    1.73 real         1.68 user         0.04 sys
# subsequent runs:
    1.69 real         1.64 user         0.04 sys
    1.72 real         1.67 user         0.04 sys
    1.69 real         1.65 user         0.04 sys

$ time ./test-cpp
read 10523988 points
   14.50 real        14.36 user         0.07 sys
# subsequent runs
   14.79 real        14.43 user         0.12 sys
   14.76 real        14.40 user         0.11 sys
   14.58 real        14.36 user         0.09 sys
   14.67 real        14.40 user         0.10 sys

Running either program many times in succession doesn't change the result: the C++ version is consistently about 8 to 9 times slower.

The file format is just lines of space-separated doubles, such as:

587.96 600.12
430.44 628.09
848.77 468.48
854.61 76.18
240.64 409.32
428.23 643.30
839.62 568.58

Is there a trick to reducing the overhead that I'm missing?

Edit 1: Making the operator inline seems to have had a very small but possibly detectable effect:

   14.62 real        14.47 user         0.07 sys
   14.54 real        14.39 user         0.07 sys
   14.58 real        14.43 user         0.07 sys
   14.63 real        14.45 user         0.08 sys
   14.54 real        14.32 user         0.09 sys

This doesn't really solve the problem.

Edit 2: I'm using clang:

$ clang --version
Apple LLVM version 7.0.0 (clang-700.0.72)
Target: x86_64-apple-darwin15.5.0
Thread model: posix

I'm compiling both the C and the C++ version without any optimization flags, using the same version of Clang on my Mac (probably the one that comes with Xcode, /usr/bin/clang, on OS X 10.11). I figured that enabling optimizations for one but not the other, or using different compilers, would cloud the issue.

Edit 3: replacing istream &operator>> with something else

I've rewritten the istream operator to be closer to the C version. It is an improvement, but I still see a roughly 4x performance gap.

#include <string>  // std::string, getline, stod

inline istream &operator>>(istream &in, point &p) {
  string line;
  getline(in, line);

  if (line.empty())
    return in;

  size_t next = 0;
  p.x = stod(line, &next);
  p.y = stod(line.substr(next));
  return in;
}

Runs:

$ time ./test-cpp
read 10523988 points
    6.85 real         6.74 user         0.05 sys
# subsequently
    6.70 real         6.62 user         0.05 sys
    7.16 real         6.86 user         0.12 sys
    6.80 real         6.59 user         0.09 sys
    6.79 real         6.59 user         0.08 sys

Interestingly, compiling this with -O3 is a substantial improvement:

$ time ./test-cpp
read 10523988 points
    2.44 real         2.38 user         0.04 sys
    2.43 real         2.38 user         0.04 sys
    2.49 real         2.41 user         0.04 sys
    2.51 real         2.42 user         0.05 sys
    2.47 real         2.40 user         0.05 sys

Edit 4: Replacing body of istream operator>> with C stuff

This version gets quite close to the performance of C:

#include <cstdlib>  // atof
#include <cstring>  // strtok

inline istream &operator>>(istream &in, point &p) {
  char buf[1024];
  in.getline(buf, 1024);
  char *s = strtok(buf, " ");
  if (s)
    p.x = atof(s);
  else
    return in;

  s = strtok(NULL, " ");
  if (s)
    p.y = atof(s);

  return in;
}

Unoptimized, this version runs in about 2 seconds; with optimizations it beats the unoptimized C version (though the optimized C still wins). To be precise, without optimizations:

    2.13 real         2.08 user         0.04 sys
    2.14 real         2.07 user         0.04 sys
    2.33 real         2.15 user         0.05 sys
    2.16 real         2.10 user         0.04 sys
    2.18 real         2.12 user         0.04 sys
    2.33 real         2.17 user         0.06 sys

With optimizations:

    1.16 real         1.10 user         0.04 sys
    1.19 real         1.13 user         0.04 sys
    1.11 real         1.06 user         0.03 sys
    1.15 real         1.09 user         0.04 sys
    1.14 real         1.09 user         0.04 sys

And the C version with optimizations, for an apples-to-apples comparison:

    0.81 real         0.77 user         0.03 sys
    0.82 real         0.78 user         0.04 sys
    0.87 real         0.80 user         0.04 sys
    0.84 real         0.77 user         0.04 sys
    0.83 real         0.78 user         0.04 sys
    0.83 real         0.77 user         0.04 sys

I suppose I could live with this, but as a novice C++ user, I'm now wondering:

  1. Is it worth trying to do this another way? I'm not sure it matters what happens inside the istream operator>>.
  2. Is there another way to build the C++ code that might perform better besides these three ways?
  3. Is this idiomatic? If not, do most people just accept the performance for what it is?

Edit 5: This question is completely different from the answer about printf. I don't see how the linked question, which this is supposedly a duplicate of, addresses any of the three points above.

Asked by Daniel Lyons, Jun 18 '16 07:06


1 Answer

The significant difference in performance comes from a significant difference in what the two versions actually do.

I will do my best to compare your two seemingly equivalent approaches in detail.

In C:

Looping

  • Read characters until a newline or end-of-file is detected or max length (1024) is reached
  • Tokenize on the hard-coded space delimiter
  • Parse into a double without any validation (atof cannot report errors)
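To make the last point concrete: atof has no way to signal failure, while strtod reports where it stopped parsing. A small sketch contrasting the two (the helper names parse_unchecked and parse_checked are illustrative, not from the question):

```cpp
#include <cstdlib>

// atof cannot report failure: text that is not a number yields 0.0, and
// trailing garbage after a valid prefix is silently ignored.
inline double parse_unchecked(const char *s) { return std::atof(s); }

// strtod reports where parsing stopped, so the caller can detect problems.
inline bool parse_checked(const char *s, double &out) {
  char *end = nullptr;
  out = std::strtod(s, &end);
  return end != s && *end == '\0';  // true only if the whole string was consumed
}
```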

In C++:

Looping

  • Skip leading whitespace, meaning any character the stream's locale classifies as a space (tabs and newlines included), not just ' ', and then read characters for as long as they can still be part of a number. The detection isn't limited to your actual data pattern; it checks for more delimiters just in case, so there is overhead on every character.
  • Once it has found the end of the number, it tries to parse the accumulated characters gracefully, without assuming any pattern in your data. For example, if there are 800 consecutive digits and the value no longer fits in the type, it must detect that possibility by itself (and set the stream's failbit), which adds more overhead.
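A small sketch of both behaviors, assuming the default "C" locale: extraction with >> treats a tab like a space and rejects a value too large for a double, while the strtok/atof approach from the question splits only on ' ' and has no range check (for atof, an out-of-range value is undefined behavior). The function names are hypothetical:

```cpp
#include <cstdlib>
#include <cstring>
#include <sstream>
#include <string>
#include <vector>

// Stream extraction skips any whitespace (spaces, tabs, newlines) before
// each number.
inline bool read_pair_stream(const std::string &text, double &x, double &y) {
  std::istringstream in(text);
  return static_cast<bool>(in >> x >> y);
}

// The strtok approach from the question only splits on the ' ' character.
inline bool read_pair_strtok(const std::string &text, double &x, double &y) {
  std::vector<char> buf(text.begin(), text.end());
  buf.push_back('\0');  // strtok needs a writable, NUL-terminated string
  char *s = std::strtok(buf.data(), " ");
  if (!s) return false;
  x = std::atof(s);
  s = std::strtok(nullptr, " ");
  if (!s) return false;
  y = std::atof(s);
  return true;
}

// Extraction also checks the range: a value too large for a double makes
// the stream fail instead of silently producing a result.
inline bool stream_accepts(const std::string &text) {
  std::istringstream in(text);
  double d;
  return static_cast<bool>(in >> d);
}
```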

One way to improve performance, very close to what Peter suggested in the comments above, is to use getline inside operator>> so that you can tell the stream about the format of your data. Something like this should give you back some of the speed, though it amounts to writing part of your code in C again:

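#include <cstdlib>   // atof

istream &operator>>(istream &in, point &p) {
    // Each buffer holds up to 9 characters plus '\0'; a longer value
    // makes getline set failbit, which ends the read loop.
    char bufX[10], bufY[10];
    in.getline(bufX, sizeof(bufX), ' ');   // x ends at the first space
    in.getline(bufY, sizeof(bufY), '\n');  // y ends at the newline
    p.x = atof(bufX);
    p.y = atof(bufY);
    return in;
}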
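To try the snippet above without biginput.txt, here is a self-contained sketch that counts points read from a std::istringstream using the same operator>> (count_points is a hypothetical helper). It also shows the buffer-size caveat: a value longer than 9 characters makes getline set failbit, which ends the loop early.

```cpp
#include <cstdlib>
#include <istream>
#include <sstream>
#include <string>

struct point {
  double x, y;
};

// The operator>> from the answer: x ends at ' ', y ends at '\n'.
// Each buffer holds at most 9 characters plus the terminating '\0'.
std::istream &operator>>(std::istream &in, point &p) {
  char bufX[10], bufY[10];
  in.getline(bufX, sizeof(bufX), ' ');
  in.getline(bufY, sizeof(bufY), '\n');
  p.x = std::atof(bufX);
  p.y = std::atof(bufY);
  return in;
}

// Hypothetical helper: count points the same way the question's main()
// counts them in biginput.txt, but reading from a string.
int count_points(const std::string &text) {
  std::istringstream input(text);
  point p;
  int i = 0;
  while (input >> p)
    ++i;
  return i;
}
```

With the sample lines from the question this counts every point, while a line starting with "1234567890.5" stops the loop immediately, so the buffer size has to match the widest value you expect.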

Hope it's helpful.

Edit: applied nneonneo's comment

Answered by Frederik.L, Oct 22 '22 05:10