Can I make this C++ code faster without making it much more complex?

Question

Here's a problem I've solved from a programming problem website (codechef.com, in case anyone doesn't want to see this solution before trying it themselves). My code solved the problem in about 5.43 seconds with the test data; others have solved the same problem with the same test data in 0.14 seconds, but with much more complex code. Can anyone point out specific areas of my code where I am losing performance? I'm still learning C++, so I know there are a million ways I could solve this problem, but I'd like to know if I can improve my own solution with some subtle changes rather than rewrite the whole thing. If there are any relatively simple solutions which are comparable in length but would perform better than mine, I'd be interested to see them as well.

Please keep in mind I'm learning C++ so my goal here is to improve the code I understand, not just to be given a perfect solution.

Thanks

Problem:

The purpose of this problem is to verify whether the method you are using to read input data is sufficiently fast to handle problems branded with the enormous Input/Output warning. You are expected to be able to process at least 2.5MB of input data per second at runtime. Time limit to process the test data is 8 seconds.

The input begins with two positive integers n k (n, k<=10^7). The next n lines of input contain one positive integer ti, not greater than 10^9, each.

Output

Write a single integer to output, denoting how many integers ti are divisible by k.

Example

Input:

7 3
1
51
966369
7
9
999996
11

Output:

4

Solution:

#include <iostream>
#include <stdio.h>
using namespace std;

int main(){
  //n is number of integers to perform calculation on
  //k is the divisor
  //inputnum is the number to be divided by k
  //total is the total number of inputnums divisible by k

  int n,k,inputnum,total;

  //initialize total to zero
  total=0;

  //read in n and k from stdin
  scanf("%i%i",&n,&k);

  //loop n times and if k divides into inputnum, increment total
  for (; n>0; n--)
  {
    scanf("%i",&inputnum);
    if(inputnum % k==0) total += 1;
  }

 //output value of total
 printf("%i",total);
 return 0;
}

wallyk · Accepted Answer

The speed is not determined by the computation; most of the time the program takes to run is consumed by I/O.

Add setvbuf calls before the first scanf for a significant improvement:

setvbuf(stdin, NULL, _IOFBF, 32768);
setvbuf(stdout, NULL, _IOFBF, 32768);
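To show where those calls sit relative to the rest of the program, here is a minimal sketch. The helper names `use_large_buffer` and `count_divisible` are made up for illustration and are not part of the original code; the loop is the one from the question, reading from a `FILE*` passed in by the caller. Note that `setvbuf` has to be called before any other operation on the stream.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Hypothetical helper: switch `stream` to full buffering with a buffer of
// `size` bytes. Must be called before any other operation on the stream.
bool use_large_buffer(std::FILE* stream, std::size_t size) {
    assert(stream != nullptr);
    return std::setvbuf(stream, nullptr, _IOFBF, size) == 0;
}

// Same logic as the loop in the question, but reading from `in`.
// Returns how many of the n values are divisible by k.
int count_divisible(std::FILE* in) {
    int n = 0, k = 0, total = 0;
    if (std::fscanf(in, "%d%d", &n, &k) != 2 || k <= 0) return 0;
    for (; n > 0; --n) {
        int value = 0;
        if (std::fscanf(in, "%d", &value) != 1) break;  // input ended early
        if (value % k == 0) ++total;
    }
    return total;
}
```

In `main` this would be used as `use_large_buffer(stdin, 32768); use_large_buffer(stdout, 32768);` followed by printing `count_divisible(stdin)`. How much this helps depends on the platform's default buffer size.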

-- edit --

The alleged magic numbers are the new buffer size. By default, FILE uses a fairly small buffer (BUFSIZ bytes, which is implementation-defined: 512 on some systems, a few kilobytes on others). Increasing this size decreases the number of times that the C runtime library has to issue a read or write call to the operating system, which is by far the most expensive operation in your algorithm.

Keeping the buffer size a multiple of 512 eliminates buffer fragmentation. Whether the size should be 1024*10 or 1024*1024 depends on the system it is intended to run on. For 16-bit systems, a buffer larger than 32K or 64K generally causes difficulty in allocating the buffer, and possibly in managing it. For any larger system, make it as large as is useful, depending on available memory and what else it will be competing against.

Lacking any known memory contention, choose sizes for the buffers at about the size of the associated files. That is, if the input file is 250K, use that as the buffer size. There is definitely a diminishing return as the buffer size increases. For the 250K example, a 100K buffer would require three reads, while a default 512 byte buffer requires 500 reads. Further increasing the buffer size so only one read is needed is unlikely to make a significant performance improvement over three reads.
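The read counts above are just a ceiling division of file size by buffer size. As a rough sketch (the function name `reads_needed` is made up here, and it assumes the idealized case where every read fills the buffer completely):

```cpp
#include <cassert>
#include <cstddef>

// Number of read calls needed to consume `file_bytes` with a buffer of
// `buffer_bytes`, assuming each read fills the whole buffer (idealized).
std::size_t reads_needed(std::size_t file_bytes, std::size_t buffer_bytes) {
    assert(buffer_bytes > 0);
    return (file_bytes + buffer_bytes - 1) / buffer_bytes;  // ceiling division
}
```

Taking K as 1024 bytes, `reads_needed(250 * 1024, 100 * 1024)` gives 3 and `reads_needed(250 * 1024, 512)` gives 500, matching the figures in the text. Going from 500 reads to 3 removes almost all of the system calls; going from 3 to 1 removes very few.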

Andrew Dalke · Answer

I tested the following on 28311552 lines of input. It's 10 times faster than your code. It reads a large block at once, then reads on up to the next newline so that no number is cut in half. The goal here is to reduce I/O costs, since scanf() is reading a character at a time. Even with stdio, the buffer is likely too small.

Once the block is ready, I parse the numbers directly in memory.

This isn't the most elegant of codes, and I might have some edge cases a bit off, but it's enough to get you going with a faster approach.

Here are the timings (without the optimizer, my solution is only about 6-7 times faster than your original reference):

[xavier:~/tmp] dalke% g++ -O3 my_solution.cpp
[xavier:~/tmp] dalke% time ./a.out < c.dat
15728647
0.284u 0.057s 0:00.39 84.6% 0+0k 0+1io 0pf+0w
[xavier:~/tmp] dalke% g++ -O3 your_solution.cpp
[xavier:~/tmp] dalke% time ./a.out < c.dat
15728647
3.585u 0.087s 0:03.72 98.3% 0+0k 0+0io 0pf+0w

Here's the code.

#include <iostream>
#include <stdio.h>
using namespace std;

const int BUFFER_SIZE=400000;
const int EXTRA=30;  // well over the size of an integer 

void read_to_newline(char *buffer) {
  int c;
  while (1) {
    c = getc_unlocked(stdin);
    if (c == '\n' || c == EOF) {
      *buffer = '\0';
      return;
    }
    *buffer++ = c;
  }
} 

int main() {
  char buffer[BUFFER_SIZE+EXTRA];
  char *end_buffer;
  char *startptr, *endptr;

  //n is number of integers to perform calculation on
  //k is the divisor
  //inputnum is the number to be divided by k
  //total is the total number of inputnums divisible by k

  int n,k,inputnum,total,nbytes;

  //initialize total to zero
  total=0;

  //read in n and k from stdin
  read_to_newline(buffer);
  sscanf(buffer, "%i%i",&n,&k);

  while (1) {
    // Read a large block of values
    // There should be one integer per line, with nothing else.
    // This might truncate an integer!
    nbytes = fread(buffer, 1, BUFFER_SIZE, stdin);
    if (nbytes == 0) {
      cerr << "Reached end of file too early" << endl;
      break;
    }
    // Make sure I read to the next newline.
    read_to_newline(buffer+nbytes);

    startptr = buffer;
    while (n>0) {
      inputnum = 0;
      // I had used strtol but that was too slow
      //   inputnum = strtol(startptr, &endptr, 10);
      // Instead, parse the integers myself.
      endptr = startptr;
      while (*endptr >= '0') {
        inputnum = inputnum * 10 + *endptr - '0';
        endptr++;
      }
      // *endptr might be a '\n' or '\0'

      // Might occur with the last field
      if (startptr == endptr) {
        break;
      }
      // skip the newline; go to the
      // first digit of the next number.
      if (*endptr == '\n') {
        endptr++;
      }
      // Test if this is a factor
      if (inputnum % k==0) total += 1;

      // Advance to the next number
      startptr = endptr;

      // Reduce the count by one
      n--;
    }
    // Either we are done, or we need new data
    if (n==0) {
      break;
    }
  }

 // output value of total
 printf("%i\n",total);
 return 0;
}

Oh, and it very much assumes the input data is in the right format.
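If the strict format assumption is a concern, the in-memory parsing step can be isolated and made a little more tolerant. The following is only a sketch, not the code timed above: `count_divisible_in_buffer` is a hypothetical helper that treats any non-digit byte (newline, carriage return, space) as a separator, and it assumes every value fits in an `int`, which holds for the stated limit of 10^9 when `int` is 32 bits.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: count the non-negative decimal integers found in
// data[0..len) that are divisible by k. Any non-digit byte is a separator.
// Assumes each value fits in an int (true for values up to 10^9).
int count_divisible_in_buffer(const char* data, std::size_t len, int k) {
    assert(k > 0);
    int total = 0;
    std::size_t i = 0;
    while (i < len) {
        // Skip separator bytes between numbers.
        if (data[i] < '0' || data[i] > '9') {
            ++i;
            continue;
        }
        // Accumulate one run of digits into a value.
        int value = 0;
        while (i < len && data[i] >= '0' && data[i] <= '9') {
            value = value * 10 + (data[i] - '0');
            ++i;
        }
        if (value % k == 0) ++total;
    }
    return total;
}
```

The caller still has to make sure a block ends on a number boundary, as `read_to_newline` does in the code above, and has to parse n and k separately. Bounding every access by `len` means the function does not depend on a terminating '\0'.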
