Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Fast I/O in c, stdin/out

In a coding competition specified at this link there is a task where you need to read much data on stdin, do some calculations and present a whole lot of data on stdout.

In my benchmarking it is almost only i/o that takes time although I have tried optimizing it as much as possible.

What you have as input is a string (1 <= len <= 100'000) and q rows of pair of int where q also is 1 <= q <= 100'000.

I benchmarked my code on a 100 times larger dataset (len = 10M, q = 10M) and this is the result:

 Activity            time      accumulated

 Read text:          0.004     0.004
 Read numbers:       0.146     0.150
 Parse numbers:      0.200     0.350
 Calc answers:       0.001     0.351
 Format output:      0.037     0.388
 Print output:       0.143     0.531

By implementing my own formating and number parsing inline i managed to get the time down to 1/3 of the time when using printf and scanf.

However when I uploaded my solution to the competitions webpage my solution took 1.88 seconds (I think that is the total time over 22 datasets). When I look in the high-score there are several implementations (in c++) that finished in 0.05 seconds, nearly 40 times faster than mine! How is that possible?

I guess that I could speed it up a bit by using 2 threads, then I can start calculating and writing to stdout while still reading from stdin. This will however decrease the time to min(0.150, 0.143) in a theoretical best case on my large dataset. I'm still nowhere close to the highscore..

In the image below you can see the statistics of the consumed time.

Statistics of the consumed time

The program gets compiled by the website with this options:

gcc -g -O2 -std=gnu99 -static my_file.c -lm

and timed like this:

time ./a.out < sample.in > sample.out

My code looks like this:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_LEN (100000 + 1)
#define ROW_LEN (6 + 1)
#define DOUBLE_ROW_LEN (2*ROW_LEN)

int main(int argc, char *argv[])
{
    int ret = 1;

    // Set custom buffers for stdin and out
    char stdout_buf[16384];
    setvbuf(stdout, stdout_buf, _IOFBF, 16384);
    char stdin_buf[16384];
    setvbuf(stdin, stdin_buf, _IOFBF, 16384);

    // Read stdin to buffer
    char *buf = malloc(MAX_LEN);
    if (!buf) {
        printf("Failed to allocate buffer");
        return 1;
    }
    if (!fgets(buf, MAX_LEN, stdin))
        goto EXIT_A;

    // Get the num tests
    int m ;
    scanf("%d\n", &m);

    char *num_buf = malloc(DOUBLE_ROW_LEN);
    if (!num_buf) {
        printf("Failed to allocate num_buffer");
        goto EXIT_A;
    }

    int *nn;
    int *start = calloc(m, sizeof(int));
    int *stop = calloc(m, sizeof(int));
    int *staptr = start; 
    int *stpptr = stop;
    char *cptr;
    for(int i=0; i<m; i++) {
        fgets(num_buf, DOUBLE_ROW_LEN, stdin);
        nn = staptr++;
        cptr = num_buf-1;
        while(*(++cptr) > '\n') {
            if (*cptr == ' ')
                nn = stpptr++;
            else
                *nn = *nn*10 + *cptr-'0';
        }
    }


    // Count for each test
    char *buf_end = strchr(buf, '\0');
    int len, shift;
    char outbuf[ROW_LEN];
    char *ptr_l, *ptr_r, *out;
    for(int i=0; i<m; i++) {
        ptr_l = buf + start[i];
        ptr_r = buf + stop[i];
        while(ptr_r < buf_end && *ptr_l == *ptr_r) {
            ++ptr_l;
            ++ptr_r;
        }

        // Print length of same sequence
        shift = len = (int)(ptr_l - (buf + start[i]));
        out = outbuf;
        do {
            out++;
            shift /= 10;
        } while (shift);
        *out = '\0';
        do {
            *(--out) = "0123456789"[len%10];
            len /= 10;
        } while(len);
        puts(outbuf);
    }



    ret = 0;

    free(start);
    free(stop);
EXIT_A:
    free(buf);
    return ret;
}
like image 539
Daniel Falk Avatar asked Apr 23 '17 15:04

Daniel Falk


People also ask

What is fast input output method in C?

It is often recommended to use scanf/printf instead of cin/cout for fast input and output. However, you can still use cin/cout and achieve the same speed as scanf/printf by including the following two lines in your main() function: ios_base::sync_with_stdio(false);

What is Ios_base :: Sync_with_stdio false );?

ios_base::sync_with_stdio(false); This disables the synchronization between the C and C++ standard streams. By default, all standard streams are synchronized, which in practice allows you to mix C- and C++-style I/O and get sensible and expected results.

What is fast input method?

This method reduces the input and output time of our C++ program and makes it more optimised. This method helps us to save time in programming contests. This method is widely used by most C++ Competitive programmers around the world. This technique is known as Fast I/O or Fast Input/Output.

Why is CPP so fast?

C++ performance. In contrast, a program written in C++ gets compiled directly into machine code -- without an intermediary translation required at runtime. This is one reason why C++ programs tend to perform faster than those written in Java.


1 Answers

Thanks to your question, I went and solved the problem myself. Your time is better than mine, but I'm still using some stdio functions.

I simply do not think the high score of 0.05 seconds is bona fide. I suspect it's the product of a highly automated system that returned that result in error, and that no one ever verified it.

How to defend that assertion? There's no real algorithmic complexity: the problem is O(n). The "trick" is to write specialized parsers for each aspect of the input (and avoid work done only in debug mode). The total time for 22 trials is 50 milliseconds, meaning each trial averages 2.25 ms? We're down near the threshold of measurability.

Competitions like the problem you addressed yourself to are unfortunate, in a way. They reinforce the naive idea that performance is the ultimate measure of a program (there's no score for clarity). Worse, they encourage going around things like scanf "for performance" while, in real life, getting a program to run correctly and fast basically never entails avoiding or even tuning stdio. In a complex system, performance comes from things like avoiding I/O, passing over the data only once, and minimizing copies. Using the DBMS effectively is often key (as it were), but such things never show up in programming challenges.

Parsing and formatting numbers as text does take time, and in rare circumstances can be a bottleneck. But the answer is hardly ever to rewrite the parser. Rather, the answer is to parse the text into a convenient binary form, and use that. In short: compilation.

That said, a few observations may help.

You don't need dynamic memory for this problem, and it's not helping. The problem statement says the input array may be up to 100,000 elements, and the number of trials may be as many as 100,000. Each trial is two integer strings of up to 6 digits each separated by a space and terminated by a newline: 6 + 1 + 6 + 1 = 14. Total input, maximum is 100,000 + 1 + 6 + 1 + 100,000 * 14: under 16 KB. You are allowed 1 GB of memory.

I just allocated a single 16 KB buffer, and read it in all at once with read(2). Then I made a single pass over that input.

You got suggestions to use asynchronous I/O and threads. The problem statement says you're measured on CPU time, so neither of those help. The shortest distance between two points is a straight line; a single read into statically allocated memory wastes no motion.

One ridiculous aspect of the way they measure performance is that they use gcc -g. That means assert(3) is invoked in code that is measured for performance! I couldn't get under 4 seconds on test 22 until I removed the my asserts.

In sum, you did pretty well, and I suspect the winner you're baffled by is a phantom. Your code does faff about a bit, and you can dispense with dynamic memory and tuning stdio. I bet your time can be trimmed by simplifying it. To the extent that performance matters, that's where I'd direct your attention.

like image 97
James K. Lowden Avatar answered Sep 29 '22 10:09

James K. Lowden