Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Faster than scanf?

I was doing massive parsing of positive integers using scanf("%d", &someint). As I wanted to see if scanf was a bottleneck, I implemented a naive integer parsing function using fread, just like:

int result;
char c;

while (fread(&c, sizeof c, 1, stdin), c == ' ' || c == '\n')
    ;

result = c - '0';
while (fread(&c, sizeof c, 1, stdin), c >= '0' || c <= '9') {
     result *= 10;
     result += c - '0';
}

return result;

But to my astonishment, the performance of this function is (even with inlining) about 50% worse. Shouldn't there be a possibility to improve on scanf for specialized cases? Isn't fread supposed to be fast (additional hint: Integers are (edit: mostly) 1 or 2 digits)?

like image 270
Jo So Avatar asked Dec 12 '11 23:12

Jo So


People also ask

What is better than scanf?

The most common ways of reading input are: using fgets with a fixed size, which is what is usually suggested, and. using fgetc , which may be useful if you're only reading a single char .

Is Getchar faster than scanf?

It is a known fact that scanf() is faster than cin and getchar() is faster than scanf() in general.

Is CIN or scanf faster?

With synchronization turned off, the above results indicate that cin is 8-10% faster than scanf(). This is probably because scanf() interprets the format arguments at runtime and makes use of variable number of arguments, whereas cin does this at compile time.

Is Strcmp faster than ==?

While comparing the two function, in a loop repeted 500'000'000 times, strcmp execute too much fast, about x750 times faster. The execution time of that function is 3.058s , while strcmp only 0.004s .


1 Answers

The overhead I was encountering was not the parsing itself but the many calls to fread (same with fgetc, and friends). For each call, the libc has to lock the input stream to make sure two threads aren't stepping on each other's feet. Locking is a very costly operation.

What we're looking for is a function that gives us buffered input (reimplementing buffering is just too much effort) but avoids the huge locking overhead of fgetc.

If we can guarantee that there is only a single thread using the input stream, we can use the functions from unlocked_stdio(3), such as getchar_unlocked(3). Here is an example:

static int parseint(void)
{
    int c, n;

    n = getchar_unlocked() - '0';
    while (isdigit((c = getchar_unlocked())))
        n = 10*n + c-'0';

    return n;
}

The above version doesn't check for errors. But it's guaranteed to terminate. If error handling is needed it might be enough to check feof(stdin) and ferror(stdin) at the end, or let the caller do it.

My original purpose was submitting solutions to programming problems at SPOJ, where the input is only whitespace and digits. So there is still room for improvement, namely the isdigit check.

static int parseint(void)
{
    int c, n;

    n = getchar_unlocked() - '0';
    while ((c = getchar_unlocked()) >= '0')
        n = 10*n + c-'0';

    return n;
}

It's very, very hard to beat this parsing routine, both performance-wise and in terms of convenience and maintainability.

like image 166
Jo So Avatar answered Sep 19 '22 22:09

Jo So