
Fastest way to read millions of integers from stdin C++?

Tags: c++, file, io, sorting

I am working on a sorting project and I've come to the point where the main bottleneck is reading in the data. It takes my program about 20 seconds to sort 100,000,000 integers read in from stdin using cin with std::ios::sync_with_stdio(false), but it turns out that about 10 of those seconds are spent just reading in the data to sort. We do know how many integers we will be reading in (the count is at the top of the file we need to sort).

How can I make this faster? I know it's possible because a student in a previous semester was able to do counting sort in a little over 3 seconds (and that's basically pure read time).

The program is just fed the contents of a file of newline-separated integers, like $ ./program < numstosort.txt

Thanks

Here is the relevant code:

    std::ios::sync_with_stdio(false);
    int max;
    cin >> max;
    short num;
    short* a = new short[max];
    int n = 0;
    while(cin >> num) { 
        a[n] = num;
        n++;
    }
asked Nov 07 '16 by Snow Sailor


2 Answers

This will get your data into memory about as fast as possible, assuming Linux/POSIX running on commodity hardware. Note that since you apparently aren't allowed to use compiler optimizations, C++ iostream IO is not going to be the fastest way to read data; without optimizations, the C++ code will not run anywhere near as fast as it can.

Given that the redirected file is already open as stdin/STDIN_FILENO, use low-level system-call, C-style IO. It doesn't need compiler optimizations to run just about as fast as possible:

#include <sys/stat.h>   // fstat(), struct stat
#include <unistd.h>     // read(), STDIN_FILENO
#include <stdlib.h>     // calloc()

struct stat sb;
int rc = ::fstat( STDIN_FILENO, &sb );

// use C-style calloc() to get memory that's been
// set to zero as calloc() is often optimized to be
// faster than a new followed by a memset().
char *data = (char *)::calloc( 1, sb.st_size + 1 );
size_t totalRead = 0UL;
while ( totalRead < ( size_t ) sb.st_size )
{
    ssize_t bytesRead = ::read( STDIN_FILENO,
        data + totalRead, sb.st_size - totalRead );
    if ( bytesRead <= 0 )
    {
        break;
    }
    totalRead += bytesRead;
}

// data is now in memory - start processing it

That code will read your data into memory as one long C-style string. And the lack of compiler optimizations won't matter one bit, as it's almost all bare-metal system calls.

Using fstat() to get the file size allows allocating all the needed memory at once - no realloc() or copying data around is necessary.

You'll need to add some error checking. A more robust version would also verify that stdin really is a regular file with a meaningful size, and not a pipe from a "useless use of cat" such as cat filename | YourProgram - in that case the fstat() call won't return a useful file size. Examine the sb.st_mode field of the struct stat after the call to see what the stdin stream really is:

::fstat( STDIN_FILENO, &sb );
...
if ( S_ISREG( sb.st_mode ) )
{
    // regular file...
}

(And for really high-performance systems, it can be important to ensure that the memory pages you're reading data into are actually mapped in your process address space. Performance can really stall if data arrives faster than the kernel's memory management system can create virtual-to-physical mappings for the pages data is getting dumped into.)
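One common way to address that is to pre-fault the buffer before the read loop by touching one byte per page, so the kernel builds the virtual-to-physical mappings up front rather than on demand. A minimal sketch, using the calloc()'d data buffer and sb from above:

// sketch: touch one byte per page so the pages are mapped
// before read() starts dumping data into them
long pageSize = ::sysconf( _SC_PAGESIZE );   // declared in <unistd.h>
for ( off_t off = 0; off < sb.st_size; off += pageSize )
{
    data[ off ] = 0;
}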

To handle a large file as fast as possible, you'd want to go multithreaded, with one thread reading data and feeding one or more data processing threads so you can start processing data before you're done reading it.
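As a rough illustration of that split, one thread could read() fixed-size chunks into the buffer while another parses them as they arrive. This is only a sketch - the thread functions, the chunkReady queue, and the other names here are illustrative, not part of the code above:

#include <thread>
#include <mutex>
#include <condition_variable>
#include <queue>
#include <unistd.h>

std::mutex m;
std::condition_variable cv;
std::queue<size_t> chunkReady;   // running total of bytes read so far
bool doneReading = false;

void readerThread( char *data, size_t fileSize )
{
    size_t totalRead = 0;
    while ( totalRead < fileSize )
    {
        ssize_t n = ::read( STDIN_FILENO, data + totalRead,
                            fileSize - totalRead );
        if ( n <= 0 ) break;
        totalRead += n;
        {
            std::lock_guard<std::mutex> lk( m );
            chunkReady.push( totalRead );
        }
        cv.notify_one();
    }
    {
        std::lock_guard<std::mutex> lk( m );
        doneReading = true;
    }
    cv.notify_one();
}

void processorThread( char *data )
{
    size_t processedUpTo = 0;
    for ( ;; )
    {
        std::unique_lock<std::mutex> lk( m );
        cv.wait( lk, []{ return !chunkReady.empty() || doneReading; } );
        if ( chunkReady.empty() && doneReading ) break;
        size_t available = chunkReady.back();   // newest high-water mark
        while ( !chunkReady.empty() ) chunkReady.pop();
        lk.unlock();
        // parse data[processedUpTo .. available) here, taking care
        // not to split a number across the chunk boundary
        processedUpTo = available;
    }
}

The main thread would launch both with std::thread and join them; the parsing side has to handle a number that straddles a chunk boundary.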

Edit: parsing the data.

Again, with compiler optimizations disabled, the overhead of C++ stream operations will probably make them slower than C-style processing. Based on that assumption, something simple will probably run faster.

This would probably work a lot faster in a non-optimized binary, assuming the data is in a C-style string read in as above:

char *next;
// the first value in the file is the count of integers that follow
long count = ::strtol( data, &next, 0 );
long *values = new long[ count ];

// parse each subsequent value; strtol() skips the whitespace/newline
// before each number and advances next past the digits it consumed
for ( long ii = 0; ii < count; ii++ )
{
    values[ ii ] = ::strtol( next, &next, 0 );
}

That is also very fragile. It relies on strtol() skipping leading whitespace, so it will fail if anything other than whitespace separates the numeric values. It also relies on the initial count of values being correct - the code will fail if it isn't. And because it overwrites next before checking for errors, if it ever goes off the rails on bad data it will be hopelessly lost.

But it should be about as fast as possible without allowing compiler optimizations.
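If some of that robustness is needed anyway, one way to harden the loop - still the same strtol() approach, sketched here with illustrative names like got - is to check errno and verify that each call actually consumed digits:

#include <errno.h>   // errno, ERANGE

char *next;
errno = 0;
long count = ::strtol( data, &next, 0 );

long *values = new long[ count ];
long got = 0;
for ( ; got < count; got++ )
{
    char *end;
    errno = 0;
    long v = ::strtol( next, &end, 0 );
    if ( end == next || errno == ERANGE )
    {
        break;   // no digits consumed, or overflow - stop instead of wandering
    }
    values[ got ] = v;
    next = end;
}
// 'got' now holds how many values were actually parsed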

That's what's crazy about not allowing compiler optimizations. You could write simple, robust C++ code to do all your processing, let a good optimizing compiler loose on it, and probably run almost as fast as the code I posted - which has no error checking and will fail spectacularly in unexpected and undefined ways if fed unexpected data.
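For comparison, a sketch of what that simple, robust C++ version might look like, assuming it could be built with optimizations such as -O2:

#include <iostream>
#include <vector>
#include <cstddef>

int main()
{
    std::ios::sync_with_stdio( false );

    std::size_t count = 0;
    std::cin >> count;

    std::vector<long> values;
    values.reserve( count );   // one allocation up front, like the fstat() version

    long v;
    while ( values.size() < count && std::cin >> v )
    {
        values.push_back( v );
    }

    // values now holds the data; bounds and stream-error handling
    // come from the standard library rather than hand-rolled checks
}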

answered Sep 20 '22 by Andrew Henle


You can make it faster if you use a solid-state drive (SSD). If you want to ask something about code performance, you need to post how you are doing things in the first place.

answered Sep 18 '22 by Ricardo Ortega Magaña