 

Copying one std stream to another efficiently

Tags:

c++

stream

OK, here's some code that outlines what I'm trying to do.

#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>   // open(), O_RDONLY, O_NONBLOCK
#include <unistd.h>  // read()

#include <iostream>
#include <sstream>

int main( int c, char *v[] )
{
    int fd = open( "data.out", O_RDONLY | O_NONBLOCK );
    std::cout << "fd = " << fd << std::endl;

    char buffer[ 1024000 ];
    ssize_t nread;

    std::stringstream ss;

    while( true )
    {
        if ( (nread = read( fd, buffer, sizeof( buffer ) - 1 )) < 0 )
            break;

        ss.write( buffer, nread );

        while( true )
        {
            std::stringstream s2;

            std::cout << "pre-get  : " <<
                (((ss.rdstate() & std::ios::badbit) == std::ios::badbit) ? "bad" : "") << " " <<
                (((ss.rdstate() & std::ios::eofbit) == std::ios::eofbit) ? "eof" : "") << " " <<
                (((ss.rdstate() & std::ios::failbit) == std::ios::failbit) ? "fail" : "" ) << " " <<
                std::endl;

            ss.get( *s2.rdbuf() );

            std::cout << "post-get : " <<
                (((ss.rdstate() & std::ios::badbit) == std::ios::badbit) ? "bad" : "") << " " <<
                (((ss.rdstate() & std::ios::eofbit) == std::ios::eofbit) ? "eof" : "") << " " <<
                (((ss.rdstate() & std::ios::failbit) == std::ios::failbit) ? "fail" : "" ) << " " <<
                std::endl;

            unsigned int linelen = ss.gcount() - 1;

            if ( ss.eof() )
            {
                ss.str( s2.str() );
                break;
            }
            else if ( ss.fail() )
            {
                ss.str( "" );
                break;
            }
            else
            {
                std::cout << s2.str() << std::endl;
            }
        }
    }
}

It first reads large chunks of data into a buffer. I know there are better C++ ways of doing this part, but in my real application I am handed a char[] buffer and a length.

I then write the buffer into a std::stringstream object so I can remove a line at a time from it.

I thought I'd use the get( streambuf & ) method on the stringstream to write one line to another stringstream where I can then output it.

Ignoring the fact that this may not be the best way to extract a line at a time from the buffer I've read in (although I'd welcome a better alternative to the one I've posted here), as soon as the first ss.get( *s2.rdbuf() ) is called, ss is in a fail state and I can't work out why. There's plenty of data in the input file, so ss should definitely contain more than one line of input.

Any ideas?

ScaryAardvark asked Jan 18 '10 08:01


1 Answer

It seems to me that the first (and probably biggest) step toward decent efficiency is to minimize copying of the data. Since you're given the data in a char[] with a length, my first inclination would be to start by creating a strstream over that buffer. Then, instead of copying one string at a time to another strstream (or stringstream), I'd copy the strings one at a time directly to the stream you'll use to write them to the output.

If you're allowed to modify the contents of the buffer, another possibility is to parse the buffer into lines by simply replacing each '\n' with a '\0'. If you do that, you'll usually also want to create a vector (deque, etc.) of pointers to the beginning of each line: find the first '\r' or '\n' and replace it with a '\0'; the next character that is neither '\r' nor '\n' is the beginning of the next line, so put its address in your vector.

I'd also think hard about whether you can avoid the line-at-a-time output. Scanning a large buffer for newlines is relatively slow. If you're going to end up writing one line after another anyway, you could avoid all of this by just writing the whole buffer to the output stream and being done with it.

Jerry Coffin answered Oct 13 '22 00:10