
std::getline alternative when input line endings are mixed

I'm trying to read in lines from a std::istream but the input may contain '\r' and/or '\n', so std::getline is no use.

Sorry to shout but this seems to need emphasis...

The input may contain either newline type or both.

Is there a standard way to do this? At the moment I'm trying

char c;
while (in >> c && '\n' != c && '\r' != c)
    out .push_back (c);

...but this skips over whitespace. D'oh! std::noskipws -- more fiddling required and now it's misbehaving.

Surely there must be a better way?!?

spraff asked Jul 14 '11 14:07



2 Answers

OK, here's one way to do it. Basically I've made an implementation of std::getline which accepts a predicate instead of a delimiter character. This gets you two-thirds of the way there:

template <class Ch, class Tr, class A, class Pred>
std::basic_istream<Ch, Tr> &getline(std::basic_istream<Ch, Tr> &is, std::basic_string<Ch, Tr, A> &str, Pred p) {

    typename std::basic_string<Ch, Tr, A>::size_type nread = 0;
    // The sentry checks the stream state; 'true' means leading whitespace is not skipped.
    typename std::basic_istream<Ch, Tr>::sentry ok(is, true);
    if (ok) {
        std::basic_streambuf<Ch, Tr> *sbuf = is.rdbuf();
        str.clear();

        while (nread < str.max_size()) {
            const typename Tr::int_type c1 = sbuf->sbumpc();
            if (Tr::eq_int_type(c1, Tr::eof())) {
                is.setstate(std::ios_base::eofbit);
                break;
            } else {
                ++nread;
                const Ch ch = Tr::to_char_type(c1);
                if (!p(ch)) {
                    str.push_back(ch);
                } else {
                    break;  // the delimiter is consumed but not stored
                }
            }
        }
    }

    // Nothing read at all, or the string is full: report failure.
    if (nread == 0 || nread >= str.max_size()) {
        is.setstate(std::ios_base::failbit);
    }

    return is;
}

with a functor similar to this:

struct is_newline {
    bool operator()(char ch) const {
        return ch == '\n' || ch == '\r';
    }
};

Now the only thing left is to determine whether you stopped on a '\r'. If you did and the next character is a '\n', just consume it and ignore it.

EDIT: So to put this all into a functional solution, here's an example:

#include <string>
#include <sstream>
#include <iostream>

namespace util {

    struct is_newline { 
        bool operator()(char ch) {
            ch_ = ch;
            return ch_ == '\n' || ch_ == '\r';
        }

        char ch_;
    };

    template <class Ch, class Tr, class A, class Pred>
    std::basic_istream<Ch, Tr> &getline(std::basic_istream<Ch, Tr> &is, std::basic_string<Ch, Tr, A> &str, Pred &p) {

        typename std::basic_string<Ch, Tr, A>::size_type nread = 0;

        // The sentry checks the stream state; 'true' means leading whitespace is not skipped.
        typename std::basic_istream<Ch, Tr>::sentry ok(is, true);
        if (ok) {
            std::basic_streambuf<Ch, Tr> *const sbuf = is.rdbuf();
            str.clear();

            while (nread < str.max_size()) {
                const typename Tr::int_type c1 = sbuf->sbumpc();
                if (Tr::eq_int_type(c1, Tr::eof())) {
                    is.setstate(std::ios_base::eofbit);
                    break;
                } else {
                    ++nread;
                    const Ch ch = Tr::to_char_type(c1);
                    if (!p(ch)) {
                        str.push_back(ch);
                    } else {
                        break;  // the delimiter is consumed but not stored
                    }
                }
            }
        }

        // Nothing read at all, or the string is full: report failure.
        if (nread == 0 || nread >= str.max_size()) {
            is.setstate(std::ios_base::failbit);
        }

        return is;
    }
}

int main() {

    std::stringstream ss("this\ris a\ntest\r\nyay");
    std::string       item;
    util::is_newline  is_newline;

    while(util::getline(ss, item, is_newline)) {
        if(is_newline.ch_ == '\r' && ss.peek() == '\n') {
            ss.ignore(1);
        }

        std::cout << '[' << item << ']' << std::endl;
    }
}

I've made a couple of minor changes to my original example. The Pred p parameter is now a reference so that the predicate can store some data (specifically the last char tested). Likewise, I made the predicate's operator() non-const so it can store that character.

Then in main, I have a string in a std::stringstream which has all 3 versions of line breaks. I use my util::getline, and if the predicate object says that the last char was a '\r', then I peek() ahead and ignore 1 character if it happens to be '\n'.
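If you don't need the general predicate, the same idea can be folded into a single function. This is only a sketch, assuming plain char streams; the names read_line_any_eol and split_lines are made up for illustration and are not part of the code above. It reads straight from the stream buffer, stops at '\n' or '\r', and swallows the '\n' of a "\r\n" pair.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: reads one line and accepts "\n", "\r" or "\r\n"
// as the terminator. The terminator is consumed but not stored.
std::istream &read_line_any_eol(std::istream &is, std::string &line) {
    typedef std::char_traits<char> traits;
    line.clear();

    // The sentry validates the stream; 'true' keeps leading whitespace.
    std::istream::sentry ok(is, true);
    if (!ok) {
        return is;
    }

    std::streambuf *sb = is.rdbuf();
    for (;;) {
        const traits::int_type c = sb->sbumpc();
        if (traits::eq_int_type(c, traits::eof())) {
            // End of input: fail only if nothing at all was read.
            is.setstate(line.empty()
                            ? std::ios_base::eofbit | std::ios_base::failbit
                            : std::ios_base::eofbit);
            return is;
        }
        const char ch = traits::to_char_type(c);
        if (ch == '\n') {
            return is;
        }
        if (ch == '\r') {
            // Swallow the '\n' of a "\r\n" pair, if present.
            if (traits::eq_int_type(sb->sgetc(), traits::to_int_type('\n'))) {
                sb->sbumpc();
            }
            return is;
        }
        line.push_back(ch);
    }
}

// Hypothetical convenience wrapper: splits a whole string into lines.
std::vector<std::string> split_lines(const std::string &text) {
    std::istringstream in(text);
    std::vector<std::string> lines;
    std::string line;
    while (read_line_any_eol(in, line)) {
        lines.push_back(line);
    }
    return lines;
}
```

Calling split_lines on the test string from main above should give the four items this, is a, test and yay. The trade-off compared with the predicate version is that the set of line endings is hard-coded.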

Evan Teran answered Oct 21 '22 16:10



The usual way to read a line is with std::getline.

Edit: If your implementation of std::getline is broken, you could write something similar of your own, something like this:

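std::istream &getline(std::istream &is, std::string &s) {
    char ch = '\0';

    s.clear();

    while (is.get(ch) && ch != '\n' && ch != '\r')
        s += ch;
    // Treat "\r\n" as a single line ending.
    if (is && ch == '\r' && is.peek() == '\n')
        is.ignore();
    // Keep a final line that has no terminator: leave eofbit set, drop failbit.
    if (is.eof() && !is.bad() && !s.empty())
        is.clear(std::ios_base::eofbit);
    return is;
}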

I should add that technically this probably isn't a matter of std::getline being broken so much as the underlying stream implementation being broken -- it's up to the stream to translate whatever characters signify the end of a line on the platform into a newline character. Regardless of exactly which part is broken, if your implementation is broken, this may be able to make up for it (then again, if it is broken badly enough, it's hard to be sure this will work either).
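To make the point about translation concrete: std::getline itself only splits on '\n'. The sketch below uses a std::istringstream, which does no line-ending translation at all, so any '\r' in the data stays in the extracted strings. The function name lines_via_std_getline is made up for this illustration.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: collects lines using plain std::getline,
// which treats '\n' as the only delimiter.
std::vector<std::string> lines_via_std_getline(const std::string &text) {
    std::istringstream in(text);  // no line-ending translation happens here
    std::vector<std::string> lines;
    std::string line;
    while (std::getline(in, line)) {
        lines.push_back(line);
    }
    return lines;
}
```

With the input "a\r\nb\rc\nd" this yields three strings: "a\r", "b\rc" and "d". The lone '\r' never ends a line, and the '\r' of the "\r\n" pair is left at the end of the first string.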

Jerry Coffin answered Oct 21 '22 16:10
