
"carbon-copy" a c++ istream?

For my very own little parser framework, I am trying to define (something like) the following function:

template <class T>
// with operator>>( std::istream&, T& )
void tryParse( std::istream& is, T& tgt )
{
    is >> tgt /* , *BUT* store every character that is consumed by this operation
    in some string. If afterwards, is.fail() (which should indicate a parsing
    error for now), put all the characters read back into the 'is' stream so that
    we can try a different parser. */
}

Then I could write something like this: (maybe not the best example)

/* grammar: MyData     = <IntTriple> | <DoublePair>
            DoublePair = <double> <double>
            IntTriple  = <int> <int> <int> */
class MyData
{ public:
    union { DoublePair dp; IntTriple it; } data;
    bool isDoublePair;
};

istream& operator>>( istream& is, MyData& md )
{
    /* If I used just "is >> md.data.it" here instead, the
       operator>>( ..., IntTriple ) might consume two ints, then hit an
       unexpected character, and fail, making it impossible to read these two
       numbers as doubles in the "else" branch below. */
    tryParse( is, md.data.it );
    if ( !is.fail() )
        md.isDoublePair = false;
    else
    {
        md.isDoublePair = true;
        is.clear();
        is >> md.data.dp;
    }
    return is;
}

Any help is greatly appreciated.

srs asked Sep 23 '10 12:09



3 Answers

Unfortunately, streams have only very minimal and rudimentary putback support.

The last few times I needed this, I wrote my own reader classes that wrapped a stream but kept a buffer to put things back into, reading from the stream only when that buffer was empty. They let you capture the current state, and later either commit it or roll back to an earlier one.
The default action in the state class's destructor was to roll back, so you could parse ahead without giving much thought to error handling: an exception would simply roll the parser's state back to a point where a different grammar rule could be tried. (I think this is called backtracking.) Here's a sketch:

class parse_buffer {
    friend class parse_state;
public:
    typedef std::string::size_type index_type;

    parse_buffer(std::istream& str);

    index_type get_current_index() const;
    void set_current_index(index_type);  // modifies the buffer, so not const

    std::string get_next_string(bool skip_ws = true);  // advances the read position, so not const
    char get_next_char(bool skip_ws = true);
    char peek_next_char(bool skip_ws = true); 

    std::string get_error_string() const; // returns string starting at error idx
    index_type get_error_index() const;
    void set_error_index(index_type);

    bool eof() const;

    // ...
};

class parse_state {
public:
    parse_state(parse_buffer&);
    ~parse_state();

    void commit();
    void rollback();

    // ...
};

This should give you an idea. It has none of the implementation, but that was straightforward and should be easy to redo. The real code also had many convenience functions, for example ones that read a delimited string, consumed a string if it was one of several given keywords, or read a string and converted it to a type given as a template parameter.

The idea was that a function would set the error index to its starting position, save the parse state, and try to parse until it either succeeded or ran into a dead end. In the latter case, it would just throw an exception. This would destroy the parse_state objects on the stack, rolling back the state up to a function which could catch the exception and either try something else, or output an error (which is where get_error_string() comes in.)
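The commit-or-roll-back idea described above can be sketched roughly as follows. This is not the original implementation, only a minimal guess at its shape: the member layout, `expect_keyword`, `parse_bool_keyword`, and `demo` are all hypothetical names introduced for illustration, and whitespace skipping and error indices are left out.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical minimal buffer: keeps every character read from the stream,
// so the read position can be moved back arbitrarily far.
class parse_buffer {
public:
    typedef std::string::size_type index_type;

    explicit parse_buffer(std::istream& str) : stream_(str), index_(0) {}

    index_type get_current_index() const { return index_; }
    void set_current_index(index_type i) { index_ = i; }

    // Returns the next character, pulling from the stream only when the
    // internal buffer is exhausted. Throws at end of input.
    char get_next_char() {
        if (index_ == data_.size()) {
            char c;
            if (!stream_.get(c))
                throw std::runtime_error("unexpected end of input");
            data_.push_back(c);
        }
        return data_[index_++];
    }

private:
    std::istream& stream_;
    std::string data_;   // everything read from the stream so far
    index_type index_;   // current read position within data_
};

// RAII guard: rolls back in the destructor unless commit() was called.
class parse_state {
public:
    explicit parse_state(parse_buffer& buf)
        : buf_(buf), saved_(buf.get_current_index()), committed_(false) {}
    ~parse_state() { if (!committed_) rollback(); }

    void commit() { committed_ = true; }
    void rollback() { buf_.set_current_index(saved_); }

private:
    parse_state(const parse_state&);             // non-copyable
    parse_state& operator=(const parse_state&);

    parse_buffer& buf_;
    parse_buffer::index_type saved_;
    bool committed_;
};

// Hypothetical rule: consumes exactly the given keyword or throws.
void expect_keyword(parse_buffer& buf, const std::string& kw) {
    parse_state state(buf);
    for (std::string::size_type i = 0; i < kw.size(); ++i)
        if (buf.get_next_char() != kw[i])
            throw std::runtime_error("expected '" + kw + "'");
    state.commit();  // success: keep the new position
}

// Tries each alternative in turn; a failed one leaves the position untouched.
std::string parse_bool_keyword(parse_buffer& buf) {
    try { expect_keyword(buf, "true"); return "true"; }
    catch (const std::runtime_error&) {}
    expect_keyword(buf, "false");
    return "false";
}

// Usage: "true" fails on the first character, is rolled back, then "false" matches.
void demo() {
    std::istringstream in("false!");
    parse_buffer buf(in);
    assert(parse_bool_keyword(buf) == "false");
    assert(buf.get_next_char() == '!');  // position is just past the keyword
}
```

Because the rollback lives in the destructor, a rule that throws halfway through needs no cleanup code of its own; only the success path has to call `commit()`.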

If you want a really fast parser, this strategy might be wrong, but then streams are often too slow as well. OTOH, the last time I used something like this, I made an XPath parser that operates on a proprietary DOM, which is used to represent scenes in a 3D renderer. And it was not the XPath parser that got all the heat from the guys trying to get higher frame rates. :)

sbi answered Oct 15 '22 05:10



This is not what streams are intended for. You should read the data you want to parse into a buffer and then hand that buffer (preferably as an iterator-range) to the functions that parse it. This could look something like this:

template <class T, class U>
bool tryParse( U & begin, U & end, T & target ) {
    // return true if parse was successful, false otherwise
}

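To read from an istream into a buffer, you can use an istreambuf_iterator (an istream_iterator<char> would skip whitespace):

 std::vector< char > buffer((std::istreambuf_iterator<char>(is)), std::istreambuf_iterator<char>());

This reads the entire stream into the vector when it is created. The extra parentheses around the first argument are needed; without them the line is parsed as a function declaration.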
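To make the iterator-range approach concrete, here is a minimal sketch. The helper names `read_all`, `try_parse_uint`, `try_parse_pair`, and `demo` are made up for this example, the grammar (`<uint>,<uint>`) is only an illustration, and overflow is not checked. The point is that each parser works on a copy of the iterator and advances the caller's `begin` only on success, so a failed attempt consumes nothing.

```cpp
#include <cassert>
#include <istream>
#include <iterator>
#include <sstream>
#include <vector>

// Reads everything left in the stream, whitespace included.
std::vector<char> read_all(std::istream& is) {
    return std::vector<char>(std::istreambuf_iterator<char>(is),
                             std::istreambuf_iterator<char>());
}

// Hypothetical parser for a non-negative decimal integer. `begin` is
// advanced past the digits only on success; on failure it is left alone.
template <class It>
bool try_parse_uint(It& begin, It end, unsigned& target) {
    It cur = begin;
    if (cur == end || *cur < '0' || *cur > '9') return false;
    unsigned value = 0;
    for (; cur != end && *cur >= '0' && *cur <= '9'; ++cur)
        value = value * 10 + static_cast<unsigned>(*cur - '0');
    target = value;
    begin = cur;
    return true;
}

// Hypothetical parser for "<uint>,<uint>": all or nothing.
template <class It>
bool try_parse_pair(It& begin, It end, unsigned& a, unsigned& b) {
    It cur = begin;  // work on a copy so failure has no side effects
    unsigned x = 0, y = 0;
    if (!try_parse_uint(cur, end, x)) return false;
    if (cur == end || *cur != ',') return false;
    ++cur;
    if (!try_parse_uint(cur, end, y)) return false;
    a = x;
    b = y;
    begin = cur;
    return true;
}

// Usage: the pair parser fails on "12;34" and leaves the position untouched,
// so a different parser can start from the same place.
void demo() {
    std::istringstream in("12;34");
    const std::vector<char> buf = read_all(in);
    std::vector<char>::const_iterator pos = buf.begin();
    unsigned a = 0, b = 0, n = 0;
    assert(!try_parse_pair(pos, buf.end(), a, b));  // no comma: fails
    assert(pos == buf.begin());                     // nothing consumed
    assert(try_parse_uint(pos, buf.end(), n) && n == 12);
}
```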

Björn Pollex answered Oct 15 '22 05:10



Putting the characters back is tricky. Some streams support unget() and putback(somechar), but there is no guarantee how many characters you can unget (if any).

A more reliable way is to read the characters into a buffer and parse that, or store the characters read in the first parsing attempt and use that buffer when parsing a second time.
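One way to sketch the buffered variant while keeping the `operator>>` interface from the question: copy the input into a string, wrap it in a `std::istringstream`, and remember the read position before each attempt. The helper `slurp`, the `bool` return value, the struct layouts, and `demo` are assumptions of this example, not part of the question's code. It relies on the stream being seekable and in a good state on entry, so it will not work directly on pipes or interactive input.

```cpp
#include <cassert>
#include <istream>
#include <iterator>
#include <sstream>
#include <string>

// Assumed layouts for the question's types.
struct IntTriple  { int a, b, c; };
struct DoublePair { double x, y; };

std::istream& operator>>(std::istream& is, IntTriple& t) {
    return is >> t.a >> t.b >> t.c;
}
std::istream& operator>>(std::istream& is, DoublePair& p) {
    return is >> p.x >> p.y;
}

// Copies the rest of `is` into a string so it can be re-read at will.
std::string slurp(std::istream& is) {
    return std::string(std::istreambuf_iterator<char>(is),
                       std::istreambuf_iterator<char>());
}

// Attempts `is >> tgt`. On failure the error state is cleared and the
// stream is rewound to where the attempt started. Unlike the tryParse in
// the question, this variant reports the outcome through its return value.
template <class T>
bool tryParse(std::istream& is, T& tgt) {
    const std::istream::pos_type start = is.tellg();
    if (is >> tgt) return true;
    is.clear();       // reset failbit/eofbit so seekg can succeed
    is.seekg(start);  // put back everything the failed attempt consumed
    return false;
}

// Usage: IntTriple consumes "1" and then fails on ".5"; after the rewind
// the same characters are read again as a DoublePair.
void demo() {
    std::istringstream raw("1.5 2.5");
    std::istringstream is(slurp(raw));
    IntTriple it = {0, 0, 0};
    DoublePair dp = {0.0, 0.0};
    assert(!tryParse(is, it));
    assert(tryParse(is, dp));
    assert(dp.x == 1.5 && dp.y == 2.5);
}
```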

Anthony Williams answered Oct 15 '22 05:10
