Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How should I replace vector<uint8_t>::const_iterator in an API?

I've been given the task of polishing the interface of a codec library. We're using C++17, and I can only use the standard library (i.e. no Boost). Currently, there's a Decoder class that looks roughly like this:

class Decoder : public Codec {

public:

    struct Result {
        vector<uint8_t>::const_iterator new_buffer_begin;
        optional<Metadata>              metadata;
        optional<Packet>                packet;
    };

    Result decode(vector<uint8_t>::const_iterator buffer_begin,
                  vector<uint8_t>::const_iterator buffer_end);

private:
    // irrelevant details
};

The caller instantiates a Decoder, then feeds a stream of data to the decoder by

  1. Reading a chunk of data from a file (but there could be other sources in the future), and appending it to a vector<uint8_t>.

  2. Calling the decode function, passing the iterators for their vector.

  3. If the returned Result's new_buffer_begin is identical to the buffer_begin that was passed to decode, that means there wasn't enough data in the buffer to decode anything, and the caller should go back to step 1. Otherwise, the caller consumes the Metadata or Packet object that was decoded, and goes back to step 2, using new_buffer_begin for the next pass.

The things I dislike about this interface and need help improving:

  • Using vector<uint8_t>::const_iterator seems overly specific. Is there a more generic approach that doesn't force the caller to use vector? I was considering just using C-style interface; a uint8_t * and a length. Is there a C++ alternative that's fairly generic?

  • If there was enough data to decode something, only metadata or packet will have a value. I think std::variant or 2 callbacks (one for each type) would make this code more self-documenting. I'm not sure which is more idiomatic though. What are the pros and cons of each, and is there an even better approach?

like image 551
splicer Avatar asked Apr 13 '19 22:04

splicer


3 Answers

I agree that mandating vector is inappropriate, and applaud your attempts to make the interface more useful.

If decode expects a contiguous sequence of uint8_t, the tried-and-tested (and most flexible) solution is just to take a const uint8_t* and a std::size_t (or alternatively two pointers, but pointer and length is more idiomatic).

From C++20 you can do this with one argument of type std::span<const uint8_t>. Or going back to pointers, if you really want to use modern library tools for the sake of it, you can confuse people with std::experimental::observer_ptr.

You may also consider making decode a template that accepts any iterator pair, and (if contiguity is needed) mandates, even if only by documentation, that the iterators reflect a contiguous sequence. But making everything a template isn't always what you want, and it isn't always useful.

like image 147
Lightness Races in Orbit Avatar answered Oct 24 '22 09:10

Lightness Races in Orbit


In addition to @Justin's valid suggestion of spans:

  • You might also want to consider using std::byte instead of uint8_t, so:
    Result decode(std::span<const std::byte> buffer);
    
    or if you're in C++17, use the span implementation from the C++ Guidelines Support library:
    #include <gsl/span>
    // etc.
    Result decode(gsl::span<const std::byte> buffer);
    
  • If you want to support decoding from containers other than raw memory, use arbitrary iterators (in C++17 and earlier) or possibly ranges (in C++20). The iterator version:

    template <typename InputIt>
    Result decode(InputIt start, InputIt end) { /* etc. */ }
    
  • It's fishy that a Decoder inherits from a Codec rather than the other way around.

  • The question of whether callbacks are a good choice or not is something that's difficult (for me) to answer without seeing the code. But do indeed use an std::variant to express the fact you have either a Packet or Metadata; you could also "combine" alternatives if instead of callbacks you use variants' std::visit.
like image 15
einpoklum Avatar answered Oct 24 '22 09:10

einpoklum


C++20 will have std::span, which does what you want:

    Result decode(std::span<uint8_t const> buffer);

std::span<T> is semantically equivalent to a T* buffer, size_t size.


In C++17, there are some implementations of a span type which are equivalent to std::span, such as the GSL's gsl::span. See What is a "span" and when should I use one? .

If you can't use any external libraries, consider writing your own span type, else uint8_t const* buffer_begin, uint8_t const* buffer_end can work.

like image 5
Justin Avatar answered Oct 24 '22 09:10

Justin