Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Extracting, then passing raw data into another class - How to avoid copying twice while maintaining encapsulation?

Consider a class Book with a stl container of class Page. each Page holds a screenshot, like page10.jpg in raw vector<char> form.

A Book is opened with a path to a zip, rar, or directory containing these screenshots, and uses respective methods of extracting the raw data, like ifstream inFile.read(buffer, size);, or unzReadCurrentFile(zipFile, buffer, size). It then calls the Page(const char* stream, int filesize) constructor.

Right now, it's clear that the raw data is being copied twice. Once to extract to Book's local buffer and a second time in the Page ctor to the Page::vector<char>. Is there a way to maintain encapsulation while getting rid of the middleman buffer?

like image 429
Kache Avatar asked Mar 25 '10 00:03

Kache


1 Answers

In terms of code changes based on what you have already, the simplest is probably to give Page a setter taking a non-const vector reference or pointer, and swap it with the vector contained in the Page. The caller will be left holding an empty vector, but since the problem is excessive copying, presumably the caller doesn't want to keep the data:

void Book::addPage(ifstream file, streampos size) {
    std::vector<char> vec(size);
    file.read(&vec[0], size);
    pages.push_back(Page()); // pages is a data member
    pages.back().setContent(vec);
}

class Page {
    std::vector<char> content;
public:
    Page() : content(0) {} // create an empty page
    void setContent(std::vector<char> &newcontent) {
        content.swap(newcontent);
    }
};

Some people (for example the Google C++ style guide) want reference parameters to be const, and would want you to pass the newcontent parameter as a pointer, to emphasise that it is non-const:

void setContent(std::vector<char> *newcontent) {
    content.swap(*newcontent);
}

swap is fast - you'd expect it just to exchange the buffer pointers and sizes of the two vector objects.

Alternatively, give Page two different constructors: one for the zip file and one for the regular file, and have it be responsible for reading its own data. This is probably the cleanest, and it allows Page to be immutable, rather than being modified after construction. But actually you might not want that, since as you've noticed in a comment, adding the Page to a container copies the Page. So there's some benefit to being able to modify the Page to add the data after it has been cheaply constructed in the container: it avoids that extra copy without you needing to mess with containers of pointers. Still, the setContent function could just as easily take the file stream / zip file info as take a vector.

You could find or write a stream class which reads from a zipfile, so that Page can be responsible for reading data with just one constructor taking a stream. Or perhaps not a whole stream class, perhaps just an interface you design which reads data from a stream/zip/rar into a specified buffer, and Page can specify its internal vector as the buffer.

Finally, you could "mess with containers of pointers". Make pages a std::vector<boost::shared_ptr<Page> >, then do:

void Book::addPage(ifstream file, streampos size) {
    boost::shared_ptr<Page> page(new Page(file, size));
    pages.push_back(page); // pages is a data member
}

A shared_ptr has a modest overhead relative to just a Page (it makes an extra memory allocation for a small node containing a pointer and a refcount), but is much cheaper to copy. It's in TR1 too, if you have some implementation of that other than Boost.

like image 80
Steve Jessop Avatar answered Sep 24 '22 14:09

Steve Jessop