
How to convert std::vector<unsigned char> to vector<char> without copying?

Tags: c++, std, vector

I wasn't able to find an existing question about this, and it's an actual problem I'm facing.

I have a file loading utility that returns a std::vector<unsigned char> containing the whole file contents. However, the processing function requires a contiguous array of char (and that cannot be changed, since it's a library function). Since the class that uses the processing function stores a copy of the data anyway, I want to store it as vector<char>. Here's some code that might be a bit more illustrative:

std::vector<unsigned char> LoadFile (std::string const& path);

class Processor {
    std::vector<char> cache;
    void _dataOperation(std::vector<char> const& data);

public:
    void Process() {
        if (cache.empty())
            // here's the problem!
            cache = LoadFile("file.txt");

        _dataOperation(cache);
    }
};

This code doesn't compile, because (obviously) there's no appropriate conversion. We can be sure, however, that the temporary vector will occupy the same amount of memory (in other words, sizeof(char) == sizeof(unsigned char)).

The naive solution would be to iterate over the contents of the temporary and cast every character. I know that in the normal case, where the element types match, operator=(T&&) would be called and the buffer would simply be moved.
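A minimal sketch of that naive copy, assuming a hypothetical helper named ToCharVector (not part of the original code). The range constructor converts each element, so no explicit loop is needed; for values above 127 the result of converting to a signed char is implementation-defined before C++20, which doesn't matter for ASCII input.

```cpp
#include <vector>

// Hypothetical helper: copies the bytes into a new vector<char>.
// Each unsigned char is converted to char by the range constructor.
std::vector<char> ToCharVector(std::vector<unsigned char> const& bytes) {
    return std::vector<char>(bytes.begin(), bytes.end());
}
```

With this, the failing line would become something like cache = ToCharVector(LoadFile("file.txt")); at the cost of one extra allocation and copy.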

In my situation it's safe to do a reinterpreting conversion, because I am sure I am going to read ASCII characters only. Any other character would be caught in _dataOperation anyway.

So, my question is: how do I properly and safely convert the temporary vector in a way that involves no copying?

If it isn't possible, I would prefer a safe copy over an unsafe non-copying conversion. I could also change LoadFile to return either vector<char> or vector<unsigned char>.

Bartek Banachewicz asked Feb 06 '13 00:02




1 Answer

In C++11, [basic.lval]p10 says,

If a program attempts to access the stored value of an object through a glvalue of other than one of the following types the behavior is undefined:

  • ...
  • a char or unsigned char type.

(The exact location may differ in other versions of C++, but the meaning is the same.)

That means that you can take a vector<unsigned char> cache and access its contents using the range [reinterpret_cast<char*>(cache.data()), reinterpret_cast<char*>(cache.data()) + cache.size()). (@Kerrek SB mentioned this.)
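As a rough illustration of reading the bytes through that char range, here is a small sketch. CountNewlines is a hypothetical function made up for this example; only the two reinterpret_cast lines reflect the technique described above.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical example: walks the vector's storage through a char pointer.
// Accessing an object through a char glvalue is permitted by the aliasing rule.
std::size_t CountNewlines(std::vector<unsigned char> const& cache) {
    char const* first = reinterpret_cast<char const*>(cache.data());
    char const* last = first + cache.size();
    return static_cast<std::size_t>(std::count(first, last, '\n'));
}
```

Nothing is copied here; the vector keeps ownership of the storage and the pointers are only valid while it is alive and not reallocated.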

If you store a vector<unsigned char> in Processor to match the return type of LoadFile, and _dataOperation() actually takes an array of char (meaning a const char* and a size), then you can cast at the point where you pass the argument to _dataOperation().
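A sketch of that arrangement, under the assumption that the library entry point takes a pointer and a length. LibraryProcess is a made-up stand-in for the real library function, and the constructor is added only so the example is self-contained.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for the library function (pointer + length assumed).
// It counts non-ASCII bytes so the example has something to return.
std::size_t LibraryProcess(char const* data, std::size_t size) {
    std::size_t nonAscii = 0;
    for (std::size_t i = 0; i < size; ++i)
        if (static_cast<unsigned char>(data[i]) > 127) ++nonAscii;
    return nonAscii;
}

class Processor {
    std::vector<unsigned char> cache;  // same element type the loader returns

public:
    // Taking the bytes by value lets a temporary be moved in, not copied.
    explicit Processor(std::vector<unsigned char> bytes) : cache(std::move(bytes)) {}

    std::size_t Process() const {
        // Cast only the data pointer at the library boundary;
        // the vector object itself is never reinterpreted.
        return LibraryProcess(reinterpret_cast<char const*>(cache.data()), cache.size());
    }
};
```

Constructed from the loader's temporary, e.g. Processor p(LoadFile("file.txt")), the buffer is moved into cache and no byte is copied.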

However, if _dataOperation() takes a vector<char> specifically and you store a vector<unsigned char> cache, then you cannot pass it reinterpret_cast<vector<char>&>(cache). (i.e. @André Puel is totally wrong. Do not listen to him.) That violates the aliasing rules, and the compiler will attempt to anger your customers at 2am. (And if this version of your compiler doesn't manage it, the next version will keep trying.)

One option is, as you mentioned, to template LoadFile() and have it return (or fill in) a vector of the type you want. Another is to copy the result, for which the concise version is again the reinterpret_cast of the source vector's .data(). [basic.fundamental]p1 mentions that "For character types, all bits of the object representation participate in the value representation.", meaning that you're not going to lose data with that reinterpret_cast. I don't see a firm guarantee that no bit pattern of an unsigned char can cause a trap if reinterpret_cast'ed to char, but I don't know of any modern hardware or compilers that do it.

Jeffrey Yasskin answered Oct 07 '22 11:10