I have a vector 'a' which contains a large amount of data that should be split into two separate vectors, 'b' and 'c'.
vector<unsigned char> a; //contains a lot of data
vector<unsigned char> b; //data should be split into b and c
vector<unsigned char> c;
The layout of the data in vector 'a' is as follows:
bbbbccccbbbbccccbbbbcccc
The first 4 bytes should go into vector 'b', the next 4 bytes into vector 'c', and so on.
I could iterate through my data and push_back (or insert) every element into the corresponding vector (based on the index they have in vector 'a'). However, I tried this and the result was very slow.
Is there a more performant way in C++ to achieve this?
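Try to pre-allocate the memory that you are going to use, so the vectors do not have to reallocate and copy their contents as they grow. Assuming a contains only full sequences (that is, a.size() is a multiple of 8), you can do: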
b.reserve(a.size() / 2);
c.reserve(a.size() / 2);
for (auto it = a.begin(); it < a.end(); it += 8) {
    b.insert(b.end(), it, it + 4);
    c.insert(c.end(), it + 4, it + 8);
}
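For reference, here is a self-contained sketch of the same idea wrapped in a function. The name split_blocks and the constant kBlock are made up for illustration, and it relies on the same assumption as above: a.size() is a multiple of 8. It sizes b and c up front and writes with std::copy_n instead of calling insert. Note that sizing the vectors this way zero-initializes them first, so whether it beats reserve plus insert depends on the compiler and the data; measure before choosing.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: splits `a` (layout bbbbccccbbbbcccc...) into b and c.
// Assumption: a.size() is a multiple of 8 (only complete sequences).
std::pair<std::vector<unsigned char>, std::vector<unsigned char>>
split_blocks(const std::vector<unsigned char>& a) {
    constexpr std::size_t kBlock = 4;
    assert(a.size() % (2 * kBlock) == 0);

    // Allocate both outputs once, at their final size.
    std::vector<unsigned char> b(a.size() / 2);
    std::vector<unsigned char> c(a.size() / 2);

    auto out_b = b.begin();
    auto out_c = c.begin();
    for (auto in = a.cbegin(); in != a.cend(); in += 2 * kBlock) {
        // std::copy_n returns the output iterator just past the copied block.
        out_b = std::copy_n(in, kBlock, out_b);
        out_c = std::copy_n(in + kBlock, kBlock, out_c);
    }
    return {std::move(b), std::move(c)};
}
```

Usage would look like: auto parts = split_blocks(a); after which parts.first holds the 'b' bytes and parts.second holds the 'c' bytes.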
Update
If you don't mind modifying the original vector a, you can use it to keep one of the subsequences and avoid allocating more memory. In the code below, a ends up holding the 'c' bytes and takes the place of vector c. Assuming a contains full sequences:
b.reserve(a.size() / 2);
auto writer = a.begin();
for (auto reader = a.cbegin(); reader < a.cend(); reader += 8, writer += 4) {
    b.insert(b.end(), reader, reader + 4);
    std::copy(reader + 4, reader + 8, writer);
}
a.resize(a.size() / 2);
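To make the effect of the in-place variant concrete, here is a minimal sketch that wraps it in a function. The name extract_b_in_place and the constant kBlock are hypothetical, and the input is again assumed to have a size that is a multiple of 8. The write position never overtakes the block being read, so the source and destination of each std::copy do not overlap.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: copies the 'b' blocks out of `a` and compacts the
// 'c' blocks to the front of `a`, so `a` holds only 'c' bytes afterwards.
// Assumption: a.size() is a multiple of 8 (only complete sequences).
std::vector<unsigned char> extract_b_in_place(std::vector<unsigned char>& a) {
    constexpr std::size_t kBlock = 4;
    assert(a.size() % (2 * kBlock) == 0);

    std::vector<unsigned char> b;
    b.reserve(a.size() / 2);

    auto writer = a.begin();
    for (auto reader = a.cbegin(); reader != a.cend(); reader += 2 * kBlock) {
        // First half of the sequence goes to b.
        b.insert(b.end(), reader, reader + kBlock);
        // Second half slides down to the next free slot in a.
        writer = std::copy(reader + kBlock, reader + 2 * kBlock, writer);
    }
    a.resize(a.size() / 2);  // drop the now-unused tail
    return b;
}
```

For an input of 1 2 3 4 11 12 13 14 5 6 7 8 15 16 17 18, the returned vector holds 1 to 8 and a is left holding 11 to 18. The resize call shrinks the size of a but not its capacity, so the memory of the original buffer stays allocated.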