 

Splitting a vector by a pattern

Tags:

c++

c++11

c++14

I have a vector 'a' that contains a huge amount of data, which should be split into two separate vectors 'b' and 'c'.

vector<unsigned char> a; //contains a lot of data

vector<unsigned char> b; //data should be split into b and c
vector<unsigned char> c;

The layout of the data in vector 'a' is as follows:

bbbbccccbbbbccccbbbbcccc

The first 4 bytes should go into vector 'b', the next 4 bytes into vector 'c', and so on, alternating every 4 bytes.

I could iterate through my data and push_back (or insert) every element into the corresponding vector, based on its index in vector 'a'. However, I tried this and it was very slow.
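For reference, the element-by-element approach described above might look roughly like the following. This is a reconstruction, not the asker's actual code (which isn't shown). The function name split_per_element is hypothetical, and the reserve calls are borrowed from the answer rather than from the question. Byte i of a belongs to 4-byte block i / 4; even-numbered blocks go to b and odd-numbered blocks go to c.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical reconstruction of the per-element split: byte i of `a`
// belongs to block i / 4; even-numbered blocks go to `b`, odd ones to `c`.
void split_per_element(const std::vector<unsigned char>& a,
                       std::vector<unsigned char>& b,
                       std::vector<unsigned char>& c) {
    b.clear();
    c.clear();
    // Enough room for both outputs, even if `a` ends with a partial block.
    b.reserve(a.size() / 2 + 4);
    c.reserve(a.size() / 2 + 4);
    for (std::size_t i = 0; i < a.size(); ++i) {
        if ((i / 4) % 2 == 0)
            b.push_back(a[i]);
        else
            c.push_back(a[i]);
    }
    // Sanity check: every byte of `a` ended up in exactly one output.
    assert(b.size() + c.size() == a.size());
}
```

Unlike the block-based loop in the answer, this version works for any a.size(), including one that ends in a partial block, at the cost of one push_back per byte.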

Is there a more performant way in C++ to achieve this?

asked Sep 14 '15 15:09 by user3067395


1 Answer

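Pre-allocate the memory you are going to use so the vectors don't have to reallocate (and copy their contents) as they grow, and copy each 4-byte run with a single insert call instead of one push_back per byte. Assuming a contains only complete sequences (a.size() is a multiple of 8), you can do: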

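// Reserve the final sizes up front so insert never has to reallocate.
b.reserve(a.size() / 2);
c.reserve(a.size() / 2);
// Walk a one 8-byte block at a time: bytes 0-3 go to b, bytes 4-7 go to c.
// Requires a.size() to be a multiple of 8; otherwise the iterator arithmetic runs past a.end().
for (auto it = a.begin(); it < a.end(); it += 8) {
  b.insert(b.end(), it, it + 4);
  c.insert(c.end(), it + 4, it + 8);
}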
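A self-contained version of this loop, wrapped in a hypothetical helper named split_blocks, might look like the sketch below. Clearing the outputs and handling a trailing partial block (up to 4 leftover bytes go to b, the rest to c) are additions the answer does not include; when a.size() is a multiple of 8 it behaves like the loop above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper wrapping the answer's loop: each complete 8-byte block
// of `a` is copied as two 4-byte ranges. Unlike the answer, it also tolerates
// a trailing partial block (its first up-to-4 bytes go to b, the rest to c).
void split_blocks(const std::vector<unsigned char>& a,
                  std::vector<unsigned char>& b,
                  std::vector<unsigned char>& c) {
    b.clear();
    c.clear();
    b.reserve(a.size() / 2 + 4);
    c.reserve(a.size() / 2 + 4);

    const std::size_t full = a.size() - a.size() % 8;  // bytes in complete blocks
    auto it = a.begin();
    const auto full_end = a.begin() + static_cast<std::ptrdiff_t>(full);
    for (; it != full_end; it += 8) {
        b.insert(b.end(), it, it + 4);
        c.insert(c.end(), it + 4, it + 8);
    }

    // Leftover bytes (0 to 7): up to 4 belong to b, the remainder to c.
    const std::size_t rest = a.size() - full;
    const auto split = it + static_cast<std::ptrdiff_t>(std::min<std::size_t>(rest, 4));
    b.insert(b.end(), it, split);
    c.insert(c.end(), split, a.end());

    assert(b.size() + c.size() == a.size());
}
```

Calling split_blocks(a, b, c) leaves a untouched, so peak memory is roughly twice the size of a; the in-place variant avoids that.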

Update

If you don't mind modifying the original vector a, you can use it to hold one of the subsequences and avoid allocating more memory. Again assuming a contains only complete 8-byte sequences:

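// std::copy requires #include <algorithm>.
b.reserve(a.size() / 2);
auto writer = a.begin();  // where the next 4 "c" bytes are compacted to
for (auto reader = a.cbegin(); reader < a.cend(); reader += 8, writer += 4) {
  b.insert(b.end(), reader, reader + 4);      // first half of the block -> b
  std::copy(reader + 4, reader + 8, writer);  // second half -> front of a
}
a.resize(a.size() / 2);  // a now holds only the former "c" bytes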
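The in-place variant can be wrapped the same way. The sketch below uses a hypothetical helper named split_in_place and turns the answer's "full sequences" assumption into an assert, which only fires in builds without NDEBUG. The key invariant is that writer (at offset 4k in iteration k) never gets ahead of reader (at offset 8k), so each std::copy only overwrites bytes that have already been consumed.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper: moves the first 4 bytes of each 8-byte block into `b`,
// compacts the last 4 bytes of each block to the front of `a`, then shrinks `a`.
// Precondition (as in the answer): a.size() is a multiple of 8.
void split_in_place(std::vector<unsigned char>& a, std::vector<unsigned char>& b) {
    assert(a.size() % 8 == 0 && "a must contain only complete 8-byte blocks");
    b.clear();
    b.reserve(a.size() / 2);
    auto writer = a.begin();
    for (auto reader = a.cbegin(); reader != a.cend(); reader += 8, writer += 4) {
        b.insert(b.end(), reader, reader + 4);
        // writer (offset 4k) never overtakes reader (offset 8k), so this copy
        // only overwrites bytes that earlier steps have already read.
        std::copy(reader + 4, reader + 8, writer);
    }
    a.resize(a.size() / 2);
}
```

After the call, b holds the "b" bytes and a holds the "c" bytes. Note that resize does not give memory back: a keeps its original capacity unless you call shrink_to_fit, which is only a non-binding request.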
answered Oct 06 '22 01:10 by ChronoTrigger