Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to get a sorted subvector out of a sorted vector, fast

I have a data structure like this:

struct X {
  float value;
  int id;
};

a vector of those (size N (think 100000), sorted by value (stays constant during the execution of the program):

std::vector<X> values;

Now, I want to write a function

void subvector(std::vector<X> const& values, 
               std::vector<int> const& ids, 
               std::vector<X>& out /*, 
               helper data here */);

that fills the out parameter with a sorted subset of values, given by the passed ids (size M < N (about 0.8 times N)), fast (memory is not an issue, and this will be done repeatedly, so building lookuptables (the helper data from the function parameters) or something else that is done only once is entirely ok).

My solution so far:
Build lookuptable lut containing id -> offset in values (preparation, so constant runtime)
create std::vector<X> tmp, size N, filled with invalid ids (linear in N)
for each id, copy values[lut[id]] to tmp[lut[id]] (linear in M)
loop over tmp, copying items to out (linear in N)

this is linear in N (as it's bigger than M), but the temporary variable and repeated copying bugs me. Is there a way to do it quicker than this? Note that M will be close to N, so things that are O(M log N) are unfavourable.

Edit: http://ideone.com/xR8Vp is a sample implementation of mentioned algorithm, to make the desired output clear and prove that it's doable in linear time - the question is about the possibility of avoiding the temporary variable or speeding it up in some other way, something that is not linear is not faster :).

like image 861
etarion Avatar asked Nov 29 '10 22:11

etarion


1 Answers

An alternative approach you could try is to use a hash table instead of a vector to look up ids in:

void subvector(std::vector<X> const& values, 
               std::unordered_set<int> const& ids, 
               std::vector<X>& out) {

    out.clear();
    out.reserve(ids.size());
    for(std::vector<X>::const_iterator i = values.begin(); i != values.end(); ++i) {
        if(ids.find(i->id) != ids.end()) {
            out.push_back(*i);
        }
    }
}

This runs in linear time since unordered_set::find is constant expected time (assuming that we have no problems hashing ints). However I suspect it might not be as fast in practice as the approach you described initially using vectors.

like image 81
Peter Avatar answered Nov 09 '22 23:11

Peter