C++ How to merge sorted vectors into a sorted vector / pop the least element from all of them?

Tags:

I have a collection of about a hundred or so sorted vector<int>'s Although most vectors have a small number of integers in them, some of the vectors contain a large (>10K) of them (thus the vectors don't necessarily have the same size).

What I'd like to do essentially iterate through smallest to largest integer, that are contained in all these sorted vectors.

One way to do it would be to merge all these sorted vectors into a sorted vector & simply iterate. Thus,

Question 1: What is the fastest way to merge sorted vectors into a sorted vector?

I'm sure on the other hand there are faster / clever ways to accomplish this without merging & re-sorting the whole thing -- perhaps popping the smallest integer iteratively from this collection of sorted vectors; without merging them first.. so:

Question 2: What is the fasted / best way to pop the least element from a bunch of sorted vector<int>'s?

Based on replies below, and the comments to the question I've implemented an approach where I make a priority queue of iterators for the sorted vectors. I'm not sure if this is performance-efficient, but it seems to be very memory-efficient. I consider the question still open, since I'm not sure we've established the fastest way yet.

// compare vector pointers by integers pointed
struct cmp_seeds {
    bool operator () (const pair< vector<int>::iterator, vector<int>::iterator> p1, const pair< vector<int>::iterator, vector<int>::iterator> p2) const {
        return *(p1.first) >  *(p2.first);      
    }
};

int pq_heapsort_trial() {

    /* Set up the Sorted Vectors */ 
    int a1[] = { 2, 10, 100};
    int a2[] = { 5, 15, 90, 200};
    int a3[] = { 12 };

    vector<int> v1 (a1, a1 + sizeof(a1) / sizeof(int));
    vector<int> v2 (a2, a2 + sizeof(a2) / sizeof(int));
    vector<int> v3 (a3, a3 + sizeof(a3) / sizeof(int));

    vector< vector <int> * > sorted_vectors;
    sorted_vectors.push_back(&v1);
    sorted_vectors.push_back(&v2);
    sorted_vectors.push_back(&v3);
    /* the above simulates the "for" i have in my own code that gives me sorted vectors */

    pair< vector<int>::iterator, vector<int>::iterator> c_lead;
    cmp_seeds mycompare;

    priority_queue< pair< vector<int>::iterator, vector<int>::iterator>, vector<pair< vector<int>::iterator, vector<int>::iterator> >, cmp_seeds> cluster_feeder(mycompare);


    for (vector<vector <int> *>::iterator k = sorted_vectors.begin(); k != sorted_vectors.end(); ++k) {
        cluster_feeder.push( make_pair( (*k)->begin(), (*k)->end() ));
    }


    while ( cluster_feeder.empty() != true) {
        c_lead = cluster_feeder.top();
        cluster_feeder.pop();
        // sorted output
        cout << *(c_lead.first) << endl;

        c_lead.first++;
        if (c_lead.first != c_lead.second) {
            cluster_feeder.push(c_lead);
        }
    }

    return 0;
}

400

asked Jan 26 '12 03:01

Deniz

1 Answers

One option is to use a std :: priority queue to maintain a heap of iterators, where the iterators bubble up the heap depending on the values they point at.

You could also consider using repeating applications of std :: inplace_merge. This would involve appending all the data together into a big vector and remembering the offsets at which each distinct sorted block begins and ends, and then passing those into inplace_merge. This would probably be faster then the heap solution, although I think fundamentally the complexity is equivalent.

Update: I've implemented the second algorithm I just described. Repeatedly doing a mergesort in place. This code is on ideone.

This works by first concatenating all the sorted lists together into one long list. If there were three source lists, this means there are four 'offsets', which are four points in the full list between which the elements are sorted. The algorithm will then pull off three of these at a time, merging the two corresponding adjacent sorted lists into one sorted list, and then remembering two of those three offsets to be used in the new_offsets.

This repeats in a loop, with pairs of adjacent sorted ranges merged together, until only one sorted range remains.

Ultimately, I think the best algorithm would involve merging the shortest pairs of adjacent ranges together first.

// http://stackoverflow.com/questions/9013485/c-how-to-merge-sorted-vectors-into-a-sorted-vector-pop-the-least-element-fro/9048857#9048857
#include <iostream>
#include <vector>
#include <algorithm>
#include <cassert>
using namespace std;

template<typename T, size_t N>
vector<T> array_to_vector( T(*array)[N] ) { // Yes, this works. By passing in the *address* of
                                            // the array, all the type information, including the
                                            // length of the array, is known at compiler. 
        vector<T> v( *array, &((*array)[N]));
        return v;
}   

void merge_sort_many_vectors() {

    /* Set up the Sorted Vectors */ 
    int a1[] = { 2, 10, 100};
    int a2[] = { 5, 15, 90, 200};
    int a3[] = { 12 };

    vector<int> v1  = array_to_vector(&a1);
    vector<int> v2  = array_to_vector(&a2);
    vector<int> v3  = array_to_vector(&a3);


    vector<int> full_vector;
    vector<size_t> offsets;
    offsets.push_back(0);

    full_vector.insert(full_vector.end(), v1.begin(), v1.end());
    offsets.push_back(full_vector.size());
    full_vector.insert(full_vector.end(), v2.begin(), v2.end());
    offsets.push_back(full_vector.size());
    full_vector.insert(full_vector.end(), v3.begin(), v3.end());
    offsets.push_back(full_vector.size());

    assert(full_vector.size() == v1.size() + v2.size() + v3.size());

    cout << "before:\t";
    for(vector<int>::const_iterator v = full_vector.begin(); v != full_vector.end(); ++v) {
            cout << ", " << *v;
    }       
    cout << endl;
    while(offsets.size()>2) {
            assert(offsets.back() == full_vector.size());
            assert(offsets.front() == 0);
            vector<size_t> new_offsets;
            size_t x = 0;
            while(x+2 < offsets.size()) {
                    // mergesort (offsets[x],offsets[x+1]) and (offsets[x+1],offsets[x+2])
                    inplace_merge(&full_vector.at(offsets.at(x))
                                 ,&full_vector.at(offsets.at(x+1))
                                 ,&(full_vector[offsets.at(x+2)]) // this *might* be at the end
                                 );
                    // now they are sorted, we just put offsets[x] and offsets[x+2] into the new offsets.
                    // offsets[x+1] is not relevant any more
                    new_offsets.push_back(offsets.at(x));
                    new_offsets.push_back(offsets.at(x+2));
                    x += 2;
            }
            // if the number of offsets was odd, there might be a dangling offset
            // which we must remember to include in the new_offsets
            if(x+2==offsets.size()) {
                    new_offsets.push_back(offsets.at(x+1));
            }
            // assert(new_offsets.front() == 0);
            assert(new_offsets.back() == full_vector.size());
            offsets.swap(new_offsets);

    }
    cout << "after: \t";
    for(vector<int>::const_iterator v = full_vector.begin(); v != full_vector.end(); ++v) {
            cout << ", " << *v;
    }
    cout << endl;
}

int main() {
        merge_sort_many_vectors();
}

113

answered Nov 04 '22 21:11

Aaron McDaid

Related questions
                            
                                What is a 'linkonce' section
                            
                                A way to disable conversion operators?
                            
                                Parsing python grammar with boost::spirit - problem
                            
                                Abstract factory with abstract parameters?
                            
                                PostMessage() succeeds but my message processing code never receives the message
                            
                                How to use CssStyleProvider in Gtkmm3
                            
                                Are trigraph substitutions reverted when a raw string is created through concatenation?
                            
                                Catch statements are being completely ignored
                            
                                const-correctness and the safe bool idiom
                            
                                Using C++ with Objective-C, How can I fix "Conflicting declaration 'typedef int BOOL'"?
                            
                                Implementing a std::array-like container with a C++11 initializer_list
                            
                                rvalue references in Visual Studio 2010
                            
                                Which default argument is evaluated first and why?
                            
                                Beginning Code::blocks and UnitTest++
                            
                                Flocking Algorithm crashes on 200+ boids
                            
                                std::set fast and slow, what is going on?
                            
                                Using a typedef'd uint causes error, while "unsigned int" does not...?
                            
                                Autoconf/Automake: How to avoid passing the "check" option to AC_CONFIG_SUBDIRS
                            
                                QT - Main Widget - Stack or Heap?
                            
                                How to obtain blocks with large alignment?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

C++ How to merge sorted vectors into a sorted vector / pop the least element from all of them?

Tags:

c++

sorting

vector

processing-efficiency

mergesort

Deniz

People also ask

1 Answers

Aaron McDaid

Recent Activity

Donate For Us