Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

C++ How to merge sorted vectors into a sorted vector / pop the least element from all of them?

I have a collection of about a hundred or so sorted vector<int>'s Although most vectors have a small number of integers in them, some of the vectors contain a large (>10K) of them (thus the vectors don't necessarily have the same size).

What I'd like to do essentially iterate through smallest to largest integer, that are contained in all these sorted vectors.

One way to do it would be to merge all these sorted vectors into a sorted vector & simply iterate. Thus,

Question 1: What is the fastest way to merge sorted vectors into a sorted vector?

I'm sure on the other hand there are faster / clever ways to accomplish this without merging & re-sorting the whole thing -- perhaps popping the smallest integer iteratively from this collection of sorted vectors; without merging them first.. so:

Question 2: What is the fasted / best way to pop the least element from a bunch of sorted vector<int>'s?


Based on replies below, and the comments to the question I've implemented an approach where I make a priority queue of iterators for the sorted vectors. I'm not sure if this is performance-efficient, but it seems to be very memory-efficient. I consider the question still open, since I'm not sure we've established the fastest way yet.

// compare vector pointers by integers pointed
struct cmp_seeds {
    bool operator () (const pair< vector<int>::iterator, vector<int>::iterator> p1, const pair< vector<int>::iterator, vector<int>::iterator> p2) const {
        return *(p1.first) >  *(p2.first);      
    }
};

int pq_heapsort_trial() {

    /* Set up the Sorted Vectors */ 
    int a1[] = { 2, 10, 100};
    int a2[] = { 5, 15, 90, 200};
    int a3[] = { 12 };

    vector<int> v1 (a1, a1 + sizeof(a1) / sizeof(int));
    vector<int> v2 (a2, a2 + sizeof(a2) / sizeof(int));
    vector<int> v3 (a3, a3 + sizeof(a3) / sizeof(int));

    vector< vector <int> * > sorted_vectors;
    sorted_vectors.push_back(&v1);
    sorted_vectors.push_back(&v2);
    sorted_vectors.push_back(&v3);
    /* the above simulates the "for" i have in my own code that gives me sorted vectors */

    pair< vector<int>::iterator, vector<int>::iterator> c_lead;
    cmp_seeds mycompare;

    priority_queue< pair< vector<int>::iterator, vector<int>::iterator>, vector<pair< vector<int>::iterator, vector<int>::iterator> >, cmp_seeds> cluster_feeder(mycompare);


    for (vector<vector <int> *>::iterator k = sorted_vectors.begin(); k != sorted_vectors.end(); ++k) {
        cluster_feeder.push( make_pair( (*k)->begin(), (*k)->end() ));
    }


    while ( cluster_feeder.empty() != true) {
        c_lead = cluster_feeder.top();
        cluster_feeder.pop();
        // sorted output
        cout << *(c_lead.first) << endl;

        c_lead.first++;
        if (c_lead.first != c_lead.second) {
            cluster_feeder.push(c_lead);
        }
    }

    return 0;
}
like image 400
Deniz Avatar asked Jan 26 '12 03:01

Deniz


People also ask

Can we sort vector using sort function?

A vector in C++ can be easily sorted in ascending order using the sort() function defined in the algorithm header file. The sort() function sorts a given data structure and does not return anything.

How do you sort an array of vectors?

Sorting a vector in C++ can be done by using std::sort(). It is defined in<algorithm> header. To get a stable sort std::stable_sort is used. It is exactly like sort() but maintains the relative order of equal elements.

What is merge sort C++?

Last Updated: September 29, 2022. C++ Merge Sort Technique. Merge sort algorithm uses the “divide and conquer” strategy wherein we divide the problem into subproblems and solve those subproblems individually. These subproblems are then combined or merged together to form a unified solution.


1 Answers

One option is to use a std :: priority queue to maintain a heap of iterators, where the iterators bubble up the heap depending on the values they point at.

You could also consider using repeating applications of std :: inplace_merge. This would involve appending all the data together into a big vector and remembering the offsets at which each distinct sorted block begins and ends, and then passing those into inplace_merge. This would probably be faster then the heap solution, although I think fundamentally the complexity is equivalent.

Update: I've implemented the second algorithm I just described. Repeatedly doing a mergesort in place. This code is on ideone.

This works by first concatenating all the sorted lists together into one long list. If there were three source lists, this means there are four 'offsets', which are four points in the full list between which the elements are sorted. The algorithm will then pull off three of these at a time, merging the two corresponding adjacent sorted lists into one sorted list, and then remembering two of those three offsets to be used in the new_offsets.

This repeats in a loop, with pairs of adjacent sorted ranges merged together, until only one sorted range remains.

Ultimately, I think the best algorithm would involve merging the shortest pairs of adjacent ranges together first.

// http://stackoverflow.com/questions/9013485/c-how-to-merge-sorted-vectors-into-a-sorted-vector-pop-the-least-element-fro/9048857#9048857
#include <iostream>
#include <vector>
#include <algorithm>
#include <cassert>
using namespace std;

template<typename T, size_t N>
vector<T> array_to_vector( T(*array)[N] ) { // Yes, this works. By passing in the *address* of
                                            // the array, all the type information, including the
                                            // length of the array, is known at compiler. 
        vector<T> v( *array, &((*array)[N]));
        return v;
}   

void merge_sort_many_vectors() {

    /* Set up the Sorted Vectors */ 
    int a1[] = { 2, 10, 100};
    int a2[] = { 5, 15, 90, 200};
    int a3[] = { 12 };

    vector<int> v1  = array_to_vector(&a1);
    vector<int> v2  = array_to_vector(&a2);
    vector<int> v3  = array_to_vector(&a3);


    vector<int> full_vector;
    vector<size_t> offsets;
    offsets.push_back(0);

    full_vector.insert(full_vector.end(), v1.begin(), v1.end());
    offsets.push_back(full_vector.size());
    full_vector.insert(full_vector.end(), v2.begin(), v2.end());
    offsets.push_back(full_vector.size());
    full_vector.insert(full_vector.end(), v3.begin(), v3.end());
    offsets.push_back(full_vector.size());

    assert(full_vector.size() == v1.size() + v2.size() + v3.size());

    cout << "before:\t";
    for(vector<int>::const_iterator v = full_vector.begin(); v != full_vector.end(); ++v) {
            cout << ", " << *v;
    }       
    cout << endl;
    while(offsets.size()>2) {
            assert(offsets.back() == full_vector.size());
            assert(offsets.front() == 0);
            vector<size_t> new_offsets;
            size_t x = 0;
            while(x+2 < offsets.size()) {
                    // mergesort (offsets[x],offsets[x+1]) and (offsets[x+1],offsets[x+2])
                    inplace_merge(&full_vector.at(offsets.at(x))
                                 ,&full_vector.at(offsets.at(x+1))
                                 ,&(full_vector[offsets.at(x+2)]) // this *might* be at the end
                                 );
                    // now they are sorted, we just put offsets[x] and offsets[x+2] into the new offsets.
                    // offsets[x+1] is not relevant any more
                    new_offsets.push_back(offsets.at(x));
                    new_offsets.push_back(offsets.at(x+2));
                    x += 2;
            }
            // if the number of offsets was odd, there might be a dangling offset
            // which we must remember to include in the new_offsets
            if(x+2==offsets.size()) {
                    new_offsets.push_back(offsets.at(x+1));
            }
            // assert(new_offsets.front() == 0);
            assert(new_offsets.back() == full_vector.size());
            offsets.swap(new_offsets);

    }
    cout << "after: \t";
    for(vector<int>::const_iterator v = full_vector.begin(); v != full_vector.end(); ++v) {
            cout << ", " << *v;
    }
    cout << endl;
}

int main() {
        merge_sort_many_vectors();
}
like image 113
Aaron McDaid Avatar answered Nov 04 '22 21:11

Aaron McDaid