Is there any array-like data structure that can grow in size on both sides?

I'm a student working on a small project for a high performance computing course, so efficiency is a key issue.

Say I have a vector of N floats and I want to remove the smallest n elements and the biggest n elements. There are two simple ways of doing this:

A

sort in ascending order    // O(NlogN)
remove the last n elements // O(1)
invert elements order      // O(N)
remove the last n elements // O(1)

B

sort in ascending order     // O(NlogN)
remove the last n elements  // O(1)
remove the first n elements // O(N)

In A, inverting the element order requires swapping all the elements, while in B removing the first n elements requires moving all the others to occupy the positions left empty. Using std::remove would give the same problem.

If I could remove the first n elements for free, then solution B would be cheaper. That should be easy to achieve if, instead of a vector, i.e. an array with some empty space after vector::end(), I had a container with some free space also before vector::begin().

So the question is: does an array-like container (i.e. contiguous memory, no linked lists) that allows O(1) insertion/removal at both ends already exist in some library (STL, Boost)?

If not, do you think there are better solutions than creating such a data structure?

Fabio asked May 23 '14 08:05


3 Answers

Have you thought of using std::partition with a custom functor, like the example below:

#include <iostream>
#include <vector>
#include <algorithm>

// Predicate that is true for elements inside [low, up]; std::partition
// moves those to the front and returns an iterator past the last of them.
template<typename T>
class greaterLess {
    T low;
    T up;
  public:
    greaterLess(T const &l, T const &u) : low(l), up(u) {}
    bool operator()(T const &e) const { return !(e < low || e > up); }
};

int main()
{
    std::vector<double> v{2.0, 1.2, 3.2, 0.3, 5.9, 6.0, 4.3};
    auto it = std::partition(v.begin(), v.end(), greaterLess<double>(2.0, 5.0));
    v.erase(it, v.end());

    for(auto i : v) std::cout << i << " ";
    std::cout << std::endl;

    return 0;
}

This way you would erase elements from your vector in O(N) time.

101010 answered Sep 22 '22 06:09


Try boost::circular_buffer:

It supports random access iterators, constant time insert and erase operations at the beginning or the end of the buffer and interoperability with std algorithms.

Having looked at the source, it seems (and is only logical) that data is kept as a contiguous memory block.

The one caveat is that the buffer has fixed capacity and after exhausting it elements will get overwritten. You can either detect such cases yourself and resize the buffer manually, or use boost::circular_buffer_space_optimized with a humongous declared capacity, since it won't allocate it if not needed.

gwiazdorrr answered Sep 22 '22 06:09


To shrink & grow a vector at both ends, you can use the idea of slices, reserving extra memory to expand into at the front and back ahead of time, if efficient growth is needed.

Simply, make a class with not only a length but also indices for the first & last elements, plus a suitably sized vector, to create a window of data onto the underlying block of stored floats. A C++ class can provide inlined functions for things like deleting items, indexing into the array, finding the nth largest value, and shifting the slice values down or up to insert new elements while maintaining sorted order. Should no spare elements be available, dynamic allocation of a new, larger float store permits continued growth at the cost of an array copy.

A circular buffer is designed as a FIFO, with new elements added at the end, removal at the front, and no insertion in the middle; a self-defined class can also (trivially) support array subscripts different from 0..N-1.

Due to memory locality, the avoidance of indirection through pointer chains, and the pipelining of subscript calculations on a modern processor, a solution based on an array (or a vector) is likely to be the most efficient, despite the element copying on insertion. A deque would be suitable, but it does not guarantee contiguous storage.

Additional supplementary info: researching classes that provide slices turns up some plausible alternatives to evaluate:

A) std::slice, which is used with std::valarray and slice_array
B) Boost.Range

Hope this is the kind of specific information you were hoping for. In general, a simpler, clearer solution is more maintainable than a tricky one. I would expect slices and ranges on sorted data sets to be quite common, for example when filtering experimental data where "outliers" are excluded as faulty readings.

I think a good solution should actually be O(NlogN) + 2×O(1), with any binary searches at O(logN + 1) if you filter on outlying values instead of deleting a fixed number of small or large elements. It also matters that the constant behind the "O" is small: sometimes an O(1) algorithm is, in practice, slower than an O(N) one for practical values of N.

Rob11311 answered Sep 22 '22 06:09