Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Algorithm for merging short lists into a long vector

I have a sparse matrix class whose non-zeros and corresponding column indices are stored, in row-order, in what are basically STL-vector-like containers. They may have unused capacity, like vectors; and to insert/remove elements, existing elements must be moved.

Say I have an operation, insert_erase_replace, or ier for short. ier can do the following, given a position p, a column index j, and a value v:

  • if v==0, ier removes the entry at p and left-shifts all subsequent entries.
  • if v!=0, and j is already present at p, ier replaces the cell contents at p with v.
  • if v!=0, and j is not present at p, ier inserts the entry v and column index j at p after right-shifting all subsequent entries.

So all of that is trivial.

Now let's say I have ier2, which does the same thing, except that it takes a list containing multiple column indices j and corresponding values v. It also has a size n, which indicates how many index/value pairs are present in the list. But because the vector only stores non-zeros, sometimes the actual insertion size is smaller than n.

Still trivial.

But now let's say I have ier3, which takes not just one list like ier2, but multiple lists. This represents editing a slice of the sparse matrix.

At some point, it becomes more efficient to iterate through the vectors, copying them piece by piece and inserting/replacing/erasing the list indices/values ier2-style as we arrive at each insertion point. And if the total insertion size would cause my vector to need a resize anyway, then we do that.

Given that my vector is much, much larger than the total length of the lists, is there an algorithm for efficiently merging the lists into the vector?

So far, here's what I have:

  • Each list passed to ier3 represents either a net deletion of entries (a left shift), a net replacement (no movement, therefore cheap), or a net insertion of entries (a right shift). There may also be some re-arrangement of elements in there, but the expensive parts are the net deletions and net insertions.

  • It's not hard to figure out an algorithm for efficiently doing ONLY net insertions or net deletions.

  • It's harder when either of the two may be happening.

The only thing I can think to do is to handle it in two passes:

  1. Erase/replace
  2. Insert/replace

We erase first because it makes it more likely that any insertions will require fewer copies.

Is this the right approach? Does anyone know of a better one?

like image 513
Translunar Avatar asked Sep 12 '13 00:09

Translunar


1 Answers

Okay, so I'm going to suppose the intervals covered in each list in ier3 are disjoint and given to you in order. If it's meant for editing slices of a matrix, this seems reasonable. I'm also assuming you that you don't need to resize the vector, because that case is easily detectable and solvable.

Initialise a read pointer and a write pointer to the start of the vector you're editing. There'll be an instruction pointer into ie3 too, but I'll ignore that here for clarity's sake. You'll also need a queue. At each step, one of several things can happen:

  • Default: Neither read nor write are at a position detailed by ier3. In this case, add the element under read to the back of the queue and write the element at the front of the queue to the cell under write. Move both pointers forward one.

  • read is over a cell that needs to be deleted. In this case, simply move read forward one without adding anything to the queue.

  • read passes from one cell to the next such that an insertion should happen between them. In this case, add the insertion to the back of the queue and then continue with the next relevant case.

  • read is at a cell that needs to be modified. In this case, insert the modified cell at the back of the queue, write whatever's at the front of the queue to write, and step them both forwards.

  • read has arrived at the unused capacity of the vector. In which case just write whatever's left in the queue.

That's the basic outline, but a couple of optimizations can be made: first, if the queue's empty, step both pointers forward to the next position detailed by ie3 without doing anything. Second, minimize the buffer by doing extra writing steps whenever read is ahead of write and the queue is nonempty.

like image 99
Andy Jones Avatar answered Oct 21 '22 07:10

Andy Jones