I am implementing several datastructures and one primitive I want to use is the following: I have a memory chunk A[N] (it has a variable length, but I take 100 for my examples) and inside this chunk, there is a smaller part C of length K (lets say 30) which I want to move without using any additional memory.
The additional difficulty is, that A "wraps", that is, C can start at A[80] and then the first 20 elements of C are the elements A[80..100] and the last 10 elements are the elements A[0..10]. Furthermore, the target range may also "wrap" and overlap with C in any possible way. Additionally, I don't want to use more than a constant amount of additional memory, everything should happen in place. Also, the part of A which is neither in the target range nor in the source range may contain something important, so it cannot be used either. So one case would be the following:
A looks like this:
|456789ABCDEF0123456789AB|-----|0123|
And should be transformed to this:
|89AB|-----|0123456789ABCDEF01234567|
Just delegating it to a library or use another datastructure from a library is not an option here, I want to understand the problem myself. On the first sight, I thought that it might not be trivial, but as soon as you distinguish a few cases, it becomes clear, but now I am having serious trouble. Of course there are the trivial cases if they don't overlap or don't wrap, but at least if both happens at the same time, it gets messy. You could start with one free place and move the part that belongs there, but then you create another free part somewhere else and it gets hard to keep track of which parts you can stil use.
Maybe I am missing something completely, but even my special case if the target range does not wrap has almost 100 lines (half of it are assertions and comments, though) and I could update it so that it also handles the general case with some additional index calculations, but if someone has an elegant and short solution, I would appreciate some help. Intuitively I think that this should somehow be trivial, but I just don't see the best solution yet.
Note: The interesting case is of course, if C is almost as big as A. If |C| < N/2, it is trivial.
edit: Using more than a constant amount of additional flags/indices counts as additional memory and I want to avoid that if possible.
edit: Some people wanted to see my code. My question is rather abstract, so I didn't want to post it, but maybe someone sees how to improve it. It is terrible, it only works for the case that the target starts at the beginning (however, that can easily be changed) and terribly long, but it does the job without additional memory in O(n).
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <assert.h>
void move_part(int* A, size_t N, size_t target, size_t source, size_t size, int show_steps)
{
  assert(source + size <= N);
  assert(target + size <= N);
  if (show_steps) {
    printf("Moving size %d from %d to %d.\n", size, source, target);
  }
  memmove(A + target, A + source, size * sizeof(int));
}
void swap_parts(int* A, size_t N, size_t first_begin, size_t second_begin, size_t size, int show_steps)
{
  if (show_steps) {
    printf("Swapping size %d at %d and %d.\n", size, first_begin, second_begin);
  }
  assert(first_begin + size <= N);
  assert(second_begin + size <= N);
  size_t i;
  for (i = 0; i < size; ++i) {
    int x = A[first_begin + i];
    A[first_begin + i] = A[second_begin + i];
    A[second_begin + i] = x;
  }
}
void move_to_beginning(int* A, size_t N, size_t begin, size_t size, int show_steps)
{
  assert(begin <= N);
  assert(size <= N);
  // Denotes the start of our "working range". Increases during
  // the algorithm and becomes N
  size_t part_start = 0;
  // Note: Keeping the size is crucial since begin == end could
  // mean that the range is empty or full.
  size_t end = (begin + size) % N;
  while (part_start != N) {
    size_t i;
    if (show_steps) {
      for (i = 0; i < N; ++i) {
    printf("%d ", A[i]);
      }
      printf("\n");
      printf("part_start %d  begin %d  end %d  size %d\n", part_start, begin, end, size);
    }
    // loop invariants
    assert(part_start < N);
    // The two pointers are in our range
    assert(part_start <= begin && begin <= N);
    assert(part_start <= end && end <= N);
    // size is valid (wrapped case, non-empty, non-full case)
    assert(begin <= end || (N - begin) + (end - part_start) == size);
    // size is valid (non wrapped case, non-empty, non-full case)
    assert(begin >= end || end - begin == size);
    // size is valid (working range is full or empty case)
    assert(begin != end || size == 0 || part_start + size == N);
    if (size == 0 || begin == N || begin == part_start) {
      // ##|1234|# -> 1234### ||
      if (show_steps) {
    printf("Case 1:\nTerminating\n");
      }
      // #||# -> ## ||
      // 12|##| -> 12## ||
      // |12|## -> 12## ||
      break;
      /* Not necessary any more, but would be the correct transformation:
     part_start = N;
     begin = N;
     end = N;
     size = 0;*/
    } else if (end == part_start) {
      // |##|123 -> ##|123|
      if (show_steps) {
    printf("Case 2:\n");
    printf("Setting end to %d.\n", N);
      }
      end = N;
    } else if (begin < end) {
      // ##|1234|# -> 1234### ||
      if (show_steps) {
    printf("Case 3:\n");
      }
      move_part(A, N, part_start, begin, size, show_steps);
      break;
      /* Not necessary any more, but would be the correct transformation:
     part_start = N;
     begin = N;
     end = N;
     size = 0;*/
    } else {
      size_t end_size = end - part_start;
      size_t begin_size = N - begin;
      assert(begin_size + end_size == size);
      if (end_size >= begin_size) {
    // 345|#|12 -> 12 5|#|34
    if (show_steps) {
      printf("Case 4:\n");
    }
    swap_parts(A, N, part_start, begin, begin_size, show_steps);
    assert(begin_size > 0); // Necessary for progress
    part_start += begin_size;
    size = end_size;
    // begin, end remain unchanged
      } else if (begin - part_start <= begin_size) {
    // 56|#|1234 -> 123 56|#|4
    size_t size_moved = begin - part_start;
    assert(size_moved >= end_size); // else the next step would be more efficient
    if (show_steps) {
      printf("Case 5\n");
    }
    swap_parts(A, N, part_start, begin, end_size, show_steps);
    move_part(A, N, end, begin + end_size, begin - end, show_steps);
    assert(end_size + (begin - end) == size_moved);
    size -= size_moved;
    part_start = begin;
    begin += size_moved;
    end += size_moved;
      } else if (end_size <= begin_size) {
    // 45|##|123 -> 123 #|45|# 
    if (show_steps) {
      printf("Case 6\n");
    }
    swap_parts(A, N, part_start, begin, end_size, show_steps);
    move_part(A, N, end, begin + end_size, begin_size - end_size, show_steps);
    part_start += begin_size;
    size = end_size;
    end = begin + end_size;
    // begin remains unchanged
      } else {
    // No case applies, this should never happen
    assert(0);
      }
    }
  }
}
int main()
{
  int N = 20;
  int A[20];
  size_t size = 17;
  size_t begin = 15;
  size_t i;
  for (i = 0; i < size; ++i) {
    A[(begin + i) % N] = i;
  }
  move_to_beginning(A, N, begin, size, 0);
  for (i = 0; i < size; ++i) {
    printf("%d ", A[i]);
  }
  printf("\n");
  return 0;
}
Detailed explanation of this case is given in the first answer by R.. I've nothing to add here.
The easiest approach would be always rotate whole array. This also moves some unneeded elements from destination range, but since in this case K > N/2, this does not make number of operations more then twice as necessary.
To rotate the array, use cycle leader algorithm: take first element of the array (A[0]) and copy it to destination position; previous contents of this position move again to its proper position; continue until some element is moved to the starting position.
Continue applying the cycle leader algorithm for next starting positions: A[1], A[2], ..., A[GCD(N,d) - 1], where d is the distance between source and destination.
After GCD(N,d) steps, all elements are on their proper positions. This works because:
GCD(N,d)).N / GCD(N,d) - because d / GCD(N,d) and N / GCD(N,d) are relatively prime.This algorithm is simple and it moves each element exactly once. It may be made thread-safe (if we skip the write step unless inside the destination range). Other multi-threading-related advantage is that each element may have only two values - value before "move" and value after "move" (no temporary in-between values possible).
But it does not always have optimal performance. If element_size * GCD(N,d) is comparable to cache line size, we might take all GCD(N,d) starting positions and process them together. If this value is too large, we can split starting positions into several contiguous segments to lower space requirements back to O(1).
The problem is when element_size * GCD(N,d) is much smaller than cache line size. In this case we get a lot of cache misses and performance degrades. gusbro's idea to temporarily swap array pieces with some "swap" region (of size d) suggests more efficient algorithm for this case. It may be optimized more if we use "swap" region, that fits in the cache, and copy non-overlapped areas with memcpy.
One more algorithm. It does not overwrite elements that are not in the destination range. And it is cache-friendly. The only disadvantage is: it moves each element exactly twice.
The idea is to move two pointers in opposite directions and swap pointed elements. There is no problem with overlapping regions because overlapping regions are just reversed. After first pass of this algorithm, we have all source elements moved to destination range, but in reversed order. So second pass should reverse destination range:
for (d = dst_start, s = src_end - 1;
     d != dst_end;
     d = (d + 1) % N, s = (s + N - 1) % N)
  swap(s, d);
for (d = dst_start, s = dst_end - 1;
     d != dst_end;
     d = (d + 1) % N, s = (s + N - 1) % N)
  swap(s, d);
This is not a complete answer yet, but I think it may be the right idea.
Start with an element of the source range and consider the destination position it will be mapped to. That position is either inside the source range, or outside it. If it's outside the source range, you can just copy, and you're done with that element. On the other hand, if it maps onto a destination position inside the source range, you can copy it, but you have to save the old value you're overwriting and perform the above process iteratively with this new element of the source.
Essentially, you're operating on the cycles of a permutation.
The problem is keeping track of what you've finished and what remains to be done. It's not immediately apparent if there's a way to do this without O(n) working space.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With