Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Algorithm to interleave array of characters and digits in-place

There's an array like [a,b,2,c,3,d,4,1]. It has to be modified into an array like [a,2,b,3,c,4,d,1] in-place.

That is, from the original array with letters and digits interspersed, the modified version should contain the same elements such that the array has alternating letters and digits, while preserving their original relative order.

It can be done easily, by using two pointers, and outputting the modified version into a new array, but it has to be done in-place in O(N) time and O(1) space. Is it possible at all, and if so, how?

like image 777
SexyBeast Avatar asked Oct 06 '12 19:10

SexyBeast


1 Answers

O(N) in-place algorithm is possible. You can start with sorting all letters to the beginning of the array, all digits - to the end. Then in-place transpose a 2x(N/2) matrix to place all elements from the first half of the array to even positions, other elements - to odd positions. This question explains how to do this transposition (actually, algorithm, described there, should be applied in-reverse).

The most tricky part is sorting. It should be both stable and in-place.

It is trivial to sort in O(N2) time. Insertion sort or bubble sort will do it.

O(N log N) time is more complicated. Use stable in-place merge sort, described in this answer. This answer gives a pair of links, describing the ideas, that may be used to compose an O(N) algorithm, which is simpler than stable in-place merge sort, but still pretty complicated, so I give only a sketch of it.


Divide the array into two areas. One of them will be used to temporarily store some data, needed for other parts of the algorithm (since this area should still store array elements, any additional information is encoded by order of these elements). Other area is interpreted as a square matrix, where elements should be sorted, first rows, then columns. After both areas are sorted, transpose algorithm is applied to each of them to get characters and digits in proper place.


1. Making temporary storage out of part of the array

Most elements of the array are organized as a square matrix of size K. K is the largest even number, such that 2 * 8 * K + K2 is not greater than N. Size of temporary storage M = N - K2, which is approximately equal to 2*8*sqrt(N).

To prepare M elements for temporary storage, scan first M elements of the array and determine, what kind of elements is least represented in this range. For example, let it be "digits". Pack digits into a group of size M/2. To do this, iteratively exchange blocks of letters and digits as shown in this example:

ddaaaddddadddaaaaad
aaaDDddddadddaaaaad
aaaaDDDDDDdddaaaaad
aaaaaaaaaDDDDDDDDDd

Stop when at least M/2 digits are collected in single group. Block exchange procedure (which may be also theated as in-place sub-array rotation) may be implemented either as the first algorithm of this answer or as following:

  1. Reverse block of digits
  2. Reverse block of numbers
  3. Reverse both blocks together

Example:

123abcd
321abcd
321dcba
abcd123

Packing digits into a group of size M/2 requires to move up to (M/2)2 digits and up to N/2 letters, so this can be done in O(N) time.

To be exact, it is O(N log2U) time, which is OK to sort ASCII characters, but too large if we need to sort some other kind of values instead (U is the size of value set). To improve run time to O(N * log U * log log U), start with a smaller temporary storage of size 2 * K, use it to sort log U ranges, then pair-wise merge these ranges with block exchange procedure in log log U steps to get temporary storage of proper size.

At this point we have M/2-sized block of digits, preceded by a block of letters (probably of larger size). Use block exchange procedure to make two blocks of equal size M/2.

To store bits of data in temporary storage, we may exchange elements at positions i and i + M/2. If position i stores a letter and position i + M/2 stores a digit, we have zero bit. If digit comes first, we have non-zero bit. 8 consecutive bits encode a single character. Which means, we can remember up to K characters in temporary storage.

Other alternative to storing bits in temporary storage is its "number" half. Each number may be stored either as '0' .. '9' (which means bit 0), or as 'a' .. 'j' (which means bit 1). If number of possible letters is at least 3 times as large as number of digits, we can store 2 bits in each value. To squeeze 4 bits out of each value, we may pack every 2 digits into a single byte; this gives M/4 free bytes; so M may be decreased to 4 * K.

enter image description here

2. Sorting rows of the matrix

From this point we should rearrange main area of the array, using temporary storage as additional memory. Determine, what kind of elements is least represented in sub-array of size 2 * K. For example, let it be "digits". Sequentially scan the array, moving digits to temporary storage and packing letters into continuous sequence. When K digits are collected, we have L letters in sequence and L >= K. Move last L%K letters to the end of unoccupied area and fill rest of unoccupied area with digits from temporary storage. Then continue sequentially scanning the array.

This procedure copies each element of the array only once or twice, so it needs O(N) time to complete. At the end all blocks of letters/digits are aligned so that each row of the matrix contains elements of one kind.

enter image description here


3. Sorting columns of the matrix

This is a simplified version of previous procedure. For each matrix column, move digits to temporary storage and pack letters into continuous rows, then copy digits from temporary storage to unoccupied rows.

This procedure also needs O(N) time to complete. At the end main area of the array is completely sorted. Now sort temporary storage (write zero to each of its "bits").

enter image description here

4. Transposing arrays

The only remaining step is to transpose both temporary storage and main area of the array so that characters and digits are in proper place. (Note that not a K x K matrix, but two 2 x J matrices, made from both sub-arrays are transposed with algorithm, mentioned in one of the linked answers).

This procedure as well as the whole algorithm needs O(N) time.

enter image description here


Second algorithm.

Since ASCII contains only 10 digits and 52 letters, we can significantly simplify algorithm in the following way:

  1. Pack last 1/3 of values using BASE64 encoding, which gives N/12 bytes of free space (to be used as temporary storage).
  2. Sort first 1/3 of values using this space as temporary storage: determine, what kind of elements is least represented in sub-array of size N/6; for example, let it be "digits"; sequentially scan the array, moving digits to temporary storage and packing letters into continuous sequence; then copy temporary storage to unoccupied space; repeat this for the next N/6 elements.
  3. Unpack last 1/3 of values, pack first 1/3 of values.
  4. Sort last 2/3 of values (sort N/6 elements 4 times).
  5. Unpack first 1/3 of values.
  6. At this point we have 12 groups of characters (of variable size). Use block exchange procedure to move pieces of these groups between them and make 12 groups of equal size in proper order (letters, digits, ...).
  7. Transpose 6 pairs of characters' groups. Actually, a simple transposition algorithm is possible here. Apply cycle-leader algorithm to all unmarked elements (bit 7 is zero), and while element is moved, set its bit 7 to one. When done, clear all marks.

Third algorithm.

Other approach for solving this problem is to sort all letters to the beginning of the array, all digits - to the end with (some modification of) in-place stable radix sort, described in this pdf. After sorting, apply transpose algorithm, mentioned earlier.

The trickiest thing here is the compression part of this radix sort algorithm. We cannot save one bit out of each value. Instead, we should use arithmetic coding.

like image 54
Evgeny Kluev Avatar answered Sep 23 '22 02:09

Evgeny Kluev