Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Find the smallest number that does not occur as a subsequence of the string

Tags:

algorithm

From STRSEQ on SPOJ:

Given a string of digits, find the smallest nonnegative integer that does not occur in the given string as a subsequence.

I tried an approach in which I incrementally find the number from 0. But my approach is O(answer * length of the string). Is there an O(length of string) approach for this problem?

Examples:

Input

  • 9876543210
  • 21698921085321984125
  • 12345678901

Output

  • 11
  • 7
  • 22
like image 826
Alex Avatar asked Oct 29 '13 12:10

Alex


1 Answers

Data structures needed for this algorithm:

  • array of bitsets, one bitset per input string element (and one more empty bitset at the end), one bit per digit (size of each bitset is 10)
  • 10 stacks of positions - for each digit (optional)
  • counter (to count digits in the answer)

Algorithm:

  1. Scan input string backwards, push current position to the stack corresponding to digit at this position, copy bitset from previous position and set the bit corresponding to digit at current position. When all bits in current bitset are set, increment the counter and clear the bitset.
  2. If counter is still zero, get least significant bit in current bitset and use single digit corresponding to this bit to construct the resulting number.
  3. If the only unset bit is "zero", the resulting number is constructed as "10" followed by the result of step 4 (where the initial digit "0" is pre-determined).
  4. Get least significant zero bit in current bitset (but when this step is executed for the first time, ignore the bit corresponding to "0"), use it as next digit of the result, pop the stack corresponding to this digit (until you get the position not less than current position), and set current position to the index from this stack plus one. Get the bitset corresponding to current position and repeat step 4. This step should be executed "counter+1" times or until the attempt to pop empty stack.

Alternative algorithm:

Other alternative is to move most calculations from step 4 to step 1. In this case instead of the bitset array and 10 stacks we need even simpler data structures.

Explanation and example:

Step 1 of this algorithm determines shortest suffix that contains any single-digit number as a subsequence. Then it finds shortest substring adjacent to this suffix that also contains any single-digit number. Together they give shortest suffix that contains any two-digit number. This process is continued for 3-digit, 4-digit suffixes, etc. Also this pre-processing step determines which digits could be written to the left of these n-digit numbers so that corresponding sub-sequence exists in some suffix of input string (this information is contained in bitsets).

For example, for input string 12345678901 this step determines that suffix starting at index "1" contains any possible single-digit number. Also it fills bitsets in following way:

index: bitset:
10     0000000010
9      0000000011
8      1000000011
7      1100000011
6      1110000011
5      1111000011
4      1111100011
3      1111110011
2      1111111011
1      0000000000
0      0000000010

Steps 2,3 deal with some corner cases.

Step 4 starts with inspecting the bitset at position "0" to find least possible starting digit. In our example bit "1" is set which means we cannot start our 2-digit number with "1" (all such numbers are already in the string). Also we cannot start it with "0". Next unset bit is "2", so we'll start the number with "2".

We could use corresponding stack or just sequentially search the string to get to the position of the first digit "2". The remaining digit of our number should be in the suffix starting after digit "2". Bitset at the starting position of this suffix is "1111111011", which means the only unused digit in this suffix is "2". We should stop here because "counter+1" is 2 and we just used up 2 iterations of step 4. So the answer is "22".


Here is other algorithm with inferior time complexity O(answer + length of the string).

For each digit (from 0 to 9) prepare an array of indexes, where each index points to nearest digit's occurrence at given position or to the right. For example:

For string     216989210...
and digit "1": 11777777...

This may be implemented by scanning input array backwards, and as soon as we see appropriate digit, start writing its index to index array. For the example above this means we see the rightmost digit "1" at position 7 and fill index array with 7 until we see another occurrence of digit "1" at index 1. Then we fill remaining positions with this index.

Now we can easily reduce the complexity of algorithm mentioned in OP from O(answer * length of the string) to O(answer + length of the string). Just follow the link provided by one of the index arrays to get nearest position of the next digit in current number.

For example, we could try first 61 non-negative numbers, then for the number "61" we inspect the index array for "6" at first position to find index "2" (this is not really necessary since this index is already found while handling "60"), then we inspect the index array for "1" at next position (2+1) to find index "7". Which means "61" occurs in given string and we could continue with "62"...

like image 82
Evgeny Kluev Avatar answered Nov 04 '22 17:11

Evgeny Kluev