Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to tell if an array is a permutation in O(n)?

Input: A read-only array of N elements containing integer values from 1 to N (some integer values can appear more than once!). And a memory zone of a fixed size (10, 100, 1000 etc - not depending on N).

How to tell in O(n) if the array represents a permutation?

--What I achieved so far (an answer proved that this was not good):--

  1. I use the limited memory area to store the sum and the product of the array.
  2. I compare the sum with N*(N+1)/2 and the product with N!

I know that if condition (2) is true I might have a permutation. I'm wondering if there's a way to prove that condition (2) is sufficient to tell if I have a permutation. So far I haven't figured this out ...

like image 361
INS Avatar asked May 20 '10 10:05

INS


People also ask

How can you tell if two arrays are permutations?

If Sum1 == Sum2 and Mul1 == Mul2, then both arrays are permutations of each other, else not.

What is 0 based permutation?

A zero-based permutation nums is an array of distinct integers from 0 to nums. length - 1 (inclusive). Complete the function: vector buildArray(vector& nums) Test the function using the sample case given in main function, Input: nums [5,0,1,2,3,4] %3!

What is permutation of n numbers?

What is a permutation of n numbers? A permutation of n numbers is an array containing n numbers from 1 to n such that each number occurs only once.


2 Answers

I'm very slightly skeptical that there is a solution. Your problem seems to be very close to one posed several years ago in the mathematical literature, with a summary given here ("The Duplicate Detection Problem", S. Kamal Abdali, 2003) that uses cycle-detection -- the idea being the following:

If there is a duplicate, there exists a number j between 1 and N such that the following would lead to an infinite loop:

x := j; do {    x := a[x]; } while (x != j); 

because a permutation consists of one or more subsets S of distinct elements s0, s1, ... sk-1 where sj = a[sj-1] for all j between 1 and k-1, and s0 = a[sk-1], so all elements are involved in cycles -- one of the duplicates would not be part of such a subset.

e.g. if the array = [2, 1, 4, 6, 8, 7, 9, 3, 8]

then the element in bold at position 5 is a duplicate because all the other elements form cycles: { 2 -> 1, 4 -> 6 -> 7 -> 9 -> 8 -> 3}. Whereas the arrays [2, 1, 4, 6, 5, 7, 9, 3, 8] and [2, 1, 4, 6, 3, 7, 9, 5, 8] are valid permutations (with cycles { 2 -> 1, 4 -> 6 -> 7 -> 9 -> 8 -> 3, 5 } and { 2 -> 1, 4 -> 6 -> 7 -> 9 -> 8 -> 5 -> 3 } respectively).

Abdali goes into a way of finding duplicates. Basically the following algorithm (using Floyd's cycle-finding algorithm) works if you happen across one of the duplicates in question:

function is_duplicate(a, N, j) {      /* assume we've already scanned the array to make sure all elements         are integers between 1 and N */      x1 := j;      x2 := j;      do      {                       x1 := a[x1];          x2 := a[x2];          x2 := a[x2];      } while (x1 != x2);       /* stops when it finds a cycle; x2 has gone around it twice,          x1 has gone around it once.         If j is part of that cycle, both will be equal to j. */      return (x1 != j); } 

The difficulty is I'm not sure your problem as stated matches the one in his paper, and I'm also not sure if the method he describes runs in O(N) or uses a fixed amount of space. A potential counterexample is the following array:

[3, 4, 5, 6, 7, 8, 9, 10, ... N-10, N-9, N-8, N-7, N-2, N-5, N-5, N-3, N-5, N-1, N, 1, 2]

which is basically the identity permutation shifted by 2, with the elements [N-6, N-4, and N-2] replaced by [N-2, N-5, N-5]. This has the correct sum (not the correct product, but I reject taking the product as a possible detection method since the space requirements for computing N! with arbitrary precision arithmetic are O(N) which violates the spirit of the "fixed memory space" requirement), and if you try to find cycles, you will get cycles { 3 -> 5 -> 7 -> 9 -> ... N-7 -> N-5 -> N-1 } and { 4 -> 6 -> 8 -> ... N-10 -> N-8 -> N-2 -> N -> 2}. The problem is that there could be up to N cycles, (identity permutation has N cycles) each taking up to O(N) to find a duplicate, and you have to keep track somehow of which cycles have been traced and which have not. I'm skeptical that it is possible to do this in a fixed amount of space. But maybe it is.

This is a heavy enough problem that it's worth asking on mathoverflow.net (despite the fact that most of the time mathoverflow.net is cited on stackoverflow it's for problems which are too easy)


edit: I did ask on mathoverflow, there's some interesting discussion there.

like image 145
Jason S Avatar answered Sep 22 '22 02:09

Jason S


This is impossible to do in O(1) space, at least with a single-scan algorithm.

Proof

Suppose you have processed N/2 of the N elements. Assuming the sequence is a permutation then, given the state of the algorithm, you should be able to figure out the set of N/2 remaining elements. If you can't figure out the remaining elements, then the algorithm can be fooled by repeating some of the old elements.

There are N choose N/2 possible remaining sets. Each of them must be represented by a distinct internal state of the algorithm, because otherwise you couldn't figure out the remaining elements. However, it takes logarithmic space to store X states, so it takes BigTheta(log(N choose N/2)) space to store N choose N/2 states. That values grows with N, and therefore the algorithm's internal state can not fit in O(1) space.

More Formal Proof

You want to create a program P which, given the final N/2 elements and the internal state of the linear-time-constant-space algorithm after it has processed N/2 elements, determines if the entire sequence is a permutation of 1..N. There is no time or space bound on this secondary program.

Assuming P exists we can create a program Q, taking only the internal state of the linear-time-constant-space algorithm, which determines the necessary final N/2 elements of the sequence (if it was a permutation). Q works by passing P every possible final N/2 elements and returning the set for which P returns true.

However, because Q has N choose N/2 possible outputs, it must have at least N choose N/2 possible inputs. That means the internal state of the original algorithm must store at least N choose N/2 states, requiring BigTheta(log N choose N/2), which is greater than constant size.

Therefore the original algorithm, which does have time and space bounds, also can't work correctly if it has constant-size internal state.

[I think this idea can be generalized, but thinking isn't proving.]

Consequences

BigTheta(log(N choose N/2)) is equal to BigTheta(N). Therefore just using a boolean array and ticking values as you encounter them is (probably) space-optimal, and time-optimal too since it takes linear time.

like image 26
Craig Gidney Avatar answered Sep 23 '22 02:09

Craig Gidney