Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Most efficient way to calculate lexicographic index

Can anybody find any potentially more efficient algorithms for accomplishing the following task?:

For any given permutation of the integers 0 thru 7, return the index which describes the permutation lexicographically (indexed from 0, not 1).

For example,

  • The array 0 1 2 3 4 5 6 7 should return an index of 0.
  • The array 0 1 2 3 4 5 7 6 should return an index of 1.
  • The array 0 1 2 3 4 6 5 7 should return an index of 2.
  • The array 1 0 2 3 4 5 6 7 should return an index of 5039 (that's 7!-1 or factorial(7)-1).
  • The array 7 6 5 4 3 2 1 0 should return an index of 40319 (that's 8!-1). This is the maximum possible return value.

My current code looks like this:

int lexic_ix(int* A){
    int value = 0;
    for(int i=0 ; i<7 ; i++){
        int x = A[i];
        for(int j=0 ; j<i ; j++)
            if(A[j]<A[i]) x--;
        value += x*factorial(7-i);  // actual unrolled version doesn't have a function call
    }
    return value;
}

I'm wondering if there's any way I can reduce the number of operations by removing that inner loop, or if I can reduce conditional branching in any way (other than unrolling - my current code is actually an unrolled version of the above), or if there are any clever bitwise hacks or filthy C tricks to help.

I already tried replacing

if(A[j]<A[i]) x--;

with

x -= (A[j]<A[i]);

and I also tried

x = A[j]<A[i] ? x-1 : x;

Both replacements actually led to worse performance.

And before anyone says it - YES this is a huge performance bottleneck: currently about 61% of the program's runtime is spent in this function, and NO, I don't want to have a table of precomputed values.

Aside from those, any suggestions are welcome.

like image 994
A Frayed Knot Avatar asked May 31 '14 01:05

A Frayed Knot


People also ask

How do you calculate lexicographic order?

Approach: Find a string which is lexicographically greater than string S and check if it is smaller than string T, if yes print the string next else print “-1”. To find string, iterate the string S in the reverse order, if the last letter is not 'z', increase the letter by one (to move to next letter).

How do you get the smallest in lexicographical order?

Explanation: Inserting character e in the last index gives the lexicographically smallest string.

What is the lexicographic approach?

a model used in the study of consumer decision processes to evaluate alternatives; the idea that if two products are equal on the most important attribute, the consumer moves to the next most important, and, if still equal, to the next most important, etc.

How do you find the lexicographic order of a string in Python?

Lexicographical order In Python is achieved by using sort() and sorted() function. sort() function sorts data elements in lexicographical order by performing operations on the input list. sorted() function sorts data elements in lexicographical order by replicating the input list and keeping the input list as it is.


1 Answers

Don't know if this helps but here's an other solution :

int lexic_ix(int* A, int n){ //n = last index = number of digits - 1
    int value = 0;
    int x = 0;
    for(int i=0 ; i<n ; i++){
        int diff = (A[i] - x); //pb1
        if(diff > 0)
        {
            for(int j=0 ; j<i ; j++)//pb2
            {
                if(A[j]<A[i] && A[j] > x)
                {
                    if(A[j]==x+1)
                    {
                      x++;
                    }
                    diff--;
                }
            }
            value += diff;
        }
        else
        {
          x++;
        }
        value *= n - i;
    }
    return value;
}

I couldn't get rid of the inner loop, so complexity is o(n log(n)) in worst case, but o(n) in best case, versus your solution which is o(n log(n)) in all cases.

Alternatively, you can replace the inner loop by the following to remove some worst cases at the expense of another verification in the inner loop :

int j=0;
while(diff>1 && j<i)
{
  if(A[j]<A[i])
  {
    if(A[j]==x+1)
    {
      x++;
    }
    diff--;
  }
  j++;
}

Explanation :

(or rather "How I ended with that code", I think it is not that different from yours but it can make you have ideas, maybe) (for less confusion I used characters instead and digit and only four characters)

abcd 0  = ((0 * 3 + 0) * 2 + 0) * 1 + 0
abdc 1  = ((0 * 3 + 0) * 2 + 1) * 1 + 0
acbd 2  = ((0 * 3 + 1) * 2 + 0) * 1 + 0
acdb 3  = ((0 * 3 + 1) * 2 + 1) * 1 + 0
adbc 4  = ((0 * 3 + 2) * 2 + 0) * 1 + 0
adcb 5  = ((0 * 3 + 2) * 2 + 1) * 1 + 0 //pb1
bacd 6  = ((1 * 3 + 0) * 2 + 0) * 1 + 0
badc 7  = ((1 * 3 + 0) * 2 + 1) * 1 + 0
bcad 8  = ((1 * 3 + 1) * 2 + 0) * 1 + 0 //First reflexion
bcda 9  = ((1 * 3 + 1) * 2 + 1) * 1 + 0
bdac 10 = ((1 * 3 + 2) * 2 + 0) * 1 + 0
bdca 11 = ((1 * 3 + 2) * 2 + 1) * 1 + 0
cabd 12 = ((2 * 3 + 0) * 2 + 0) * 1 + 0
cadb 13 = ((2 * 3 + 0) * 2 + 1) * 1 + 0
cbad 14 = ((2 * 3 + 1) * 2 + 0) * 1 + 0
cbda 15 = ((2 * 3 + 1) * 2 + 1) * 1 + 0 //pb2
cdab 16 = ((2 * 3 + 2) * 2 + 0) * 1 + 0
cdba 17 = ((2 * 3 + 2) * 2 + 1) * 1 + 0
[...]
dcba 23 = ((3 * 3 + 2) * 2 + 1) * 1 + 0

First "reflexion" :

An entropy point of view. abcd have the fewest "entropy". If a character is in a place it "shouldn't" be, it creates entropy, and the earlier the entropy is the greatest it becomes.

For bcad for example, lexicographic index is 8 = ((1 * 3 + 1) * 2 + 0) * 1 + 0 and can be calculated that way :

value = 0;
value += max(b - a, 0); // = 1; (a "should be" in the first place [to create the less possible entropy] but instead it is b)
value *= 3 - 0; //last index - current index
value += max(c - b, 0); // = 1; (b "should be" in the second place but instead it is c)
value *= 3 - 1;
value += max(a - c, 0); // = 0; (a "should have been" put earlier, so it does not create entropy to put it there)
value *= 3 - 2;
value += max(d - d, 0); // = 0;

Note that the last operation will always do nothing, that's why "i

First problem (pb1) :

For adcb, for example, the first logic doesn't work (it leads to an lexicographic index of ((0* 3+ 2) * 2+ 0) * 1 = 4) because c-d = 0 but it creates entropy to put c before b. I added x because of that, it represents the first digit/character that isn't placed yet. With x, diff cannot be negative. For adcb, lexicographic index is 5 = ((0 * 3 + 2) * 2 + 1) * 1 + 0 and can be calculated that way :

value = 0; x=0;
diff = a - a; // = 0; (a is in the right place)
diff == 0 => x++; //x=b now and we don't modify value
value *= 3 - 0; //last index - current index
diff = d - b; // = 2; (b "should be" there (it's x) but instead it is d)
diff > 0 => value += diff; //we add diff to value and we don't modify x
diff = c - b; // = 1; (b "should be" there but instead it is c) This is where it differs from the first reflexion
diff > 0 => value += diff;
value *= 3 - 2;

Second problem (pb2) :

For cbda, for example, lexicographic index is 15 = ((2 * 3 + 1) * 2 + 1) * 1 + 0, but the first reflexion gives : ((2 * 3 + 0) * 2 + 1) * 1 + 0 = 13 and the solution to pb1 gives ((2 * 3 + 1) * 2 + 3) * 1 + 0 = 17. The solution to pb1 doesn't work because the two last characters to place are d and a, so d - a "means" 1 instead of 3. I had to count the characters placed before that comes before the character in place, but after x, so I had to add an inner loop.

Putting it all together :

I then realised that pb1 was just a particular case of pb2, and that if you remove x, and you simply take diff = A[i], we end up with the unnested version of your solution (with factorial calculated little by little, and my diff corresponding to your x).

So, basically, my "contribution" (I think) is to add a variable, x, which can avoid doing the inner loop when diff equals 0 or 1, at the expense of checking if you have to increment x and doing it if so.

I also checked if you have to increment x in the inner loop (if(A[j]==x+1)) because if you take for example badce, x will be b at the end because a comes after b, and you will enter the inner loop one more time, encountering c. If you check x in the inner loop, when you encounter d you have no choice but doing the inner loop, but x will update to c, and when you encounter c you will not enter the inner loop. You can remove this check without breaking the program

With the alternative version and the check in the inner loop it makes 4 different versions. The alternative one with the check is the one in which you enter the less the inner loop, so in terms of "theoretical complexity" it is the best, but in terms of performance/number of operations, I don't know.

Hope all of this helps (since the question is rather old, and I didn't read all the answers in details). If not, I still had fun doing it. Sorry for the long post. Also I'm new on Stack Overflow (as a member), and not a native speaker, so please be nice, and don't hesitate to let me know if I did something wrong.

like image 54
Seipas Avatar answered Sep 29 '22 17:09

Seipas