Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Fastest implementation of log2(int) and log2(float)

The question is

Are there any other (and/or faster) implementations of a basic 2log?

Applications

The log2(int) and log2(float) operations are very useful in a lot of different contexts. To name a few: compression algorithms, 3d engines and machine learning. In almost all of these contexts they are used in the low-level code that is called billions of times... Especially the log2(int) operation is very useful.

Because I find myself using log2 all the time, I don't want to give a specific application I'm working on here. What is the same is the fact that this is a real performance drainer (as shown by performance tests of various applications). For me it's key to get this as fast as possible.

The complete source code to test all implementations is added at the bottom, so you can see for yourself.

And of course... run your tests at least 3 times and make sure the counters are big enough to hit multiple seconds. Also I do the 'add' operation to ensure the whole loop isn't magically removed by the JIT'ter. So let's get started with the real work.

Trivial implementation

The trivial implementation of a 2log in C# is:

(int)(Math.Log(x) / Math.Log(2))

This implementation is trivial, but also very slow. It requires 2 Log operations, that are in itself quite slow already. Of course, we can optimize this by making 1.0/Math.Log(2) a constant.

Note that we need to modify this constant a bit to get the right results (as a result of floating point errors) or add a small number to get the correct results. I chose the latter, but it doesn't really matter - the end result is slow in all cases.

Table lookup

A faster solution for this is to use a lookup table. While you can use a lookup table of any power of 2, I usually use a table size of 256 or 64K entries.

First we create the lookup table:

lookup = new int[256];
for (int i = 1; i < 256; ++i)
{
    lookup[i] = (int)(Math.Log(i) / Math.Log(2));
}

Next, we implement the 2log as follows:

private static int LogLookup(int i)
{
    if (i >= 0x1000000) { return lookup[i >> 24] + 24; }
    else if (i >= 0x10000) { return lookup[i >> 16] + 16; }
    else if (i >= 0x100) { return lookup[i >> 8] + 8; }
    else { return lookup[i]; }
}

As you can see, table lookups are a much, much faster implementation - but as a con it cannot be used to calculate log2(float).

Branch removal

As we all know, processors aren't very good at branching, so I figured that table lookups can be improved by removing the branches. Instead of the bunches of if's I introduced a second table with the values and shift bits around to find the entry in the table:

nobranch = new int[16] { 0, 0, 8, 8, 16, 16, 16, 16, 24, 24, 24, 24, 24, 24, 24, 24 };

private static int LogDoubleLookup(int i)
{
    int n = (i | (i >> 4));
    n = (n | (n >> 2));
    n = (n | (n >> 1));
    n = ((n & 0x1000000) >> 21) | ((n & 0x10000) >> 14) | ((n & 0x100) >> 7) | (n & 1);
    int br = nobranch[n];
    return lookup[i >> br] + br;
}

If you run this test, you will find that it is actually slower than the if-then-else solution.

And then there was the Intel 80386

Intel understood years ago that this is an important operation, so they implemented Bit-Scan-Forward (BSF) into their processors. Other processors have similar instructions. This is by far the fastest way to do a 2log that I know of - but unfortunately I know of now way to use these nice functions from C#... I don't like the idea of having an implementation that doesn't run anymore when a new tablet or phone hits the market - and I don't know of any cross-platform solution that enables me to use this function directly.

Other implementations

As l4V pointed out (thanks!) there are a couple of other implementations, specifically:

  • Trivial loop. I omitted this because it's trivial this isn't really fast. Implemented in TestTrivial.
  • 64-bit IEEE / int union's that can be used. Implemented in TestFloat
  • DeBruijn lookup tables. Implemented in TestDeBruijn
  • Binary search. Implemented in TestBinary

Apart that I like the name, the DeBruijn lookup tables are just as fast as the normal lookup tables, making it one of the fastest algorithms here... all the other algorithms I've tried are much slower.

Complete test code

public class Log2Test
{
    public static void TestNaive()
    {
        Stopwatch sw = new Stopwatch();
        sw.Start();
        int n = 0;
        for (int i = 1; i < 100000000; ++i)
        {
            n += (int)(Math.Log(i) / Math.Log(2.0));
        }
        sw.Stop();
        Console.WriteLine("Result: {0} - naive implementation took {1:0.000}s", n, sw.Elapsed.TotalSeconds);
    }

    public static int LogTrivialLoop(int v)
    {
        int r = 0;
        while ((v >>= 1) > 0) // unroll for more speed...
        {
            r++;
        }
        return r;
    }

    public static void TestTrivialLoop()
    {
        Stopwatch sw = new Stopwatch();
        sw.Start();
        int n = 0;
        for (int i = 1; i < 100000000; ++i)
        {
            n += LogTrivialLoop(i);
        }
        sw.Stop();
        Console.WriteLine("Result: {0} - loop implementation took {1:0.000}s", n, sw.Elapsed.TotalSeconds);
    }

    public static int LogFloat(int v)
    {
        Helper h = new Helper() { U1 = v, U2 = 0x43300000 };
        h.D -= 4503599627370496.0;
        return (h.U2 >> 20) - 0x3FF;
    }

    public static void TestFloat()
    {
        Stopwatch sw = new Stopwatch();
        sw.Start();
        int n = 0;
        for (int i = 1; i < 100000000; ++i)
        {
            n += LogFloat(i);
        }
        sw.Stop();
        Console.WriteLine("Result: {0} - IEEE float implementation took {1:0.000}s", n, sw.Elapsed.TotalSeconds);
    }

    [StructLayout(LayoutKind.Explicit)]
    private struct Helper
    {
        [FieldOffset(0)]
        public int U1;
        [FieldOffset(4)]
        public int U2;
        [FieldOffset(0)]
        public double D;
    }

    public static void TestConstant()
    {
        double c = 1.0 / Math.Log(2.0);
        Stopwatch sw = new Stopwatch();
        sw.Start();
        int n = 0;
        for (int i = 1; i < 100000000; ++i)
        {
            n += (int)(0.00000000001 + Math.Log(i) * c);
        }
        sw.Stop();
        Console.WriteLine("Result: {0} - naive 2 implementation took {1:0.000}s", n, sw.Elapsed.TotalSeconds);
    }

    private static int LogLookup(int i)
    {
        if (i >= 0x1000000) { return lookup[i >> 24] + 24; }
        else if (i >= 0x10000) { return lookup[i >> 16] + 16; }
        else if (i >= 0x100) { return lookup[i >> 8] + 8; }
        else { return lookup[i]; }
    }

    public static void TestLookup()
    {
        lookup = new int[256];
        for (int i = 1; i < 256; ++i)
        {
            lookup[i] = (int)(Math.Log(i) / Math.Log(2));
        }
        Stopwatch sw = new Stopwatch();
        sw.Start();
        int n = 0;
        for (int i = 1; i < 100000000; ++i)
        {
            n += LogLookup(i);
        }
        sw.Stop();
        Console.WriteLine("Result: {0} - table lookup implementation took {1:0.000}s", n, sw.Elapsed.TotalSeconds);
    }

    private static int LogDoubleLookup(int i)
    {
        int n = (i | (i >> 4));
        n = (n | (n >> 2));
        n = (n | (n >> 1));
        n = ((n & 0x1000000) >> 21) | ((n & 0x10000) >> 14) | ((n & 0x100) >> 7) | (n & 1);
        int br = nobranch[n];
        return lookup[i >> br] + br;
    }

    public static void TestDoubleLookup()
    {
        // Lookup table was already constructed earlier
        Stopwatch sw = new Stopwatch();
        sw.Start();
        int n = 0;
        for (int i = 1; i < 100000000; ++i)
        {
            n += LogDoubleLookup(i);
        }
        sw.Stop();
        Console.WriteLine("Result: {0} - double table lookup implementation took {1:0.000}s", n, sw.Elapsed.TotalSeconds);
    }

    private static int LogBinary(int v)
    {
        /* This is the worst implementation ever... - apparently C# is a slow-branching language

        int[] b = { 0x2, 0xC, 0xF0, 0xFF00, 0x7FFF0000 };
        int[] S = { 1, 2, 4, 8, 16 };

        int r = 0; // result of log2(v) will go here
        for (int i = 4; i >= 0; i--) // unroll for speed...
        {
            if ((v & b[i]) != 0)
            {
                v >>= S[i];
                r |= S[i];
            }
        }
        return r;

         */

        int r = (((v > 0xFFFF)) ? 0x10 : 0); 
        v >>= r;
        int shift = ((v > 0xFF) ? 0x8 : 0); 
        v >>= shift; 
        r |= shift;
        shift = ((v > 0xF) ? 0x4 : 0); 
        v >>= shift;
        r |= shift;
        shift = ((v > 0x3) ? 0x2 : 0); 
        v >>= shift;
        r |= shift;
        r |= (v >> 1);
        return r;
    }

    public static void TestBinary()
    {
        // Lookup table was already constructed earlier
        Stopwatch sw = new Stopwatch();
        sw.Start();
        int n = 0;
        for (int i = 1; i < 100000000; ++i)
        {
            n += LogBinary(i);
        }
        sw.Stop();
        Console.WriteLine("Result: {0} - binary search implementation took {1:0.000}s", n, sw.Elapsed.TotalSeconds);
    }

    private static readonly int[] MultiplyDeBruijnBitPosition = new int[32]
    {
        0, 9, 1, 10, 13, 21, 2, 29, 11, 14, 16, 18, 22, 25, 3, 30,
        8, 12, 20, 28, 15, 17, 24, 7, 19, 27, 23, 6, 26, 5, 4, 31
    };

    private static int LogDeBruijn(int v)
    {
        v |= v >> 1; // first round down to one less than a power of 2 
        v |= v >> 2;
        v |= v >> 4;
        v |= v >> 8;
        v |= v >> 16;

        return MultiplyDeBruijnBitPosition[(uint)(v * 0x07C4ACDDU) >> 27];
    }

    public static void TestDeBruijn()
    {
        // Lookup table was already constructed earlier
        Stopwatch sw = new Stopwatch();
        sw.Start();
        int n = 0;
        for (int i = 1; i < 100000000; ++i)
        {
            n += LogDeBruijn(i);
        }
        sw.Stop();
        Console.WriteLine("Result: {0} - de Bruijn implementation took {1:0.000}s", n, sw.Elapsed.TotalSeconds);
    }

    private static int[] lookup;
    private static readonly int[] nobranch = new int[16] { 0, 0, 8, 8, 16, 16, 16, 16, 24, 24, 24, 24, 24, 24, 24, 24 };

    static void Main(string[] args)
    {
        TestConstant();
        TestNaive();
        TestDeBruijn();
        TestBinary();
        TestFloat();
        TestTrivialLoop();
        TestLookup();
        TestDoubleLookup();
        Console.ReadLine();
    }
}
like image 508
atlaste Avatar asked Apr 12 '13 09:04

atlaste


4 Answers

Took the binary solution already mentioned and removed the branching. Did some testing and it turned out to be 1.3 times faster than DeBruijn.

public static int Log2(int v)
{
    int r = 0xFFFF - v >> 31 & 0x10;
    v >>= r;
    int shift = 0xFF - v >> 31 & 0x8;
    v >>= shift; 
    r |= shift;
    shift = 0xF - v >> 31 & 0x4;
    v >>= shift;
    r |= shift;
    shift = 0x3 - v >> 31 & 0x2;
    v >>= shift;
    r |= shift;
    r |= (v >> 1);
    return r;
}
like image 180
DanielSig Avatar answered Nov 07 '22 07:11

DanielSig


There are some integer algorithms here.

In C#:

public static uint FloorLog2(uint x)
{
    x |= (x >> 1);
    x |= (x >> 2);
    x |= (x >> 4);
    x |= (x >> 8);
    x |= (x >> 16);

    return (uint)(NumBitsSet(x) - 1);
}

public static uint CeilingLog2(uint x)
{
    int y = (int)(x & (x - 1));

    y |= -y;
    y >>= (WORDBITS - 1);
    x |= (x >> 1);
    x |= (x >> 2);
    x |= (x >> 4);
    x |= (x >> 8);
    x |= (x >> 16);

    return (uint)(NumBitsSet(x) - 1 - y);
}

public static int NumBitsSet(uint x)
{
    x -= ((x >> 1) & 0x55555555);
    x = (((x >> 2) & 0x33333333) + (x & 0x33333333));
    x = (((x >> 4) + x) & 0x0f0f0f0f);
    x += (x >> 8);
    x += (x >> 16);

    return (int)(x & 0x0000003f);
}

private const int WORDBITS = 32;

You should look at the original code on the site I linked for the context, particularly what happens with Log2(0).

like image 25
Matthew Watson Avatar answered Nov 07 '22 06:11

Matthew Watson


For more algorithms look here http://www.asmcommunity.net/forums/topic/?id=15010

Also did some testing in C++ and my implementation of BSR is slower than lookup table

  • i am using BDS2006 there is probably slow down by state push/popping by asm directive
  • your lookup is fine but i am using 11 bits table instead of 8
  • it divides 32 bit into 3 branches instead of 4
  • and it is still small enough to handle without init function

code:

//---------------------------------------------------------------------------
DWORD log2_slow(const DWORD &x)
    {
    DWORD m,i;
    if (!x) return 0;
    if (x>=0x80000000) return 31;
    for (m=1,i=0;m<x;m<<=1,i++);
     if (m!=x) i--;
    return i;
    }
//---------------------------------------------------------------------------
DWORD log2_asm(const DWORD &x)
    {
    DWORD xx=x;
    asm {
        mov eax,xx
        bsr eax,eax;
        mov xx,eax;
        }
    return xx;
    }
//---------------------------------------------------------------------------
BYTE _log2[2048]=
    {
     0, 0, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6,
     7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,
     8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8,
     8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8,
     9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9,
     9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9,
     9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9,
     9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9,
    10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,
    10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,
    10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,
    10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,
    10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,
    10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,
    10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,
    10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,
    };
DWORD log2(const DWORD &x)
    {
         if (x>=0x00400000) return _log2[x>>22]+22;
    else if (x>=0x00000800) return _log2[x>>11]+11;
    else                    return _log2[x];
    }
//---------------------------------------------------------------------------

test code:

DWORD x,j,i,n=256;
tbeg(); for (i=0;i<32;i++) for (j=0;j<n;j++) x=log2     (j<<i); tend(); mm_log->Lines->Add(tstr(1));
tbeg(); for (i=0;i<32;i++) for (j=0;j<n;j++) x=log2_asm (j<<i); tend(); mm_log->Lines->Add(tstr(1));
tbeg(); for (i=0;i<32;i++) for (j=0;j<n;j++) x=log2_slow(j<<i); tend(); mm_log->Lines->Add(tstr(1));

my results on AMD A8-5500 3.2 GHz:

[   0.040 ms] log2     (x) - 11bit lookup table
[   0.060 ms] log2_asm (x) - BSR
[   0.415 ms] log2_slow(x) - shift loop

Note:

  • log2(0) -> 0 because of use of DWORDS, in real it should be -inf
  • all other values are correct for all functions
like image 2
Spektre Avatar answered Nov 07 '22 07:11

Spektre


inline int fast_log2(register double x)
{ 
    return (reinterpret_cast<uint64_t&>(x) >> 52) - 1023;
};
like image 2
user9337139 Avatar answered Nov 07 '22 07:11

user9337139