Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

approximating log10[x^k0 + k1]

Greetings. I'm trying to approximate the function

Log10[x^k0 + k1], where .21 < k0 < 21, 0 < k1 < ~2000, and x is integer < 2^14.

k0 & k1 are constant. For practical purposes, you can assume k0 = 2.12, k1 = 2660. The desired accuracy is 5*10^-4 relative error.

This function is virtually identical to Log[x], except near 0, where it differs a lot.

I already have came up with a SIMD implementation that is ~1.15x faster than a simple lookup table, but would like to improve it if possible, which I think is very hard due to lack of efficient instructions.

My SIMD implementation uses 16bit fixed point arithmetic to evaluate a 3rd degree polynomial (I use least squares fit). The polynomial uses different coefficients for different input ranges. There are 8 ranges, and range i spans (64)2^i to (64)2^(i + 1). The rational behind this is the derivatives of Log[x] drop rapidly with x, meaning a polynomial will fit it more accurately since polynomials are an exact fit for functions that have a derivative of 0 beyond a certain order.

SIMD table lookups are done very efficiently with a single _mm_shuffle_epi8(). I use SSE's float to int conversion to get the exponent and significand used for the fixed point approximation. I also software pipelined the loop to get ~1.25x speedup, so further code optimizations are probably unlikely.

What I'm asking is if there's a more efficient approximation at a higher level? For example:

  1. Can this function be decomposed into functions with a limited domain like log2((2^x) * significand) = x + log2(significand)

hence eliminating the need to deal with different ranges (table lookups). The main problem I think is adding the k1 term kills all those nice log properties that we know and love, making it not possible. Or is it?

  1. Iterative method? don't think so because the Newton method for log[x] is already a complicated expression

  2. Exploiting locality of neighboring pixels? - if the range of the 8 inputs fall in the same approximation range, then I can look up a single coefficient, instead of looking up separate coefficients for each element. Thus, I can use this as a fast common case, and use a slower, general code path when it isn't. But for my data, the range needs to be ~2000 before this property hold 70% of the time, which doesn't seem to make this method competitive.

Please, give me some opinion, especially if you're an applied mathematician, even if you say it can't be done. Thanks.

like image 750
Yale Zhang Avatar asked Jan 16 '11 03:01

Yale Zhang


2 Answers

You should be able to improve on least-squares fitting by using Chebyshev approximation. (The idea is, you're looking for the approximation whose worst-case deviation in a range is least; least-squares instead looks for the one whose summed squared difference is least.) I would guess this doesn't make a huge difference for your problem, but I'm not sure -- hopefully it could reduce the number of ranges you need to split into, somewhat.

If there's already a fast implementation of log(x), maybe compute P(x) * log(x) where P(x) is a polynomial chosen by Chebyshev approximation. (Instead of trying to do the whole function as a polynomial approx -- to need less range-reduction.)

I'm an amateur here -- just dipping my toe in as there aren't a lot of answers already.

like image 159
Darius Bacon Avatar answered Nov 10 '22 15:11

Darius Bacon


One observation: You can find an expression for how large x needs to be as a function of k0 and k1, such that the term x^k0 dominates k1 enough for the approximation:

x^k0 +k1 ~= x^k0, allowing you to approximately evaluate the function as

k0*Log(x).

This would take care of all x's above some value.

like image 43
Anonymous observer Avatar answered Nov 10 '22 15:11

Anonymous observer