Context: I've recently discovered the alglib library (for numerical computation), which seems to be what I was looking for (robust interpolation, data analysis, ...) but could not really find in numpy or scipy.
However, I'm concerned by the fact that (e.g. for interpolation) it does not accept numpy arrays as a valid input format, but only regular Python list objects.
Problem: I've dug a bit into the code and documentation and found (as expected) that this list format is only transitional: the library converts it into ctypes anyway (the CPython layer is just an interface to the underlying C/C++ library).
That is where my concern comes from: inside my code I work with numpy arrays, because they are a big performance boost for the scientific calculations I run on them. So I fear that converting every array passed to alglib routines into a list (which is then converted into ctypes) will have a significant impact on performance (my arrays can contain hundreds of thousands of floats, and I have thousands of them).
Question: Do you think I will indeed see a performance loss, or should I start modifying the alglib code (only the Python interface) so that it accepts numpy arrays and makes only one conversion (from numpy arrays to ctypes)? I don't even know if this is feasible, because it is quite a big library... Maybe you have better ideas or suggestions (even about similar but different libraries)...
It seems my problem is not getting a lot of interest, or that my question is not clear or relevant. Or maybe nobody has a solution or advice, which I doubt with so many experts around :) Anyway, I've written a small, quick-and-dirty test script to illustrate the problem...
#!/usr/bin/env python
import xalglib as al
import timeit
import numpy as np

def func(x):
    return (3.14*x**2.3 + x**3 - x**2.34 + x)/(1. + x)**2

def fa(x, y, val=3.14):
    # x and y are already plain Python lists
    s = al.spline1dbuildakima(x, y)
    return (al.spline1dcalc(s, val), func(val))

def fb(x, y, val=3.14):
    # x and y are numpy arrays: convert them to lists first
    _x = list(x)
    _y = list(y)
    s = al.spline1dbuildakima(_x, _y)
    return (al.spline1dcalc(s, val), func(val))

ntot = 10000
maxi = 100
x = np.random.uniform(high=maxi, size=ntot)
y = func(x)
xl = list(x)
yl = list(y)

print "Test for len(x)=%d, and x between [0 and %.2f):" % (ntot, maxi)
print "Function: (3.14 *x**2.3 + x**3 -x**2.34 +x)/(1.+x)**2"
a, b = fa(xl, yl)
err = np.abs(a - b)/b * 100
print "(x=3.14) interpolated, exact =", (a, b)
print "(x=3.14) relative error should be <= 1e-2: %s (=%.2e)" % ((err <= 1e-2), err)

if __name__ == "__main__":
    t = timeit.Timer(stmt="fa(xl, yl)", setup="from __main__ import fa, xl, yl, func")
    tt = timeit.Timer(stmt="fb(x, y)", setup="from __main__ import fb, x, y, func")
    v = 1000 * t.timeit(number=100)/100
    vv = 1000 * tt.timeit(number=100)/100
    print "%.2f msec/pass" % v
    print "%.2f msec/pass" % vv
    print "%.2f %% less performant using numpy arrays" % ((vv - v)/v*100.)
and running it, I'm getting:
"""
Test for len(x)=10000, and x between [0 and 100.00):
Function: (3.14 *x**2.3 + x**3 -x**2.34 +x)/(1.+x)**2
(x=3.14) interpolated, exact = (3.686727834705164, 3.6867278531266905)
(x=3.14) relative error should be <= 1e-2: True (=5.00e-07)
25.85 msec/pass
28.46 msec/pass
10.09 % less performant using numpy arrays
"""
The performance loss oscillates between about 8% and 14%, which is huge for me...
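For reference, the single conversion (numpy array to ctypes) I have in mind should be possible in principle, since NumPy already exposes an array's buffer to ctypes without copying. A minimal sketch using only the standard numpy ctypes attribute (no alglib involved):

import ctypes
import numpy as np

# A contiguous float64 array exposes its buffer to ctypes with no copy.
x = np.ascontiguousarray(np.random.uniform(size=10), dtype=np.float64)
ptr = x.ctypes.data_as(ctypes.POINTER(ctypes.c_double))  # aliases x's memory
print(ptr[0] == x[0])  # True: same underlying buffer, nothing was copied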
You can create your own wrapper function that passes the numpy array's data buffer directly to the vector's data pointer; this avoids copying the data and speeds up the wrapper considerably. The following code passes x.ctypes.data to x_vector.ptr.p_ptr, where x is a numpy array.
When you pass a numpy array, you must ensure that its elements are stored in contiguous memory. The following code doesn't check this (see the guard sketched after the timing output below).
import xalglib as al
import numpy as np
import ctypes

def spline1dbuildakima(x, y):
    # Build the alglib x_vector structures directly around the numpy buffers
    # (x.ctypes.data), so the data is never copied into Python lists.
    n = len(x)
    _error_msg = ctypes.c_char_p(0)
    __c = ctypes.c_void_p(0)
    __n = al.c_ptrint_t(n)
    __x = al.x_vector(cnt=n, datatype=al.DT_REAL, owner=al.OWN_CALLER,
                      last_action=0, ptr=al.x_multiptr(p_ptr=x.ctypes.data))
    __y = al.x_vector(cnt=n, datatype=al.DT_REAL, owner=al.OWN_CALLER,
                      last_action=0, ptr=al.x_multiptr(p_ptr=y.ctypes.data))
    al._lib_alglib.alglib_spline1dbuildakima(
        ctypes.byref(_error_msg),
        ctypes.byref(__x),
        ctypes.byref(__y),
        ctypes.byref(__n),
        ctypes.byref(__c))
    __r__c = al.spline1dinterpolant(__c)
    return __r__c

def func(x):
    return (3.14*x**2.3 + x**3 - x**2.34 + x)/(1. + x)**2

def fa(x, y, val=3.14):
    # numpy-array version: uses the zero-copy wrapper above
    s = spline1dbuildakima(x, y)
    return al.spline1dcalc(s, val), func(val)

def fb(x, y, val=3.14):
    # list version: goes through alglib's own Python wrapper
    s = al.spline1dbuildakima(x, y)
    return al.spline1dcalc(s, val), func(val)

ntot = 10000
maxi = 100
x = np.random.uniform(high=maxi, size=ntot)
y = func(x)
xl = list(x)
yl = list(y)

import time

start = time.clock()
for i in xrange(100):
    a, b = fa(x, y)
print time.clock() - start
err = np.abs(a - b)/b * 100
print a, b, err

start = time.clock()
for i in xrange(100):
    a, b = fb(xl, yl)
print time.clock() - start
err = np.abs(a - b)/b * 100
print a, b, err
The output is:
0.722314760822 <- seconds of numpy array version
3.68672728107 3.68672785313 1.55166878281e-05
3.22011891502 <- seconds of list version
3.68672728107 3.68672785313 1.55166878281e-05
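As noted above, this wrapper does not check that the input arrays are contiguous. A minimal guard might look like the following (a sketch: np.ascontiguousarray is standard NumPy and only copies when the input is not already a contiguous float64 array; as_alglib_ready is just an illustrative name):

def as_alglib_ready(a):
    # Return a C-contiguous float64 array; a no-op for well-behaved inputs.
    return np.ascontiguousarray(a, dtype=np.float64)

x = as_alglib_ready(x)
y = as_alglib_ready(y)
s = spline1dbuildakima(x, y)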
Making the C++ alglib accept NumPy arrays is certainly doable: SciPy does this. The question is really how difficult it is. You might want to try one of the semi-automatic C++ to Python wrapping tools (warning: I'm no expert in this area).
On a different note: I have used interpolation splines in SciPy successfully in the past. I am not sure this would be sufficient for your needs, though, since you said you did not find everything you wanted in SciPy.
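For reference, a SciPy-based spline works on NumPy arrays directly. A rough sketch (this uses scipy.interpolate.InterpolatedUnivariateSpline, a cubic interpolating spline rather than an Akima spline, and it requires strictly increasing knots):

import numpy as np
from scipy.interpolate import InterpolatedUnivariateSpline

def func(x):
    return (3.14*x**2.3 + x**3 - x**2.34 + x)/(1. + x)**2

x = np.unique(np.random.uniform(high=100, size=10000))  # sorted, strictly increasing knots
y = func(x)
spl = InterpolatedUnivariateSpline(x, y, k=3)
print(abs(spl(3.14) - func(3.14)) / func(3.14))  # relative error at x = 3.14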