I have a problem which essentially reduces to this: given a black-box function, determine whether its running time grows polynomially or exponentially with the length of the input.
The way I did this was by running thousands of random sample inputs of varying lengths through the function, then plotting them on a scatter plot with times on the y-axis and input length on the x-axis.
I have the scatter plotted using numpy, but now how do I draw polynomial and exponential best fit lines? And what metrics can I calculate to determine which best fit lines fit best?
(I asked a similar question, framed theoretically and without emphasis on a particular language, on Computer Science: https://cs.stackexchange.com/questions/23686/how-to-determine-if-a-black-box-is-polynomial-or-exponential)
Take the logarithm of all your Y values. The polynomial results will still show logarithmic growth (since log(x^n) = n * log(x)), but the exponential curve will transform into a proper straight line (since log(exp(x)) = x).
Now fit the transformed points with a linear least-squares (LSQ) regression. If the straight line fits nicely, you can be pretty sure the algorithm is exponential; otherwise it is polynomial. Allow some deviation to be present, though -- I suggest deducing a reasonable epsilon empirically by examining an algorithm whose complexity you already know!
You can also raise the confidence level a bit by examining the deviation from the regression line: if most of the values lie below it, then the transformed data was most probably growing logarithmically (i.e. the program ran in polynomial time).
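For illustration, here is a minimal sketch of that check with NumPy; the sizes and times arrays are hypothetical stand-ins for your measured data, and the epsilon threshold is left for you to calibrate as suggested above.

import numpy as np

# sizes and times are hypothetical stand-ins for your measured data
sizes = np.arange(1, 51, dtype=float)
times = 0.002 * np.exp(0.3 * sizes)      # pretend these came from timing runs

log_times = np.log(times)

# straight-line least-squares fit of log(time) against input size
slope, intercept = np.polyfit(sizes, log_times, 1)
fitted = slope * sizes + intercept

# small, unstructured residuals suggest an exponential; large or systematic
# residuals suggest sub-linear growth in log-space (i.e. polynomial time)
residuals = log_times - fitted
print("max abs residual:", np.abs(residuals).max())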
If the time cost t of your algorithm is polynomial in terms of the size n, i.e. approximately t = k * n^m, then a log-log plot (log t against log n) should be a straight line with gradient m (and intercept log k).
If the time cost t of your algorithm is exponential in terms of the size n, i.e. approximately t = k * e^(m*n), then a log-linear plot (log t against n) should be a straight line with gradient m (and intercept log k).
Because these plots are straight lines you can use simple linear regression to estimate the parameters.
how do I draw polynomial and exponential best fit lines?
Once you have the parameters (the gradient m and intercept c of the straight-line fit in log-log or semi-log space), you can transform m and c back to your original coordinate space to get the best fit curve in terms of your original data (the details are shown in the example program below).
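As a compact illustration of the fit and back-transform, here is a sketch using numpy.polyfit rather than the sklearn version further below; n and t are hypothetical stand-ins for your measured sizes and times.

import numpy as np, matplotlib.pyplot as plt

# hypothetical measurements standing in for your timing data
n = np.linspace(2, 25, 100)
t = 5.0 * n**2

# straight-line fit in log-log space: log t = m*log n + c  (polynomial hypothesis)
m_poly, c_poly = np.polyfit(np.log(n), np.log(t), 1)
poly_curve = np.exp(c_poly) * n**m_poly           # back-transform: t = e^c * n^m

# straight-line fit in semi-log space: log t = m*n + c     (exponential hypothesis)
m_exp, c_exp = np.polyfit(n, np.log(t), 1)
exp_curve = np.exp(c_exp) * np.exp(m_exp * n)     # back-transform: t = e^c * e^(m*n)

plt.plot(n, t, label='data')
plt.plot(n, poly_curve, ls='dashed', label='polynomial fit')
plt.plot(n, exp_curve, ls='dashed', label='exponential fit')
plt.legend()
plt.show()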
what metrics can I calculate to determine which best fit lines fit best?
The R^2 of the regression gives a numerical indication of goodness of fit, with closer to 1 being a better fit. However this can be a bit misleading because it reflects the fit in the transformed space (log t vs log n, or log t vs n), so it's easier just to look at the resulting equations and eyeball the curves. An alternative quantitative measure would be to sum the errors between your original data and the corresponding points on the best fit curve.
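For instance, such a measure could be computed like this (a sketch with made-up numbers; t and fitted are hypothetical names for your measured times and for the best fit curve evaluated at the same input sizes):

import numpy as np

# hypothetical example values
t = np.array([3.1, 12.2, 27.5, 48.9, 75.4])
fitted = np.array([3.0, 12.0, 27.0, 48.0, 75.0])

sse = np.sum((t - fitted) ** 2)                 # sum of squared errors in the original space
rmse = np.sqrt(np.mean((t - fitted) ** 2))      # root-mean-square error, in the same units as t
print(sse, rmse)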
This technique is demonstrated with the following data series, each of which incorporates noise: series 0 is linear (t ≈ 3n), series 1 quadratic (t ≈ 5n^2), series 2 exponential (t ≈ 7e^n), and series 3 exponential with a steeper exponent (t ≈ 9e^(2n)).
In the resulting plot, you can see that the polynomial form has been recovered for series 0 and 1 (green dashed line), and the exponential form recovered for series 2 and 3 (red dashed line). In each case the higher R^2 corresponds to the better fit, and the parameter m has been recovered quite accurately.
The code is below:
import math, numpy as np, matplotlib.pyplot as plt
from sklearn import linear_model

N = 100
lo = 2
hi = 25
n = np.linspace(lo, hi, num=N)

D = 4
T = np.maximum(np.random.normal(scale=hi / 25.0, size=(N, D)), 0.01)
T[:, 0] = 3 * (T[:, 0] + n)
T[:, 1] = 5 * (T[:, 1] + n)**2
T[:, 2] = 7 * np.exp(T[:, 2] / 5.0 + n)
T[:, 3] = 9 * np.exp(2 * (T[:, 3] / 5.0 + n))

logn = np.log(n)
logT = np.log(T)

for i in range(D):
    # fit log-log model
    loglogmod = linear_model.LinearRegression()
    x = np.reshape(logn, (N, 1))
    y = logT[:, i]
    loglogmod.fit(x, y)
    loglogmod_rsquared = loglogmod.score(x, y)

    # fit log model
    logmod = linear_model.LinearRegression()
    x = np.reshape(n, (N, 1))
    logmod.fit(x, y)
    logmod_rsquared = logmod.score(x, y)

    # plot results
    plt.subplot(2, 2, i + 1)
    plt.plot(n, T[:, i], label='series {}'.format(i), lw=2)
    m = loglogmod.coef_[0]
    c = loglogmod.intercept_
    polynomial = math.exp(c) * np.power(n, m)
    plt.plot(n, polynomial,
             label='$t={0:.1f}n^{{{1:.1f}}}$ ($r^2={2:.2f}$)'.format(math.exp(c), m, loglogmod_rsquared),
             ls='dashed', lw=3)
    m = logmod.coef_[0]
    c = logmod.intercept_
    exponential = np.exp(n * m) * math.exp(c)
    plt.plot(n, exponential,
             label='$t={0:.1f}e^{{{1:.1f}n}}$ ($r^2={2:.2f}$)'.format(math.exp(c), m, logmod_rsquared),
             ls='dashed', lw=3)
    plt.legend(loc='upper left', prop={'size': 16})

plt.show()
(documentation that I used: the sklearn.linear_model.LinearRegression reference and the matplotlib tutorial)