
How to determine if a black-box is polynomial or exponential

I have a problem which essentially reduces to this:

  1. You have a black-box function that accepts inputs of length n.
  2. You can measure the amount of time the function takes to return the answer, but you can't see exactly how it was calculated.
  3. You have to determine whether the time-complexity of this function is polynomial or exponential.

The way I did this was by running thousands of random sample inputs of varying lengths through the function, then plotting them on a scatter plot with times on the y-axis and input length on the x-axis.

I have the scatter plotted using numpy, but now how do I draw polynomial and exponential best fit lines? And what metrics can I calculate to determine which best fit lines fit best?

(I asked a similar question, framed theoretically and without emphasis on a particular language, on Computer Science: https://cs.stackexchange.com/questions/23686/how-to-determine-if-a-black-box-is-polynomial-or-exponential)

asked Apr 12 '14 by stevenheidel

2 Answers

Take the logarithm of all your y-values. The polynomial results will still show logarithmic growth (since log(x^n) = n*log(x)), but the exponential curve will transform into a proper straight line (log(e^x) = x).

If you now approximate the resulting points with a linear least-squares fit, you can be pretty sure the algorithm is exponential if the line fits nicely, and polynomial otherwise. Allow for some deviation, though -- I suggest deriving a reasonable epsilon empirically by examining an algorithm whose complexity you already know!

You can also raise the confidence level a bit by examining the deviation from the regression line: if most of the values lie below it, then the data was most probably logarithmic (i.e. the program ran in polynomial time).
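A minimal sketch of this approach using `numpy.polyfit` (the function name `looks_exponential` and the default `eps` threshold are my own illustration -- calibrate `eps` on an algorithm whose complexity you already know):

```python
import numpy as np

def looks_exponential(n, t, eps=0.05):
    """Fit a straight line to log(t) vs n. A tight fit means the
    timings are consistent with exponential growth; polynomial
    timings leave a clearly curved (logarithmic) residual."""
    logt = np.log(t)
    slope, intercept = np.polyfit(n, logt, 1)
    residuals = logt - (slope * n + intercept)
    # root-mean-square deviation from the regression line
    rms = np.sqrt(np.mean(residuals ** 2))
    return bool(rms < eps)

n = np.arange(1, 30, dtype=float)
print(looks_exponential(n, 2.0 ** n))   # True: log(2^n) is a perfect line in n
print(looks_exponential(n, n ** 3))     # False: log(n^3) curves like log n
```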

answered Nov 09 '22 by The Paramagnetic Croissant


If the time cost t of your algorithm is polynomial in terms of size n, i.e. approximately t = k*n^m, then a log-log plot, a plot of log t against log n, should be a straight line with gradient m (and intercept log k).

If the time cost t of your algorithm is exponential in terms of size n, i.e. approximately t = k*e^(m*n), then a log-linear plot, a plot of log t against n, should be a straight line with gradient m (and intercept log k).

Because these plots are straight lines you can use simple linear regression to estimate the parameters.

how do I draw polynomial and exponential best fit lines?

Once you have the parameters, the gradient m and intercept c of the straight line fit in log-log or semi-log space, you can transform m and c back to your original coordinate space to get the best fit curve in terms of your original data (the details are shown in the example program below).

what metrics can I calculate to determine which best fit lines fit best?

The R² of the regression gives a numerical indication of goodness of fit, with values closer to 1 indicating a better fit. However, this can be a bit misleading because it reflects the fit in the transformed space (log t vs log n, or log t vs n), so it's easier just to look at the resulting equations and eyeball the curves. An alternative quantitative measure would be to sum the errors between your original data and the corresponding points on the best-fit curve.
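That alternative metric, comparing errors in the original coordinate space, can be sketched as follows. This is my own illustration, not part of the answer's program: `sse_original_space`, `kind`, and the parameter names `m`/`c` are assumptions, with m and c recovered by linear regression in the transformed space exactly as described above.

```python
import numpy as np

def sse_original_space(n, t, m, c, kind):
    """Sum of squared errors in the *original* coordinates for a fit
    recovered in transformed space (m = gradient, c = intercept)."""
    if kind == 'poly':   # log-log fit: t ~ e^c * n^m
        pred = np.exp(c) * n ** m
    else:                # semi-log fit: t ~ e^c * e^(m*n)
        pred = np.exp(c + m * n)
    return np.sum((t - pred) ** 2)

# Example: cubic (polynomial) data, compared under both models.
n = np.linspace(2, 25, 100)
t = 5 * n ** 3
# recover (gradient, intercept) by linear regression in each transformed space
m_p, c_p = np.polyfit(np.log(n), np.log(t), 1)
m_e, c_e = np.polyfit(n, np.log(t), 1)
# the polynomial model leaves far less error on polynomial data
print(sse_original_space(n, t, m_p, c_p, 'poly') <
      sse_original_space(n, t, m_e, c_e, 'exp'))   # True
```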

This technique is demonstrated with the following data series (each of which incorporates noise)

  • Series 0: t = 3n
  • Series 1: t = 5n^2
  • Series 2: t = 7e^n
  • Series 3: t = 9e^(2n)

In the plot below, you can see that the polynomial form has been recovered for series 0 and 1 (green dashed line), and the exponential form has been recovered for series 2 and 3 (red dashed line). In each case the higher R² corresponds to the better fit, and the parameter m has been recovered quite accurately.

(figure: 2×2 grid of subplots, one per series, each showing the timing data together with the polynomial and exponential best-fit curves and their R² values)

The code is below

import math, numpy as np, matplotlib.pyplot as plt
from sklearn import linear_model

N = 100
lo = 2
hi = 25
n = np.linspace(lo, hi, num=N)
D = 4
# noise, floored at 0.01 so the log transform is well defined
T = np.maximum(np.random.normal(scale=hi / 25.0, size=(N, D)), 0.01)
T[:, 0] = 3 * (T[:, 0] + n)                     # t = 3n
T[:, 1] = 5 * (T[:, 1] + n) ** 2                # t = 5n^2
T[:, 2] = 7 * np.exp(T[:, 2] / 5.0 + n)         # t = 7e^n
T[:, 3] = 9 * np.exp(2 * (T[:, 3] / 5.0 + n))   # t = 9e^(2n)

logn = np.log(n)
logT = np.log(T)

for i in range(D):
    y = logT[:, i]
    # fit log-log model (log t vs log n): a straight line for polynomials
    loglogmod = linear_model.LinearRegression()
    x = np.reshape(logn, (N, 1))
    loglogmod.fit(x, y)
    loglogmod_rsquared = loglogmod.score(x, y)
    # fit semi-log model (log t vs n): a straight line for exponentials
    logmod = linear_model.LinearRegression()
    x = np.reshape(n, (N, 1))
    logmod.fit(x, y)
    logmod_rsquared = logmod.score(x, y)
    # plot the data and both fits transformed back to the original space
    plt.subplot(2, 2, i + 1)
    plt.plot(n, T[:, i], label='series {}'.format(i), lw=2)
    m = loglogmod.coef_[0]
    c = loglogmod.intercept_
    polynomial = math.exp(c) * np.power(n, m)    # t = e^c * n^m
    plt.plot(n, polynomial,
             label='$t={0:.1f}n^{{{1:.1f}}}$ ($r^2={2:.2f}$)'.format(
                 math.exp(c), m, loglogmod_rsquared),
             ls='dashed', lw=3)
    m = logmod.coef_[0]
    c = logmod.intercept_
    exponential = np.exp(n * m) * math.exp(c)    # t = e^c * e^(mn)
    plt.plot(n, exponential,
             label='$t={0:.1f}e^{{{1:.1f}n}}$ ($r^2={2:.2f}$)'.format(
                 math.exp(c), m, logmod_rsquared),
             ls='dashed', lw=3)
    plt.legend(loc='upper left', prop={'size': 16})

plt.show()

(documentation that I used: sklearn.linear_model.LinearRegression reference and matplotlib tutorial)

answered Nov 09 '22 by TooTone