Let's say I have a simple data set. In dictionary form, it might look like this:
{1:5, 2:10, 3:15, 4:20, 5:25}
(The keys are always in ascending order.)
What I want to do is logically work out what the next data point is most likely to be. In this example, it would be {6: 30}.
What would be the best way to do this?
One way to do this is linear extrapolation: treat the data as following a linear equation and use the trend of the known values to predict the next data points. Geometrically, you draw a tangent line at the last known point and extend it beyond the range of the data.
Interpolation refers to the process of generating data points between already existing data points. Extrapolation is the process of generating points outside a given set of known data points.
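For discrete data, the tangent-line idea can be approximated by using the slope of the last segment. The sketch below assumes the keys are sorted ascending and that there are at least two points; `extrapolate_last_segment` is a hypothetical helper name, not a library function.

```python
def extrapolate_last_segment(data: dict, x_new: float) -> float:
    """Linearly extrapolate from the last two points of a dict keyed by x.

    The slope of the final segment stands in for the tangent at the last point.
    Assumes at least two points and numeric keys.
    """
    xs = sorted(data)
    x0, x1 = xs[-2], xs[-1]
    slope = (data[x1] - data[x0]) / (x1 - x0)
    return data[x1] + slope * (x_new - x1)


data = {1: 5, 2: 10, 3: 15, 4: 20, 5: 25}
print(extrapolate_last_segment(data, 6))  # 30.0
```

Because it only looks at the last two points, this follows the most recent trend but ignores the rest of the data; a least-squares fit over all points is usually more robust to noise.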
First, separate the x and y values. Then use `np.polyfit` to fit a line to these points. A straight line can be written as y = mx + b, which is a polynomial of degree 1.
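Starting directly from the dictionary in the question, those two steps might look like the sketch below. It assumes the x values are evenly spaced by 1, so the "next" x is simply the last key plus one.

```python
import numpy as np

data = {1: 5, 2: 10, 3: 15, 4: 20, 5: 25}

# Step 1: separate keys (x) and values (y), keeping them in ascending key order
keys = sorted(data)
x = np.array(keys, dtype=float)
y = np.array([data[k] for k in keys], dtype=float)

# Step 2: fit y = m*x + b; polyfit returns the highest power first, i.e. (m, b)
m, b = np.polyfit(x, y, 1)

# Assumption: unit spacing, so the next x is the last key + 1
next_x = x[-1] + 1
print(next_x, m * next_x + b)  # 6.0 and approximately 30.0
```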
You can also use numpy's polyfit:
```python
import numpy as np

data = np.array([[1, 5], [2, 10], [3, 15], [4, 20], [5, 25]])
fit = np.polyfit(data[:, 0], data[:, 1], 1)  # degree 1 means a linear fit
print(fit)  # [ 5.00000000e+00  1.58882186e-15]  -> y = 5x + 0 (intercept is ~0 up to rounding)

line = np.poly1d(fit)  # turn the coefficients into a callable polynomial
new_points = np.arange(5) + 6
print(new_points)  # [ 6  7  8  9 10]
print(line(new_points))  # [30. 35. 40. 45. 50.]
```
This makes it easy to change the degree of the polynomial fit, since `polyfit` takes the arguments `np.polyfit(x_data, y_data, degree)`. Shown above is a linear fit; for any degree n, the returned array holds n + 1 coefficients, highest power first, so the polynomial is `fit[0]*x**n + fit[1]*x**(n-1) + ... + fit[n]*x**0`. The `poly1d` function turns this array into a callable that returns the value of the polynomial at any given x.
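To check the coefficient ordering described above, one can fit a polynomial to data generated from a known quadratic and evaluate it both by hand and with `poly1d`. The data here is hypothetical, chosen only so the expected coefficients are obvious.

```python
import numpy as np

# Hypothetical data from a known quadratic: y = 1*x**2 + 2*x + 3
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = x**2 + 2 * x + 3

deg = 2
coeffs = np.polyfit(x, y, deg)  # highest power first: [c2, c1, c0]
print(len(coeffs))              # deg + 1 = 3 coefficients
print(np.round(coeffs, 6))      # approximately [1. 2. 3.]

# Evaluate fit[0]*x**n + ... + fit[n]*x**0 by hand and via poly1d
x_next = 6.0
manual = sum(coeffs[i] * x_next ** (deg - i) for i in range(deg + 1))
line = np.poly1d(coeffs)
print(manual, line(x_next))  # both approximately 51.0 (36 + 12 + 3)
```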
In general, extrapolating without a well-understood model of the data gives unreliable results at best.
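As an illustration of why the model matters, the sketch below uses hypothetical data (a true line y = 5x plus small, fixed "noise") and fits it with both a degree-1 and a degree-4 polynomial. Both fit the five known points well, but only the model with the right shape extrapolates sensibly.

```python
import numpy as np

# Hypothetical data: true relationship y = 5x, plus small fixed noise
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
noise = np.array([0.3, -0.2, 0.1, -0.3, 0.2])
y = 5 * x + noise

linear = np.poly1d(np.polyfit(x, y, 1))   # same shape as the true relationship
quartic = np.poly1d(np.polyfit(x, y, 4))  # passes through all 5 points exactly

# Inside the data range both look fine; outside it they diverge sharply
print(linear(10.0))   # approximately 49.81 (true value: 50)
print(quartic(10.0))  # approximately 339.2
```

The quartic has zero error on the known points, yet its prediction at x = 10 is off by a factor of almost seven, because it has fitted the noise rather than the trend.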
Exponential curve fitting with `scipy.optimize.curve_fit`:
```python
import numpy as np
from scipy.optimize import curve_fit


def func(x, a, b, c):
    # Exponential-decay model: a * exp(-b * x) + c
    return a * np.exp(-b * x) + c


x = np.linspace(0, 4, 5)
y = func(x, 2.5, 1.3, 0.5)                    # noise-free data
yn = y + 0.2 * np.random.normal(size=len(x))  # add Gaussian noise
fit, cov = curve_fit(func, x, yn)             # fitted (a, b, c) and their covariance
print(fit)             # e.g. [ 2.67217435  1.21470107  0.52942728]  (varies with the noise)
print(y)               # [ 3.  1.18132948  0.68568395  0.55060478  0.51379141]  original data
print(func(x, *fit))   # e.g. [ 3.20160163  1.32252521  0.76481773  0.59929086  0.5501627 ]  fit to original + noise
```
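To tie this back to the original question, the fitted parameters can be used to predict points beyond the sampled range. The sketch below assumes 20 samples with seeded noise (so the run is reproducible) and an explicit initial guess `p0`. It also reads rough parameter uncertainties from the diagonal of the covariance matrix.

```python
import numpy as np
from scipy.optimize import curve_fit


def func(x, a, b, c):
    # Same exponential-decay model as above: a * exp(-b * x) + c
    return a * np.exp(-b * x) + c


# Assumption: 20 evenly spaced samples with seeded Gaussian noise
rng = np.random.default_rng(0)
x = np.linspace(0, 4, 20)
yn = func(x, 2.5, 1.3, 0.5) + 0.05 * rng.normal(size=x.size)

# p0 is an initial guess for (a, b, c); nonlinear fits can be sensitive to it
fit, cov = curve_fit(func, x, yn, p0=(1.0, 1.0, 0.0))
perr = np.sqrt(np.diag(cov))  # one-standard-deviation errors on a, b, c

# Extrapolate beyond the sampled range (x > 4) using the fitted parameters
x_future = np.array([5.0, 6.0])
y_future = func(x_future, *fit)
print("params:", fit, "+/-", perr)
print("predictions:", y_future)
```

The uncertainties only describe the parameters under the assumed model; as with the polynomial fits, predictions far outside the data are only as trustworthy as the choice of model.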