I have the following points
0 4194304
1 497420
2 76230
3 17220
4 3595
5 1697
6 491
7 184
8 54
9 15
10 4
11 4
12 1
13 1
14 1
15 1
16 1
17 1
18 1
19 1
20 1
21 1
If I plot them with a log scale on the y-axis they look roughly linear. How can I fit a straight line to the log-scaled values so that I can model the data?
My current code is very crude. For each x, y pair I do:
xcoords.append(x)
ycoords.append(math.log(y))
And then at the end I do
plt.plot(xcoords,ycoords)
plt.show()
This solution uses the least-squares fitting method from numpy, np.linalg.lstsq (docs).
This page provides an example of linear regression on linear data.
Because your data is log-linear, we transform the data first and then run a linear fit.
import numpy as np
import matplotlib.pyplot as plt
d = '''
0 4194304
1 497420
... (put all the rest of the data in here)
'''
D = np.loadtxt(d.split('\n'))
x = D[:,0]
y = D[:,1]
y_ln = np.log(y)
n = D.shape[0]
A = np.array(([[x[j], 1] for j in range(n)]))
B = np.array(y_ln[0:n])
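# rcond=None selects the current default cutoff and avoids a FutureWarning on older numpy
X = np.linalg.lstsq(A, B, rcond=None)[0]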
a=X[0]; b=X[1]
# so now your fitted line is log(y) = a*x + b
# let's show it on a graph.
plt.figure()
plt.plot(x, a*x+b, '--')
plt.plot(x, y_ln, 'o')
plt.ylabel('log y')
plt.xlabel('x values')
plt.show()
# or use the original scales by transforming the data back again:
plt.figure()
plt.plot(x, np.exp(a*x+b), '--')
plt.plot(x, y, 'o')
plt.ylabel('y')
plt.xlabel('x values')
plt.yscale('log')
plt.show()
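For comparison, the same straight-line fit can be sketched more compactly with np.polyfit. This is an alternative to the lstsq call above rather than a replacement for it; the arrays below are typed in directly from the points in the question, and the names a2 and b2 are only there to check that both routes agree.

```python
import numpy as np

# Data points copied from the question; x runs from 0 to 21.
y = np.array([4194304, 497420, 76230, 17220, 3595, 1697, 491, 184, 54, 15,
              4, 4, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], dtype=float)
x = np.arange(len(y), dtype=float)

# Degree-1 polynomial fit of log(y) against x; returns [slope, intercept].
a, b = np.polyfit(x, np.log(y), 1)

# The same least-squares problem solved with lstsq, as in the answer.
A = np.column_stack([x, np.ones_like(x)])
a2, b2 = np.linalg.lstsq(A, np.log(y), rcond=None)[0]

print("polyfit:", a, b)
print("lstsq:  ", a2, b2)
```

Both calls minimise the squared error in log(y), so they should give the same slope and intercept up to rounding.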
However, your data seems to have two regimes, so a single linear fit doesn't capture it well. You could instead describe it as two distinct regimes, which may or may not be appropriate depending on where your data comes from and whether you can explain the point at which the regime changes.
So let's take the first part of your data and fit just that:
n = 13  # use only the first 13 points (x = 0..12)
A = np.array(([[x[j], 1] for j in range(n)]))
B = np.array(y_ln[0:n])
X = np.linalg.lstsq(A, B, rcond=None)[0]
a=X[0]; b=X[1]
plt.figure()
plt.plot(x[0:n], np.exp(a*x[0:n]+b), '--')
plt.plot(x, y, 'o')
plt.ylabel('y')
plt.xlabel('x values')
plt.yscale('log')
plt.show()
This is a better fit to the first part of the data (but it may not be particularly meaningful -- that depends on what process generated the data points).
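One rough way to put a number on "better fit" is to compare the root-mean-square residual, in log(y), of the two lines over the first 13 points. The sketch below assumes the data from the question; fit_line and rms_residual are hypothetical helper names introduced only for this illustration.

```python
import numpy as np

# Data points copied from the question; x runs from 0 to 21.
y = np.array([4194304, 497420, 76230, 17220, 3595, 1697, 491, 184, 54, 15,
              4, 4, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], dtype=float)
x = np.arange(len(y), dtype=float)
y_ln = np.log(y)

def fit_line(xs, ys):
    """Least-squares slope and intercept for ys ~ a*xs + b."""
    A = np.column_stack([xs, np.ones_like(xs)])
    a, b = np.linalg.lstsq(A, ys, rcond=None)[0]
    return a, b

def rms_residual(xs, ys, a, b):
    """Root-mean-square distance between ys and the line a*xs + b."""
    return np.sqrt(np.mean((ys - (a * xs + b)) ** 2))

n = 13  # same cut-off as in the answer
a_all, b_all = fit_line(x, y_ln)              # line fitted to every point
a_first, b_first = fit_line(x[:n], y_ln[:n])  # line fitted to the first regime

# Evaluate both lines on the first n points only.
rms_all_on_first = rms_residual(x[:n], y_ln[:n], a_all, b_all)
rms_first = rms_residual(x[:n], y_ln[:n], a_first, b_first)

print("slope, all points:     ", a_all)
print("slope, first 13 points:", a_first)
print("RMS on first 13 (fit to all):     ", rms_all_on_first)
print("RMS on first 13 (fit to first 13):", rms_first)
```

The restricted fit cannot have a larger residual on those 13 points, since it minimises exactly that quantity; how much smaller it is gives a sense of how strongly the flat tail pulls on the single-line fit.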