
How to fit to a log scale [duplicate]

I have the following points

0 4194304
1 497420
2 76230
3 17220
4 3595
5 1697
6 491
7 184
8 54
9 15
10 4
11 4
12 1
13 1
14 1
15 1
16 1
17 1
18 1
19 1
20 1
21 1

If I plot them with a log scale on the y-axis they look roughly linear. How can I fit a straight line to the log-scaled data?

My current code is very crude. For each x, y pair I do:

xcoords.append(x)
ycoords.append(math.log(y))

And then at the end I do

plt.plot(xcoords,ycoords)
plt.show()

marshall asked Nov 01 '22 09:11


1 Answer

This solution uses the least-squares fitting routine from NumPy, np.linalg.lstsq.

This page provides an example usage of linear regression, on linear data.

Because your data is log-linear, we transform it first (taking the natural log of y) and then run a linear fit.

import numpy as np
import matplotlib.pyplot as plt

d = '''
0 4194304
1 497420
 ... (put all the rest of the data in here)
'''

D = np.loadtxt(d.split('\n'))

x = D[:,0]
y = D[:,1]
y_ln = np.log(y)

n = D.shape[0]

A = np.array(([[x[j], 1] for j in range(n)]))
B = np.array(y_ln[0:n])

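# rcond=None uses NumPy's current default cutoff and avoids a FutureWarning on older versions
X = np.linalg.lstsq(A, B, rcond=None)[0]
a, b = X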

# so now your fitted line is log(y) = a*x + b
# let's show it on a graph.
plt.figure()
plt.plot(x, a*x+b, '--')
plt.plot(x, y_ln, 'o')
plt.ylabel('log y')
plt.xlabel('x values')
plt.show()

# or use the original scales by transforming the data back again:

plt.figure()
plt.plot(x, np.exp(a*x+b), '--')
plt.plot(x, y, 'o')
plt.ylabel('y')
plt.xlabel('x values')
plt.yscale('log')
plt.show()

[Figure: fitting all the data]
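As a cross-check, the same straight-line fit can be sketched with np.polyfit. This is an alternative to the lstsq call above rather than what the answer uses, and it should give the same slope and intercept. The data array is typed in from the question, with x taken to be the index.

```python
import numpy as np

# Data copied from the question: y[i] is the value at x = i.
y = np.array([4194304, 497420, 76230, 17220, 3595, 1697, 491, 184, 54, 15,
              4, 4, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], dtype=float)
x = np.arange(len(y), dtype=float)
y_ln = np.log(y)

# Degree-1 polynomial fit of ln(y) against x. Coefficients are returned
# highest power first, so this unpacks as (slope, intercept).
a, b = np.polyfit(x, y_ln, 1)

# Same fit via the explicit least-squares formulation, for comparison.
A = np.column_stack([x, np.ones_like(x)])
a_ls, b_ls = np.linalg.lstsq(A, y_ln, rcond=None)[0]

print(f"polyfit: ln(y) = {a:.3f}*x + {b:.3f}")
print(f"lstsq:   ln(y) = {a_ls:.3f}*x + {b_ls:.3f}")
```

Both routes minimise the squared error in ln(y), not in y itself, so large y values do not dominate the fit.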

However, your data seems to have two regimes, so a single linear fit does not capture it well. You could instead describe it as two distinct regimes, which may or may not be appropriate depending on where your data comes from and whether you can explain the point at which the regime changes.
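One way to locate the change point is sketched below. The heuristic is an assumption made for illustration, not something the answer prescribes: it treats the second regime as the trailing run of points sitting at the minimum value of y.

```python
import numpy as np

# Data copied from the question: y[i] is the value at x = i.
y = np.array([4194304, 497420, 76230, 17220, 3595, 1697, 491, 184, 54, 15,
              4, 4, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], dtype=float)
x = np.arange(len(y))

# Assumed heuristic: the flat tail begins right after the last point
# that is still above the smallest observed value.
floor = y.min()
above = np.nonzero(y > floor)[0]   # indices of points above the floor
tail_start = int(above[-1]) + 1    # first index of the flat tail

print(f"decaying regime: x = 0..{tail_start - 1}")
print(f"flat regime:     x = {tail_start}..{len(y) - 1} (y = {floor:g})")
```

For this data the flat tail starts at x = 12, so the n = 13 used in the next snippet keeps every decaying point plus the first point of the tail.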

So let's take the first part of your data and fit just that:

n = 13  # use only the first 13 points (x = 0..12)
A = np.array(([[x[j], 1] for j in range(n)]))
B = np.array(y_ln[0:n])

X = np.linalg.lstsq(A, B, rcond=None)[0]
a, b = X

plt.figure()
plt.plot(x[0:n], np.exp(a*x[0:n]+b), '--')
plt.plot(x, y, 'o')
plt.ylabel('y')
plt.xlabel('x values')
plt.yscale('log')
plt.show()

[Figure: fitting part of the data]

This is a better fit to the first part of the data (but it may not be particularly meaningful -- that depends on what process generated the data points).
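To put a number on "better", one option is to compare the root-mean-square residual in log space for the two fits. This is a rough sketch: fit_line is a hypothetical helper introduced here, and the two residuals are computed over different sets of points, so treat the comparison as indicative only.

```python
import numpy as np

# Data copied from the question: y[i] is the value at x = i.
y = np.array([4194304, 497420, 76230, 17220, 3595, 1697, 491, 184, 54, 15,
              4, 4, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], dtype=float)
x = np.arange(len(y), dtype=float)
y_ln = np.log(y)


def fit_line(xs, ys):
    """Hypothetical helper: least-squares line ys ~ a*xs + b.

    Returns (a, b, rms), where rms is the root-mean-square residual
    over the points that were fitted.
    """
    A = np.column_stack([xs, np.ones_like(xs)])
    a, b = np.linalg.lstsq(A, ys, rcond=None)[0]
    rms = np.sqrt(np.mean((ys - (a * xs + b)) ** 2))
    return a, b, rms


n = 13  # same cut-off as in the answer
a_full, b_full, rms_full = fit_line(x, y_ln)
a_part, b_part, rms_part = fit_line(x[:n], y_ln[:n])

print(f"all points:     slope {a_full:.3f}, rms residual {rms_full:.3f}")
print(f"first {n} points: slope {a_part:.3f}, rms residual {rms_part:.3f}")
```

A smaller residual for the partial fit only says the line matches those points more closely; it does not by itself justify treating the tail as a separate regime.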

Bonlenfum answered Nov 15 '22 04:11