Fitting a straight line to a log-log curve in matplotlib

I have a plot that is logarithmic on both axes. I used pyplot's loglog function to draw it, which also gives me a logarithmic scale on both axes.

Now, using numpy, I fit a straight line to my set of points. However, when I draw this line on the plot, it comes out curved instead of straight. The blue line is the supposedly "straight line". I want to fit a straight line to the curve plotted by the red dots.

Here is the code I am using to plot the points:

import numpy
from matplotlib import pyplot as plt
import math
fp=open("word-rank.txt","r")
a=[]
b=[]

for line in fp:
    string=line.strip().split()
    a.append(float(string[0]))
    b.append(float(string[1]))

coefficients=numpy.polyfit(b,a,1)
polynomial=numpy.poly1d(coefficients)
ys=polynomial(b)
print(polynomial)
plt.loglog(b,a,'ro')
plt.plot(b,ys)
plt.xlabel("Log (Rank of frequency)")
plt.ylabel("Log (Frequency)")
plt.title("Frequency vs frequency rank for words")
plt.show()
asked Jun 05 '15 02:06 by Rohit Shinde



2 Answers

To better understand this problem, let's first talk about plain ol' linear regression (the polyfit function, in this case, is your linear regression algorithm).


Suppose you have a set of data points (x,y), shown below:

[Figure: linear regression diagram]

You want to create a model that predicts y as a function of x, so you use linear regression. That uses the model:

y = mx + b

and computes the values of m and b that best predict your data, using some linear algebra.

Next, you use your model to predict values of y as a function of x. You do this by picking a set of values for x (think linspace) and computing the corresponding values of y. Plotting these (x,y) pairs gives you your regression line.
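As a minimal sketch of those steps, assuming made-up noisy data around the line y = 3x + 2 (the data, the seed, and the variable names are illustrative, not taken from the question):

```python
import numpy

# Synthetic data, invented for illustration: a noisy line y = 3x + 2.
rng = numpy.random.default_rng(0)
x = numpy.linspace(0.0, 10.0, 50)
y = 3.0 * x + 2.0 + rng.normal(scale=0.5, size=x.size)

# A degree-1 polyfit returns the coefficients highest power first: [m, b].
m, b = numpy.polyfit(x, y, 1)

# Predict y on a fresh grid of x values; these pairs form the regression line.
x_line = numpy.linspace(x.min(), x.max(), 100)
y_line = m * x_line + b
```

Plotting x_line against y_line on ordinary linear axes should give a straight line through the scatter of points.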


Now, let's talk about logarithmic regression. We still have two variables, y and x, and we are still interested in their relationship, i.e., being able to predict y given x. The only difference is that y and x are now the logarithms of two other variables, F and R: y = log(F) and x = log(R). Thus far, this is nothing more than a simple change of name.

[Figure: log regression diagram]

The linear regression also works the same way. You're still regressing y versus x. The linear regression algorithm doesn't care that y and x are actually log(F) and log(R) - it makes no difference to the algorithm.

The last step is a little bit different - and this is where you're getting tripped up in your plot above. What you're doing is computing

F = m R + b

but this is incorrect, because the relationship between F and R is not linear. (That's why you're using a log-log plot.)
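A quick numerical check can make this visible. The sketch below uses made-up coefficients and a hypothetical helper, loglog_slopes, to measure the slope a log-log plot would show between consecutive points:

```python
import numpy

# Four x values evenly spaced on a log axis: 1, 10, 100, 1000.
x = numpy.logspace(0, 3, 4)

linear = 2.0 * x + 50.0    # y = m x + b, with made-up m and b
power = 50.0 * x ** -1.0   # y = c x^m, with made-up c and m

def loglog_slopes(x, y):
    """Slope between consecutive points as seen on log-log axes (hypothetical helper)."""
    return numpy.diff(numpy.log10(y)) / numpy.diff(numpy.log10(x))

linear_slopes = loglog_slopes(x, linear)  # differs from segment to segment: a curve
power_slopes = loglog_slopes(x, power)    # identical on every segment: a straight line
```

Only the power law keeps a constant slope, which is why a line fitted to the raw values bends when drawn on log-log axes.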

Instead, you should compute

log(F) = m log(R) + b

If you transform this (raise 10 to the power of both sides and rearrange), you get

F = c R^m

where c = 10^b. This is the relationship between F and R: it is a power law relationship. (Power law relationships are what log-log plots are best at.)

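In your code, you're passing a and b to polyfit, but you should be passing log(a) and log(b). To draw the fitted line on the log-log axes, convert the fitted values back to F using the power law above before plotting them.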
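Here is a minimal sketch of that fit, assuming noise-free Zipf-like data invented for illustration (rank, freq, and freq_fit are illustrative names, not the variables from the question):

```python
import numpy

# Made-up data: frequency = 1000 / rank, with no noise.
rank = numpy.arange(1, 101, dtype=float)
freq = 1000.0 / rank

# Fit a line in log space: log10(F) = m * log10(R) + b.
m, b = numpy.polyfit(numpy.log10(rank), numpy.log10(freq), 1)

# Undo the logarithm to recover the power law F = c * R**m.
c = 10.0 ** b
freq_fit = c * rank ** m
```

Passing rank and freq_fit to plt.loglog, or to plt.plot on axes that are already log-log, should then draw a straight line.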

answered Oct 13 '22 23:10 by charlesreid1


Your linear fit is not performed on the same data as shown in the loglog-plot.

Make a and b numpy arrays like this

a = numpy.asarray(a, dtype=float)
b = numpy.asarray(b, dtype=float)

Now you can perform array operations on them. In effect, the loglog plot places each point according to the base-10 logarithm of a and b. You can compute the same values with

logA = numpy.log10(a)
logB = numpy.log10(b)

This is what the loglog plot visualizes. Check this by plotting logA against logB as a regular plot. Then repeat the linear fit on the log data and draw the fitted line in the same plot as the logA, logB data.

coefficients = numpy.polyfit(logB, logA, 1)
polynomial = numpy.poly1d(coefficients)
# Evaluate the fit on the log data it was fitted to, not on the raw b
ys = polynomial(logB)
plt.plot(logB, logA)
plt.plot(logB, ys)
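To draw the fit on the original log-log axes instead, the fitted values have to be mapped back with a power of 10. One possible sketch follows; fit_loglog_line is a hypothetical helper (not part of numpy or matplotlib), and dropping non-positive points is an assumption about how to handle values that have no logarithm:

```python
import numpy

def fit_loglog_line(x, y):
    """Fit log10(y) = m*log10(x) + k; return (m, k, x_sorted, y_fit) in original units.

    Hypothetical helper, not part of numpy or matplotlib.
    """
    x = numpy.asarray(x, dtype=float)
    y = numpy.asarray(y, dtype=float)

    # log10 is undefined for zero and negative values, so drop those points.
    keep = (x > 0) & (y > 0)
    x, y = x[keep], y[keep]

    m, k = numpy.polyfit(numpy.log10(x), numpy.log10(y), 1)

    # Evaluate the line in log space, then map back with 10**(...).
    x_sorted = numpy.sort(x)
    y_fit = 10.0 ** (m * numpy.log10(x_sorted) + k)
    return m, k, x_sorted, y_fit

# Made-up values following y = 8 / x**2, plus one x = 0 point that is filtered out.
m, k, xs, ys = fit_loglog_line([4, 1, 2, 0], [0.5, 8, 2, 3])
```

With the question's data, calling plt.loglog(b, a, 'ro') followed by plt.plot(xs, ys) on the values returned for b and a should put a straight line through the red dots.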
answered Oct 14 '22 00:10 by trueter