I want to plot data using fit function : function f(x) = a+b*x**2
. After ploting i have this result:
correlation matrix of the fit parameters:
m n
m 1.000
n -0.935 1.000
My question is : how can i found a correlation coefficient on gnuplot ?
You can use the stats
command in gnuplot, which has syntax similar to the plot
command:
stats "file.dat" using 2:(f($2)) name "A"
The correlation coefficient will be stored in the A_correlation
variable. (With no name specification, it would be STATS_correlation
.) You can use it subsequently to plot your data or just print on the screen using the set label
command:
set label 1 sprintf("r = %4.2f",A_correlation) at graph 0.1, graph 0.85
You can find more about the stats
command in gnuplot documentation.
Although there is no direct solution to this problem, a workaround is possible. I'll illustrate it using python/numpy. First, the part of the gnuplot script that generates the fit and connects with a python script:
file = "my_data.tsv"
f(x)=a+b*(x)
fit f(x) file using 2:3 via a,b
r = system(sprintf("python correlation.py %s",file))
ti = sprintf("y = %.2f + %.2fx (r = %s)", a, b, r)
plot \
file using 2:3 notitle,\
f(x) title ti
This runs correlation.py to retrieve the correlation 'r' in string format. It uses 'r' to generate a title for the fit line. Then, correlation.py:
from numpy import genfromtxt
from numpy import corrcoef
import sys
data = genfromtxt(sys.argv[1], delimiter='\t')
r = corrcoef(data[1:,1],data[1:,2])[0,1]
print("%.3f" % r).lstrip('0')
Here, the first row is assumed to be a header row. Furthermore, the columns to calculate the correlation for are now hardcoded to nr. 1 and 2. Of course, both settings can be changed and turned into arguments as well.
The resulting title of the fit line is (for a personal example):
y = 2.15 + 1.58x (r = .592)
Since you are probably using fit function you can first refer to this link to arrive at R2 values. The link uses certain existing variables like FIT_WSSR, FIT_NDF to calculate R2 value. The code for R2 is stated as:
SST = FIT_WSSR/(FIT_NDF+1)
SSE=FIT_WSSR/(FIT_NDF)
SSR=SST-SSE
R2=SSR/SST
The next step would be to show the R^2 values on the graph. Which can be achieved using the code :
set label 1 sprintf("r = %f",R2) at graph 0.7, graph 0.7
If you're looking for a way to calculate the correlation coefficient as defined on this page, you are out of luck using gnuplot as explained in this Google Groups thread.
There are lots of other tools for calculating correlation coefficients, e.g. numpy.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With