
Correlation coefficient on gnuplot

I want to fit my data with the function f(x) = a+b*x**2 and plot it. After fitting, I get this result:

correlation matrix of the fit parameters:

               m      n      
m               1.000 
n              -0.935  1.000 

My question is: how can I find the correlation coefficient in gnuplot?

Mehdi asked Dec 19 '12 17:12


4 Answers

You can use the stats command in gnuplot, which has syntax similar to the plot command:

stats "file.dat" using 2:(f($2)) name "A"

The correlation coefficient will be stored in the A_correlation variable. (With no name specification, it would be STATS_correlation.) You can use it later in your script, or show it on the graph with the set label command:

set label 1 sprintf("r = %4.2f",A_correlation) at graph 0.1, graph 0.85

You can find more about the stats command in the gnuplot documentation.
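
For reference, the value that stats reports as the correlation is, as far as I know, the ordinary Pearson coefficient of the two columns passed to using. Below is a minimal Python sketch of that calculation, handy for cross-checking the number gnuplot gives you. The function name pearson_r and the sample data are made up for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant data")
    return sxy / math.sqrt(sxx * syy)


# Illustrative data only: y is exactly 2*x, so r should be 1.
print("r = %4.2f" % pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))
```

The final print uses the same %4.2f format as the set label line above.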

Nikita Rokotyan answered Oct 16 '22 11:10


Although there is no direct solution to this problem, a workaround is possible. I'll illustrate it using python/numpy. First, the part of the gnuplot script that generates the fit and connects with a python script:

    file = "my_data.tsv"
    f(x)=a+b*(x)
    fit f(x) file using 2:3 via a,b
    r = system(sprintf("python correlation.py %s",file)) 
    ti = sprintf("y = %.2f + %.2fx (r = %s)", a, b, r)
    plot \
      file using 2:3 notitle,\
      f(x) title ti

This runs correlation.py to retrieve the correlation 'r' in string format. It uses 'r' to generate a title for the fit line. Then, correlation.py:

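    import sys
    from numpy import corrcoef, genfromtxt

    # Load the tab-separated file named on the command line.
    data = genfromtxt(sys.argv[1], delimiter='\t')
    # Skip the header row and correlate columns 1 and 2 (zero-based).
    r = corrcoef(data[1:, 1], data[1:, 2])[0, 1]
    # The extra parentheses make lstrip act on the string; without them
    # Python 3 would call lstrip on print()'s return value and fail.
    print(("%.3f" % r).lstrip('0'))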

Here, the first row is assumed to be a header row. The columns used for the correlation are hardcoded as indexes 1 and 2; these are zero-based, so they are the second and third columns, matching "using 2:3" in the gnuplot script. Both settings can of course be changed or turned into arguments.

The resulting title of the fit line is (for a personal example):

y = 2.15 + 1.58x (r = .592)
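
Turning the header and column settings into arguments might look like the sketch below. It is a variant of correlation.py, not the original script: it uses only the standard library instead of numpy, and the function names and the argument order (FILE XCOL YCOL, zero-based columns) are my own assumptions.

```python
import csv
import math
import sys


def read_columns(lines, xcol, ycol, skip_header=True):
    """Read two zero-based columns from tab-separated text lines."""
    rows = csv.reader(lines, delimiter="\t")
    if skip_header:
        next(rows, None)  # drop the header row, if there is one
    xs, ys = [], []
    for row in rows:
        if len(row) <= max(xcol, ycol):
            continue  # skip blank or short lines
        xs.append(float(row[xcol]))
        ys.append(float(row[ycol]))
    return xs, ys


def correlation(xs, ys):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical usage: python correlation.py FILE XCOL YCOL
if __name__ == "__main__" and len(sys.argv) >= 4:
    with open(sys.argv[1], newline="") as handle:
        xs, ys = read_columns(handle, int(sys.argv[2]), int(sys.argv[3]))
    # Same output format as the original script.
    print(("%.3f" % correlation(xs, ys)).lstrip("0"))
```

On the gnuplot side, the system call would then pass the column indexes as well, for example sprintf("python correlation.py %s 1 2", file).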

Frans answered Oct 16 '22 09:10


Since you are probably using the fit command, you can first refer to this link to arrive at an R2 value. The link uses existing fit variables such as FIT_WSSR and FIT_NDF to calculate R2. The code for R2 is given as:

SST = FIT_WSSR/(FIT_NDF+1)
SSE=FIT_WSSR/(FIT_NDF)
SSR=SST-SSE
R2=SSR/SST

The next step is to show the R^2 value on the graph, which can be done with:

set label 1 sprintf("R^2 = %f",R2) at graph 0.7, graph 0.7
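
A caution on the R2 snippet above: taken literally, its four expressions simplify to R2 = -1/FIT_NDF whatever the data, so it is worth checking the result against the textbook definition R^2 = 1 - SSE/SST. There, SSE is the sum of squared residuals (which, as I understand it, is what FIT_WSSR holds for an unweighted fit) and SST is the sum of squared deviations of y from its mean. A minimal Python sketch, with a made-up function name and made-up data:

```python
def r_squared(xs, ys, model):
    """Coefficient of determination of model(x) against the data."""
    mean_y = sum(ys) / len(ys)
    # SSE: sum of squared residuals of the fitted model.
    sse = sum((y - model(x)) ** 2 for x, y in zip(xs, ys))
    # SST: total sum of squares around the mean of y.
    sst = sum((y - mean_y) ** 2 for y in ys)
    if sst == 0:
        raise ValueError("R^2 is undefined when y is constant")
    return 1.0 - sse / sst


# Illustrative data only; a and b stand in for fitted parameters.
a, b = 1.0, 2.0
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.1, 2.9, 5.2, 6.8]
print("R^2 = %f" % r_squared(xs, ys, lambda x: a + b * x))
```

For a straight-line least-squares fit with an intercept, this R^2 equals the square of the Pearson correlation coefficient r, which is why the two quantities should not be mixed up in a label.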

Sai Avinash Sattiraju answered Oct 16 '22 09:10


If you're looking for a way to calculate the correlation coefficient as defined on this page, you are out of luck using gnuplot, as explained in this Google Groups thread.

There are lots of other tools for calculating correlation coefficients, e.g. numpy.

andyras answered Oct 16 '22 11:10