In a MySQL (5.1) database table there is data that represents:
Would MySQL support correlating the data or do I need to use PHP/C# to calcuate?
Where would I find a good formula to calculate correlation (it's been a long time since I last did this)?
Correlation in SQLCalculating the correlation coefficient in SQL is pretty straightforward. The corr(Y, X) function returns the correlation coefficient between a set of variables.
The correlation coefficient is determined by dividing the covariance by the product of the two variables' standard deviations. Standard deviation is a measure of the dispersion of data from its average. Covariance is a measure of how two variables change together.
The Correlation function calculates the correlation coefficient of two pairs of values by first evaluating the specified set against the first numeric expression to obtain the values for the y-axis.
There are two flavors of the Pearson correlation coefficient, one for a Sample and one for an entire Population. These are single-pass and, I believe, correct formulas for both:
-- Methods for calculating the two Pearson correlation coefficients
SELECT
-- For Population
(avg(x * y) - avg(x) * avg(y)) /
(sqrt(avg(x * x) - avg(x) * avg(x)) * sqrt(avg(y * y) - avg(y) * avg(y)))
AS correlation_coefficient_population,
-- For Sample
(count(*) * sum(x * y) - sum(x) * sum(y)) /
(sqrt(count(*) * sum(x * x) - sum(x) * sum(x)) * sqrt(count(*) * sum(y * y) - sum(y) * sum(y)))
AS correlation_coefficient_sample
FROM your_table;
I developed and tested this as T-SQL. The code that generated the test data didn't translate to MySQL but the formulas should. Make sure your x and y are decimals values; integer math can significantly impact these calcs.
Here's a rough implementation of the sample correlation coefficient as described in:
Wikipedia - Correlation and Dependence
create table sample( x float not null, y float not null );
insert into sample values (1, 10), (2, 4), (3, 5), (6,17);
select @ax := avg(x),
@ay := avg(y),
@div := (stddev_samp(x) * stddev_samp(y))
from sample;
select sum( ( x - @ax ) * (y - @ay) ) / ((count(x) -1) * @div) from sample;
+---------------------------------------------------------+
| sum( ( x - @ax ) * (y - @ay) ) / ((count(x) -1) * @div) |
+---------------------------------------------------------+
| 0.700885077729073 |
+---------------------------------------------------------+
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With