For an educational website, it is my purpose to let students fool around somewhat with value series and their collelation. For instance, students can enter two arrays for which the correlation is calculated:
$array_x = array(5,3,6,7,4,2,9,5);
$array_y = array(4,3,4,8,3,2,10,5);
echo Correlation($array_x, $array_y); // 0.93439982209434
The code for this works perfectly and can be found at the bottom of this post. I'm however now facing a challenge. What I want is the following:
So, in other words, the code would have to work like:
$array_x = array(5,3,6,7,4,2,9,5);
$boundaries = array(1, 10);
$correlation = 0.9;
echo ySeries($array_x, $boundaries, $correlation); // array(4,3,4,8,3,2,10,5)
At the Stackexchange Math forum, @ilya answered (inserted as an image, since Latex formatting of fomulas don't seem to work on stackoverflow):
P.S. The code used to calculate the correlation:
function Correlation($arr1, $arr2) {
$correlation = 0;
$k = SumProductMeanDeviation($arr1, $arr2);
$ssmd1 = SumSquareMeanDeviation($arr1);
$ssmd2 = SumSquareMeanDeviation($arr2);
$product = $ssmd1 * $ssmd2;
$res = sqrt($product);
$correlation = $k / $res;
return $correlation;
}
function SumProductMeanDeviation($arr1, $arr2) {
$sum = 0;
$num = count($arr1);
for($i=0; $i < $num; $i++) {
$sum = $sum + ProductMeanDeviation($arr1, $arr2, $i);
}
return $sum;
}
function ProductMeanDeviation($arr1, $arr2, $item) {
return (MeanDeviation($arr1, $item) * MeanDeviation($arr2, $item));
}
function SumSquareMeanDeviation($arr) {
$sum = 0;
$num = count($arr);
for($i = 0; $i < $num; $i++) {
$sum = $sum + SquareMeanDeviation($arr, $i);
}
return $sum;
}
function SquareMeanDeviation($arr, $item) {
return MeanDeviation($arr, $item) * MeanDeviation($arr, $item);
}
function SumMeanDeviation($arr) {
$sum = 0;
$num = count($arr);
for($i = 0; $i < $num; $i++) {
$sum = $sum + MeanDeviation($arr, $i);
}
return $sum;
}
function MeanDeviation($arr, $item) {
$average = Average($arr);
return $arr[$item] - $average;
}
function Average($arr) {
$sum = Sum($arr);
$num = count($arr);
return $sum/$num;
}
function Sum($arr) {
return array_sum($arr);
}
So, here's the php implementation of your algorithm that uses Dawkins' weasel to reduce the error gradually until the desired threshold.
<?php
function sqrMeanDeviation($array, $avg)
{
$sqrMeanDeviation = 0;
for($i=0; $i<count($array); $i++)
{
$dev = $array[$i] - $avg;
$sqrMeanDeviation += $dev * $dev;
}
return $sqrMeanDeviation;
}
// z values are non-0 an can value between [-abs_z_bound, abs_z_bound]
function random_z_element($abs_z_bound = 1)
{
$a = (mt_rand() % (2*$abs_z_bound) ) - ($abs_z_bound-1);
if($a <= 0)
$a--;
return $a;
}
// change z a little
function copy_z_weasel($old_array_z, $error_probability = 20 /*error possible is 1 in error_probability*/, $abs_z_bound = 1)
{
$new_z = array();
for($i = 0; $i < count($old_array_z); $i++)
if(mt_rand() % $error_probability == 0 )
$new_z[$i] = random_z_element($abs_z_bound);
else
$new_z[$i] = $old_array_z[$i];
return $new_z;
}
function correlation_error($array_y, $array_x, $avg_x, $sqrMeanDeviation_x, $correlation)
{
// checking correlation
$avg_y = array_sum($array_y)/count($array_y);
$sqrMeanDeviation_y = 0;
$covariance_xy = 0;
for($i=0; $i<count($array_x); $i++)
{
$dev_y = $array_y[$i] - $avg_y;
$sqrMeanDeviation_y += $dev_y * $dev_y;
$dev_x = $array_x[$i] - $avg_x;
$covariance_xy += $dev_y * $dev_x;
}
$correlation_xy = $covariance_xy/sqrt($sqrMeanDeviation_x*$sqrMeanDeviation_y);
return abs($correlation_xy - $correlation);
}
function ySeries($array_x, $low_bound, $high_bound, $correlation, $threshold)
{
$array_y = array();
$avg_x = array_sum($array_x)/count($array_x);
$sqrMeanDeviation_x = sqrMeanDeviation($array_x, $avg_x);
// pre-compute beta
$beta_x_sQMz = $sqrMeanDeviation_x * sqrt( 1 / ($correlation*$correlation) - 1 );
$best_array_z = array();
$n = 0;
$error = $threshold + 1;
while($error > $threshold)
{
++$n;
// generate z
$array_z = array();
if(count($best_array_z) == 0)
for($i=0; $i<count($array_x); $i++)
$array_z[$i] = random_z_element();
else
$array_z = copy_z_weasel($best_array_z);
$sqm_z = sqrMeanDeviation($array_z, array_sum($array_z)/count($array_z) );
// this being 0 implies that for every beta correlation(x,y) = 1 so just give it any random beta
if($sqm_z)
$beta = $beta_x_sQMz / $sqm_z;
else
$beta = 10;
// and now we have y
for($i=0; $i<count($array_x); $i++)
$array_y[$i] = $array_x[$i] + ($array_z[$i] * $beta);
// now, change bounds (we could do this afterwards but we want precision and y to be integers)
// rounding
$min_y = $array_y[0];
$max_y = $array_y[0];
for( $i=1; $i<count($array_x); $i++ )
{
if($array_y[$i] < $min_y)
$min_y = $array_y[$i];
if($array_y[$i] > $max_y)
$max_y = $array_y[$i];
}
$range = ($high_bound - $low_bound) / ($max_y - $min_y);
$shift = $low_bound - $min_y;
for( $i=0; $i<count($array_x); $i++ )
$array_y[$i] = round($array_y[$i] * $range + $shift);
// get the error
$new_error = correlation_error($array_y, $array_x, $avg_x, $sqrMeanDeviation_x, $correlation);
if($new_error < $error)
{
$best_array_z = $array_z;
$error = $new_error;
}
}
echo "Correlation ", $correlation, " approched within " , $new_error, " in ", $n ," iterations.\n";
return $array_y;
}
?>
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With