Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pearson Correlation SQL Server

Tags:

sql

sql-server

I have two tables:

ID,YRMO,Counts

1,Dec 2013,4

1,Jan 2014,6

1,Feb 2014,7

2,Jan,2014,6

2,Feb,2014,8

ID,YRMO,Counts

1,Dec 2013,10

1,Jan 2014,8

1,March 2014,12

2,Jan 2014,6

2,Feb 2014,10

I want to find the pearson corelation coefficient for each sets of ID. There are about more than 200 different IDS.

Pearson correlation is a measure of the linear correlation (dependence) between two variables X and Y, giving a value between +1 and −1 inclusive

More can be found here :http://oreilly.com/catalog/transqlcook/chapter/ch08.html at calculating correlation section

like image 959
user3325141 Avatar asked Jun 28 '26 19:06

user3325141


1 Answers

To calculate Pearson Correlation Coefficient; you need to first calculate Mean then standard daviation and then correlation coefficient as outlined below

1. Calculate Mean

insert into tab2 (tab1_id, mean)
select ID, sum([counts]) / 
(select count(*) from tab1) as mean
from tab1
group by ID;

2. Calculate standard deviation

update tab2
set stddev = (
select sqrt(
sum([counts] * [counts]) / 
(select count(*) from tab1)
- mean * mean
) stddev
from tab1
where tab1.ID = tab2.tab1_id
group by tab1.ID);

3. Finally Pearson Correlation Coefficient

select ID,
((sf.sum1 / (select count(*) from tab1)
- stats1.mean * stats2.mean
)
/ (stats1.stddev * stats2.stddev)) as PCC
from (
select r1.ID,
 sum(r1.[counts] * r2.[counts]) as sum1
from tab1 r1
join tab1 r2
on r1.ID = r2.ID
group by r1.ID
) sf
join tab2 stats1
on stats1.tab1_id = sf.ID
join tab2 stats2
on stats2.tab1_id = sf.ID

Which on your posted data results in

enter image description here

See a demo fiddle here http://sqlfiddle.com/#!3/0da20/5

EDIT:

Well refined a bit. You can use the below function to get PCC but I am not getting exact same result as of your but rather getting 0.999996000000000 for ID = 1.

This could be a great entry point for you. You can refine the calculation further from here.

create function calculate_PCC(@id int)
returns decimal(16,15)
as
begin
declare @mean numeric(16,5);
declare @stddev numeric(16,5);
declare @count numeric(16,5);
declare @pcc numeric(16,12);
declare @store numeric(16,7);
select @count = CONVERT(numeric(16,5), count(case when Id=@id then 1 end)) from tab1;
select @mean = convert(numeric(16,5),sum([Counts])) / @count
from tab1 WHERE ID = @id;
select @store = (sum(counts * counts) / @count) from tab1 WHERE ID = @id;
set @stddev = sqrt(@store - (@mean * @mean)); 
set @pcc = ((@store - (@mean * @mean)) / (@stddev * @stddev)); 

return @pcc;
end

Call the function like

select db_name.dbo.calculate_PCC(1)
like image 156
Rahul Avatar answered Jun 30 '26 11:06

Rahul



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!