I have two variables and I want to find the correlation between them. The issue is that I seem to be getting different results depending on which method I use.
One method I know of is to fit lm() with both the dependent and the independent variable wrapped in scale().
So with the variables below that would look like:
lm(scale(mainDataframe$relativeFemHappy) ~ scale(mainDataframe$allRights))
Other methods I know of are to simply use the cor() function or the lm.beta() function.
So that would look like:
cor(mainDataframe$relativeFemHappy, mainDataframe$allRights, use="pairwise.complete.obs")
and
library(lm.beta)
lm.beta(lm(mainDataframe$relativeFemHappy ~ mainDataframe$allRights))
The issue is that the results I'm getting are different:
> lm(scale(mainDataframe$relativeFemHappy) ~ scale(mainDataframe$allRights))
Call:
lm(formula = scale(mainDataframe$relativeFemHappy) ~ scale(mainDataframe$allRights))
Coefficients:
(Intercept) scale(mainDataframe$allRights)
-0.002478 -0.272812
> lm.beta(lm(mainDataframe$relativeFemHappy ~ mainDataframe$allRights))
mainDataframe$allRights
-0.2550056
> cor(mainDataframe$relativeFemHappy, mainDataframe$allRights, use="pairwise.complete.obs")
[1] -0.2550056
So with the first method, using lm() and scale(), I'm getting a coefficient of -0.2728, while the lm.beta() and cor() methods both give me a coefficient of -0.2550.
I would love to know what's causing this. Thanks.
mainDataframe.allRights mainDataframe.relativeFemHappy
1 1.3333333 0.0653854461
2 NA -0.0943358596
3 1.0000000 -0.3559994842
4 2.0000000 0.0542260426
5 1.3333333 -0.1125838731
6 NA 0.0647522523
7 1.6666667 -0.1119041715
8 1.0000000 0.0564865005
9 1.3333333 0.2199685735
10 1.3333333 0.3016471599
11 0.6666667 0.6291666667
12 NA -0.1322754782
13 NA -0.7031950673
14 1.6666667 0.5382193869
15 0.6666667 0.0515831008
16 1.3333333 -0.2406053407
17 NA -0.3188695664
18 1.3333333 -0.2132530855
19 1.3333333 -0.1051805386
20 1.3333333 0.5137880544
21 1.3333333 -0.1591651057
22 NA 0.3518542315
23 1.6666667 -0.3134255036
24 2.3333333 -0.0353351079
25 1.3333333 -0.3069227981
26 1.3333333 0.4518921825
27 1.3333333 -0.0106520766
28 2.0000000 -0.1744353706
29 1.3333333 -0.5486947791
30 2.0000000 -0.1683776581
31 2.0000000 -0.1141202547
32 2.6666667 0.1352620331
33 2.3333333 NaN
34 1.3333333 -0.4105513765
35 1.3333333 -0.3623256900
36 1.3333333 -0.1843162243
37 2.0000000 -0.2813061511
38 1.3333333 -0.2735289841
39 1.0000000 -0.3703465553
40 1.3333333 -0.0399500250
41 1.3333333 -0.0798679868
42 NA -0.1494736842
43 0.6666667 0.2510419233
44 2.3333333 -0.1636337231
45 3.0000000 -0.2588880820
46 0.3333333 0.5142450779
47 1.6666667 -0.0927171343
48 1.3333333 0.2302559822
49 1.3333333 -0.1605876144
50 1.3333333 0.0224237663
51 1.3333333 -0.3474095401
52 1.3333333 0.0879899428
53 NA -0.2959860780
54 2.0000000 -0.0678765880
55 2.3333333 -0.2593966749
56 2.6666667 -0.3066565041
57 1.6666667 0.0659408848
58 1.6666667 0.3153641680
59 1.3333333 -0.4080779390
60 1.3333333 0.1695402299
61 2.0000000 -0.1246312234
62 1.6666667 -0.4569675001
63 2.0000000 0.1021491160
64 1.3333333 -0.1375955915
65 NA 0.0007769658
66 1.3333333 -0.0427901329
67 2.3333333 0.0918414523
68 1.3333333 0.1675599213
69 1.3333333 0.0667226151
70 1.0000000 0.6140938930
71 1.3333333 0.0139284251
72 2.0000000 -0.0253022876
73 1.3333333 0.0767676768
74 1.3333333 -0.3298592768
75 0.3333333 0.4164929718
76 NA 0.2050189429
77 1.6666667 0.1017706560
78 0.6666667 0.6626247039
79 1.3333333 0.1182371519
80 0.0000000 -0.1336948622
81 0.6666667 0.2007353845
82 2.0000000 -0.0111828561
83 1.3333333 0.0728503690
84 1.3333333 0.3259760711
85 NA 0.1190302497
86 1.0000000 0.1194620625
87 0.6666667 0.0453267607
88 2.0000000 0.0911983186
89 1.3333333 0.1566666667
90 0.0000000 0.0907911338
91 1.6666667 0.0898769242
92 NA -0.1525686518
93 3.0000000 -0.0293211263
94 1.6666667 0.6627064577
95 1.3333333 0.5176272062
96 NA 0.4856334661
97 2.0000000 -0.0205725729
98 1.6666667 -0.2117421455
99 1.3333333 -0.0930969019
100 2.0000000 -0.0367682733
101 1.3333333 0.3817815271
102 NA -0.2265089463
103 NA 0.1038953135
104 NA -0.0329032045
105 1.0000000 -0.0223175342
106 NA 0.0393768703
107 NA -0.1385969952
108 NA 0.1356859273
109 2.0000000 0.0107975036
110 NA 0.0979167949
111 0.6666667 -0.0342344955
112 NA -0.0050468143
113 NA -0.0895239553
114 NA -0.0465631929
115 NA 0.3002016217
116 2.6666667 -0.1137102105
117 0.6666667 0.0882938923
118 NA 0.4241776220
119 NA 0.1236421047
120 NA 0.2142170169
121 NA 0.0387629732
122 1.0000000 -0.0567106487
123 NA 0.0336110922
124 NA 0.1359546531
125 NA -0.0764485186
126 NA 0.3689020044
127 NA 0.4295649361
128 NA -0.1044761961
129 1.0000000 -0.2089427217
130 NA 0.2015707900
131 1.6666667 -0.0740150225
132 NA 0.0851963992
133 NA 0.1023532212
134 1.3333333 -0.0808608360
135 NA 0.2427526973
136 NA -0.0551786818
137 3.0000000 0.0660331924
138 NA -0.3727922200
139 NA 0.1102447610
140 NA -0.2057888977
141 NA -0.1719448695
142 2.3333333 -0.2175613073
143 NA -0.2613899294
144 NA 0.0756224178
145 1.3333333 -0.1586860559
146 NA -0.1028082059
147 1.6666667 -0.0093129029
148 NA 0.2982334465
149 NA -0.2291732892
150 NA -0.3709208321
151 NA 0.0254403690
152 NA -0.2755686789
153 NA 0.1773620638
154 0.6666667 0.1088370006
155 NA 0.0951056627
156 NA -0.3433133733
157 NA -0.0837993745
158 NA -0.3437314283
159 NA -0.2230338635
160 NA 0.0075808250
161 NA 0.0706623401
162 NA 0.0185266374
163 NA 0.0063326421
164 NA 0.0671828617
165 NA -0.1791227448
166 NA -0.0233741378
167 NA -0.0233616222
168 NA 0.5177982205
169 NA -0.0210875370
170 NA -0.0955256618
171 NA 0.2049268262
172 NA -0.0165755643
173 NA 0.3305190592
174 NA 0.1140276893
175 NA -0.1494444444
176 NA 0.0485406351
177 NA 0.1383207807
178 NA -0.0726862507
179 NA 0.0389694042
The absolute values of the standardized regression coefficients may be compared, giving a rough indication of the relative importance of the variables. Each standardized regression coefficient is in units of standard deviations of Y per standard deviation of Xi.
What is a Standardized Beta Coefficient? A standardized beta coefficient compares the strength of the effect of each individual independent variable on the dependent variable. The higher the absolute value of the beta coefficient, the stronger the effect.
Package lm.beta standardizes the coefficients after estimating them, using the standard deviations or similar measures of the variables used. So unstandardized and standardized coefficients are available simultaneously. [...] influences the way of interpretation of the intercept.
This mini-lesson is to introduce the concept of standardized regression coefficients in R. A standardized regression coefficient is simply the β estimate from a regression on standardized variables. A standardized variable is a variable that has a mean of 0 and a standard deviation of 1.
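As a quick check of that definition, here is a minimal sketch on simulated data with no missing values. The names `pred` and `resp` and the simulated numbers are made up for illustration and are not from the question. With one predictor and complete data, the slope on standardized variables, the raw slope rescaled by sd(pred) / sd(resp), and the Pearson correlation should all coincide.

```r
# Simulated, complete data (hypothetical; not the question's data)
set.seed(1)
pred <- rnorm(50)
resp <- 0.5 * pred + rnorm(50)

# Slope from a regression on standardized variables
beta_scaled <- as.numeric(coef(lm(scale(resp) ~ scale(pred))))[2]

# Unstandardized slope, rescaled to SD-of-resp per SD-of-pred units
beta_rescaled <- as.numeric(coef(lm(resp ~ pred)))[2] * sd(pred) / sd(resp)

# Pearson correlation
r <- cor(pred, resp)

# Both comparisons should report TRUE
all.equal(beta_scaled, r)
all.equal(beta_rescaled, r)
```

So when nothing is missing, lm() with scale() and cor() are expected to agree; a disagreement points at how missing values are handled.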
Compare the centering and scaling values that scale() uses before and after the incomplete rows are removed:
## normalization before joint removal of `NA`
attributes(scale(mainDataframe))[3:4]
#$`scaled:center`
# allRights relativeFemHappy
# 1.483660123 0.005227296
#$`scaled:scale`
# allRights relativeFemHappy
# 0.5926344 0.2411674
## normalization after joint removal of `NA`
x <- na.omit(mainDataframe)
attributes(scale(x))[3:4]
#$`scaled:center`
# allRights relativeFemHappy
# 1.47524752 0.00462978
#$`scaled:scale`
# allRights relativeFemHappy
# 0.5894377 0.2580075
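As you can see, the means and standard deviations differ. scale() standardizes each variable using all of its own non-missing values, but lm() then drops every row in which either variable is missing, so within the rows actually fitted the scaled variables no longer have mean 0 and standard deviation 1. cor() with use="pairwise.complete.obs" and lm.beta() work from the complete rows only, which is why those two agree with each other.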
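The same effect can be reproduced on simulated data. This is only a sketch: the data frame `sim`, its columns `pred` and `resp`, and the rule used to create the missing values are all invented for illustration. Missing values are placed in `pred` on rows where `resp` is close to zero, so the standard deviation of `resp` over all rows differs from its standard deviation over the complete rows.

```r
# Simulated data with missing predictor values (hypothetical)
set.seed(42)
n <- 200
sim <- data.frame(pred = rnorm(n))
sim$resp <- -0.3 * sim$pred + rnorm(n)
sim$pred[abs(sim$resp) < 0.5] <- NA   # knock out pred where resp is near 0

# scale() sees every non-missing value; lm() then drops the incomplete rows
slope_before <- as.numeric(coef(lm(scale(resp) ~ scale(pred), data = sim)))[2]

# Drop incomplete rows first, then standardize
cc <- na.omit(sim)
slope_after <- as.numeric(coef(lm(scale(resp) ~ scale(pred), data = cc)))[2]

# Correlation over complete pairs
r <- cor(sim$pred, sim$resp, use = "pairwise.complete.obs")

c(before = slope_before, after = slope_after, cor = r)
```

Only `slope_after` should match `r`; `slope_before` is off by the ratio of the two standard deviations of `resp`.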
Now, if you use lm for the complete cases x, you get what you expected:
lm(scale(relativeFemHappy) ~ scale(allRights) - 1, data = x)
#Coefficients:
#scale(allRights)
# -0.255
Note that I used -1 in the formula to drop the intercept.
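Dropping the intercept is harmless here because, once both variables are standardized on the same rows, the fitted intercept is zero up to rounding. A small sketch with simulated data (the names `pred` and `resp` are made up for illustration) shows this:

```r
# Simulated, complete data (hypothetical)
set.seed(7)
pred <- rnorm(40)
resp <- 1 + 2 * pred + rnorm(40)

with_int    <- as.numeric(coef(lm(scale(resp) ~ scale(pred))))      # intercept, slope
without_int <- as.numeric(coef(lm(scale(resp) ~ scale(pred) - 1)))  # slope only

with_int[1]                            # intercept: numerically zero
all.equal(with_int[2], without_int[1]) # slopes agree
```

The nonzero intercept in the question's first output (-0.002478) is consistent with the variables having been standardized on different rows than the ones lm() fitted.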