cor() behavior in R different between individual vectors and data.frame

I'm trying to get the Pearson correlation coefficient for all rows in a data frame relative to each other. Some values are missing (NA), and this seems to cause a problem that I don't encounter when running cor() on two vectors with missing values. This is the correct result for two vectors:

x <- c(NA, 4.5, NA, 4, NA, 1)
y <- c(2.5, 3.5, 3, 3.5, 3, 2.5)
cor(x,y, use = "complete.obs")
[1] 0.9912407

and here is the result when they are part of a data frame:

cor(t(critics1), use = "complete.obs")
   y  a  b  c  d  e  x
y  1 NA NA NA NA NA NA
a NA  1  1  1 -1  1 -1
b NA  1  1  1 -1  1 -1
c NA  1  1  1 -1  1 -1
d NA -1 -1 -1  1 -1  1
e NA  1  1  1 -1  1 -1
x NA -1 -1 -1  1 -1  1
Warning message:
In cor(t(critics1), use = "complete.obs") : the standard deviation is zero

Why is the use parameter not having the same effect? Here is what the critics1 data frame looks like:

  film1 film2 film3 film4 film5 film6
y   2.5   3.5   3.0   3.5   3.0   2.5
a   3.0   3.5   1.5   5.0   3.0   3.5
b   2.5   3.0    NA   3.5   4.0    NA
c    NA   3.5   3.0   4.0   4.5   2.5
d   3.0   4.0   2.0   3.0   3.0   2.0
e   3.0   4.0    NA   5.0   3.0   3.5
x    NA   4.5    NA   4.0    NA   1.0

asked Dec 06 '11 18:12

hawkhandler

1 Answer

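As @joran speculated, when you transpose critics1, there are only two complete observations (i.e. rows with no missing values): film2 and film4. With use="complete.obs", cor() drops every row that contains an NA in any column before computing anything, so each correlation is based on just those two points. That's why all of the correlations are either 1 or -1, or NA for those involving y, which has the value 3.5 in both complete rows (hence the "standard deviation is zero" warning).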

t(critics1)
#         y   a   b   c d   e   x
# film1 2.5 3.0 2.5  NA 3 3.0  NA
# film2 3.5 3.5 3.0 3.5 4 4.0 4.5
# film3 3.0 1.5  NA 3.0 2  NA  NA
# film4 3.5 5.0 3.5 4.0 3 5.0 4.0
# film5 3.0 3.0 4.0 4.5 3 3.0  NA
# film6 2.5 3.5  NA 2.5 2 3.5 1.0
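
The count of complete rows can be checked directly with complete.cases(). The sketch below rebuilds critics1 from the values printed in the question; the construction code and the names tc and complete_rows are assumptions for illustration, since the original post does not show how the data frame was created.

```r
# Rebuild critics1 from the table in the question (assumed construction).
critics1 <- data.frame(
  film1 = c(2.5, 3.0, 2.5, NA, 3.0, 3.0, NA),
  film2 = c(3.5, 3.5, 3.0, 3.5, 4.0, 4.0, 4.5),
  film3 = c(3.0, 1.5, NA, 3.0, 2.0, NA, NA),
  film4 = c(3.5, 5.0, 3.5, 4.0, 3.0, 5.0, 4.0),
  film5 = c(3.0, 3.0, 4.0, 4.5, 3.0, 3.0, NA),
  film6 = c(2.5, 3.5, NA, 2.5, 2.0, 3.5, 1.0),
  row.names = c("y", "a", "b", "c", "d", "e", "x")
)

tc <- t(critics1)                   # films become rows, critics become columns
complete_rows <- complete.cases(tc) # TRUE for rows with no NA in any column

rownames(tc)[complete_rows]         # the only rows "complete.obs" keeps
tc[complete_rows, "y"]              # y's ratings on those rows
sd(tc[complete_rows, "y"])          # zero spread, so cor() with y is undefined
```

Only film2 and film4 survive, and y is 3.5 in both, which is where the NA entries and the warning come from.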

If you use use="pairwise.complete.obs" instead of use="complete.obs", it works as you'd like:

cor(t(critics1), use="pairwise.complete.obs")["y","x"] # Extract correlation of y and x
# [1] 0.9912407
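
To see the two settings side by side, here is a minimal sketch using only three of the critics (y, b and x, with values copied from the question). The matrix name m and the choice of these three columns are assumptions made for illustration.

```r
# Three critics from the question, one rating per film (film1..film6).
y <- c(2.5, 3.5, 3.0, 3.5, 3.0, 2.5)
b <- c(2.5, 3.0, NA,  3.5, 4.0, NA)
x <- c(NA,  4.5, NA,  4.0, NA,  1.0)
m <- cbind(y, b, x)

# "complete.obs" drops any row with an NA in ANY column, leaving film2 and
# film4 only. y is constant on those rows, so its correlations are NA
# (cor() also warns that the standard deviation is zero).
listwise <- cor(m, use = "complete.obs")
listwise["y", "x"]

# "pairwise.complete.obs" drops rows per pair of columns, so y and x are
# compared on film2, film4 and film6, exactly as in the two-vector call.
pairwise <- cor(m, use = "pairwise.complete.obs")
round(pairwise["y", "x"], 7)  # 0.9912407

all.equal(pairwise["y", "x"], cor(x, y, use = "complete.obs"))
```

With just two vectors there is only one pair, so "complete.obs" and "pairwise.complete.obs" keep the same rows; the difference only appears once other columns with their own NAs are present.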

answered Sep 28 '22 06:09

Josh O'Brien

Tags: dataframe, r, correlation, pearson
