Complete.obs of cor() function

Tags: r, na, matrix, correlation

I am building a correlation matrix for my data, which looks like this:

df <- structure(list(V1 = c(56, 123, 546, 26, 62, 6, NA, NA, NA, 15
), V2 = c(21, 231, 5, 5, 32, NA, 1, 231, 5, 200), V3 = c(NA, 
NA, 24, 51, 53, 231, NA, 153, 6, 700), V4 = c(2, 10, NA, 20, 
56, 1, 1, 53, 40, 5000)), .Names = c("V1", "V2", "V3", "V4"), row.names = c(NA, 
10L), class = "data.frame")

This gives the following data frame:

        V1  V2  V3   V4
    1   56  21  NA    2
    2  123 231  NA   10
    3  546   5  24   NA
    4   26   5  51   20
    5   62  32  53   56
    6    6  NA 231    1
    7   NA   1  NA    1
    8   NA 231 153   53
    9   NA   5   6   40
    10  15 200 700 5000

I normally use the complete.obs option to compute my correlation matrix, with this command:

crm <- cor(df, use="complete.obs", method="pearson")

My question is: how does complete.obs treat the data? Does it omit every row that contains an NA value, build an NA-free table, and compute the whole correlation matrix from that table at once, like this?

df2 <- structure(list(V1 = c(26, 62, 15), V2 = c(5, 32, 200), V3 = c(51, 
53, 700), V4 = c(20, 56, 5000)), .Names = c("V1", "V2", "V3", 
"V4"), row.names = c(NA, 3L), class = "data.frame")

Or does it omit NA values in a pairwise fashion? For example, when calculating the correlation between V1 and V2, are rows that contain an NA only in V3 (such as rows 1 and 2 in my example) omitted too?

If whole rows are dropped, I am looking for a command that preserves as much of the data as possible by omitting NA values in a pairwise fashion.

Many thanks,

795

asked Sep 19 '13 10:09

Error404

1 Answer

Look at the help file for cor, i.e. ?cor. In particular,

If ‘use’ is ‘"everything"’, ‘NA’s will propagate conceptually, i.e., a resulting value will be ‘NA’ whenever one of its contributing observations is ‘NA’.

If ‘use’ is ‘"all.obs"’, then the presence of missing observations will produce an error. If ‘use’ is ‘"complete.obs"’ then missing values are handled by casewise deletion (and if there are no complete cases, that gives an error).
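As a rough illustration of the three options quoted above, here is a minimal sketch. The data frame `x` and its columns `a`, `b`, `c` are made up for this example and are not the data from the question.

```r
# Small made-up data with one missing value in column b (not from the question)
x <- data.frame(a = c(1, 2, 3, 4), b = c(2, 1, NA, 5), c = c(4, 3, 2, 1))

# "everything" (the default): correlations between b and the other columns are NA
ev <- cor(x, use = "everything")
ev

# "all.obs": missing values raise an error, caught here so the message can be shown
res <- tryCatch(cor(x, use = "all.obs"), error = function(e) conditionMessage(e))
res

# "complete.obs": row 3 is dropped before any correlation is computed
co <- cor(x, use = "complete.obs")
co
```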

To get a better feel for what is going on, create an (even) simpler example:

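df1 = df[1:5,1:3]  # first five rows, columns V1 to V3; V3 is NA in rows 1 and 2
# Pairwise deletion: each pair of columns uses every row where both are non-NA
cor(df1, use="pairwise.complete.obs", method="pearson")
# Casewise deletion: rows 1 and 2 are dropped for every pair
cor(df1, use="complete.obs", method="pearson")
# Same result as complete.obs: only the complete rows 3 to 5
cor(df1[3:5,], method="pearson")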

So, when we use complete.obs, we discard the entire row if an NA is present anywhere in it. In my example, this means we discard rows 1 and 2. With pairwise.complete.obs, however, a row is discarded only for the pairs in which it has an NA: the correlation between V1 and V2 uses all five rows, because rows 1 and 2 are missing only V3.
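The same check can be run on the full data frame from the question. This is a sketch rather than a definitive recipe: `df` is rebuilt with `data.frame()` instead of `structure()`, and the names `casewise`, `pairwise`, `ok12` and `n_used` are introduced here for illustration only.

```r
# Data from the question, rebuilt with data.frame()
df <- data.frame(
  V1 = c(56, 123, 546, 26, 62, 6, NA, NA, NA, 15),
  V2 = c(21, 231, 5, 5, 32, NA, 1, 231, 5, 200),
  V3 = c(NA, NA, 24, 51, 53, 231, NA, 153, 6, 700),
  V4 = c(2, 10, NA, 20, 56, 1, 1, 53, 40, 5000)
)

# Casewise deletion: the same as dropping incomplete rows first
casewise <- cor(df, use = "complete.obs", method = "pearson")
all.equal(casewise, cor(na.omit(df)))

# Pairwise deletion: each entry uses only the rows complete for that pair
pairwise <- cor(df, use = "pairwise.complete.obs", method = "pearson")
ok12 <- complete.cases(df[, c("V1", "V2")])
all.equal(pairwise["V1", "V2"], cor(df$V1[ok12], df$V2[ok12]))

# Number of rows behind each pairwise correlation
n_used <- crossprod(!is.na(df))
n_used
```

Here complete.obs keeps only 3 of the 10 rows (rows 4, 5 and 10), whereas the pairwise V1 and V2 correlation is based on 6 rows. One trade-off to keep in mind is that the entries of a pairwise matrix are computed from different subsets of rows.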

113

answered Oct 23 '22 13:10

csgillespie
