Correlation matrix from data table

Tags: r, data.table, correlation

If I have the following data table:

library(data.table)
set.seed(1)
TDT <- data.table(Group = c(rep("A",40),rep("B",60)),
                      Id = c(rep(1,20),rep(2,20),rep(3,20),rep(4,20),rep(5,20)),
                      Time = rep(seq(as.Date("2010-01-03"), length=20, by="1 month") - 1,5),
                      norm = round(runif(100)/10,2),
                      x1 = sample(100,100),
                      x2 = round(rnorm(100,0.75,0.3),2),
                      x3 = round(rnorm(100,0.75,0.3),2),
                      x4 = round(rnorm(100,0.75,0.3),2),
                      x5 = round(rnorm(100,0.75,0.3),2))

How can I calculate the correlations between x1, x2, x3, x4 and x5 by Time?

This:

TDT[,x:= list(cor(TDT[,5:9])), by = Time]

does not work.

How can it be done in data.table?


asked Mar 05 '17 17:03

user3032689

1 Answer

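You were close in your attempt. You missed an extra list(), and inside j the columns have to come from .SD (the rows of the current Time group) rather than from TDT[,5:9], which always refers to the whole table and would give the same matrix for every date.

This works:

TDT[, x := list(list(cor(.SD))), by = Time, .SDcols = 5:9]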

And TDT$x returns:

[[1]]
            x1          x2          x3         x4          x5
x1  1.00000000  0.72185099  0.07368766 -0.7031890 -0.36895449
x2  0.72185099  1.00000000  0.68058833 -0.7393130  0.05066973
x3  0.07368766  0.68058833  1.00000000 -0.5021462  0.10645894
x4 -0.70318896 -0.73931299 -0.50214616  1.0000000  0.11671020
x5 -0.36895449  0.05066973  0.10645894  0.1167102  1.00000000

[[2]]
           x1         x2          x3          x4         x5
x1  1.0000000 -0.1011948 -0.85191422 -0.15571603  0.4855237
x2 -0.1011948  1.0000000  0.56691559 -0.44002621 -0.6699172
x3 -0.8519142  0.5669156  1.00000000  0.02189754 -0.6168013
x4 -0.1557160 -0.4400262  0.02189754  1.00000000  0.2236542
x5  0.4855237 -0.6699172 -0.61680132  0.22365419  1.0000000

[...]

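The extra list() is needed because of how data.table interprets j, the second argument of DT[i, j]: a list() in j is read as a list of columns, so an object that should occupy a single cell has to be wrapped in a second list(). This has been discussed in depth elsewhere on Stack Overflow, in an excellent answer that I invite you to read.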
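The wrapping rule can be tried on a much smaller case. This is only a sketch: the table `DT` and its columns `g` and `v` are hypothetical stand-ins, not the TDT from the question, and range() is used in place of cor() to keep the stored object small.

```r
library(data.table)

# Hypothetical toy table (not the TDT from the question)
DT <- data.table(g = c("a", "a", "b", "b", "b"), v = 1:5)

# The outer list() is read as "a list of columns to assign".
# The inner list() makes that one column a list column, so each row
# of a group holds the whole object computed for its group.
DT[, m := list(list(range(v))), by = g]

# Object stored on the first row (group "a") and on the last row (group "b")
DT$m[[1]]
DT$m[[5]]
```

Every row of a group carries the same object, which is why TDT$x repeats one matrix for all rows sharing a Time value.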

As a side note, it seems preferable to replace the outermost call to list() with .() to clarify the intent. I also like to single out explicitly the columns with a reference to .SD and .SDcols. With the same outcome, you could rewrite your code as:

TDT[, x := .(list(cor(.SD))), by = Time, .SDcols = 5:9]
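If one matrix per date is enough, rather than the same matrix repeated on every row of that date, a summarising j is one option. The sketch below assumes a smaller stand-in table `DT` with only three x columns (hypothetical, not the TDT above), and the names `xcols`, `cors` and `cm` are made up for illustration. It also cross-checks the first date against a manual subset.

```r
library(data.table)

set.seed(1)
# Hypothetical stand-in for the question's table: 5 ids observed on 20 dates
DT <- data.table(
  Id   = rep(1:5, each = 20),
  Time = rep(seq(as.Date("2010-01-03"), length.out = 20, by = "1 month") - 1, 5),
  x1   = rnorm(100),
  x2   = rnorm(100),
  x3   = rnorm(100)
)

xcols <- c("x1", "x2", "x3")

# One row per Time; list() keeps each correlation matrix in a single cell
cors <- DT[, .(cm = list(cor(.SD))), by = Time, .SDcols = xcols]

# Cross-check the first date against a correlation computed on a manual subset
first_date <- cors$Time[1]
manual <- cor(DT[Time == first_date, xcols, with = FALSE])
all.equal(cors$cm[[1]], manual)
```

The matrix for a given date can then be pulled out with, for example, cors[Time == first_date, cm[[1]]].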


answered Oct 01 '22 03:10

Jealie
