Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

convert source target value dataframe into a correlation matrix

I have a data.frame containing different combinations of a group and count values where both of them exist. I need to plot or create a matrix similar to correlation matrix. I have come up with a simple example here

dat <- data.frame(source = c('A','A','A','B','B','C'),
              target = c('B','C','D','C','D','D'),
              count = c(4,5,6,3,3,5))

> dat
  source target count
1      A      B     4
2      A      C     5
3      A      D     6
4      B      C     3
5      B      D     3
6      C      D     5

How do I get a matrix like this?? and plot this matrix enter image description here

like image 506
Koundy Avatar asked Apr 28 '26 05:04

Koundy


2 Answers

This uses dplyr, tapply and pheatmap:

library(dplyr)

by <- c("source", "target")
m <- dat %>%
  mutate(across(any_of(by), \(x) factor(x, sort(unique(unlist(.[by])))))) %>%
  { tapply(.$count, .[by], c, default = 0) } %>%
  { . + t(.) + diag(ncol(.)) }
m

giving

      target
source A B C D
     A 1 4 5 6
     B 4 1 3 3
     C 5 3 1 5
     D 6 3 5 1


library(pheatmap)
pheatmap(m, display_numbers = TRUE, cluster_rows = FALSE, cluster_cols = FALSE)

(continued after image) screenshot

We can also try the Bioconductor ComplexHeatmap package.

library(circlize)
library(ComplexHeatmap)
library(RColorBrewer)

col_fun <-  colorRamp2(c(0, 3, 6), brewer.pal(n = 3, name = "RdYlBu"))
Heatmap(m, name = "m", col = col_fun,
  cluster_rows = FALSE,
  cluster_columns = FALSE,
  column_names_rot = 0,
  row_names_gp = gpar(fontsize = 20),
  column_names_gp = gpar(fontsize = 20),
  cell_fun = function(j, i, x, y, width, height, fill) {
    grid.text(sprintf("%.0f", m[i, j]), x, y, gp = gpar(fontsize = 30))})

(continued after image) screenshot

or without color

Heatmap(m, name = "m", 
  rect_gp = gpar(type = "none"),
  cluster_rows = FALSE,
  cluster_columns = FALSE,
  column_names_rot = 0,
  row_names_gp = gpar(fontsize = 20),
  column_names_gp = gpar(fontsize = 20),
  show_heatmap_legend = FALSE,
  cell_fun = function(j, i, x, y, width, height, fill) {
    grid.rect(x = x, y = y, width = width, height = height, gp = gpar(fill = NA))
    grid.text(sprintf("%.0f", m[i, j]), x, y, gp = gpar(fontsize = 30))})

(continued after image) screenshot

Another possibility is a balloonplot using the gplots package.

library(gplots)
balloonplot(as.table(t(m)), show.margins = FALSE, cum.margins = FALSE, main = "m")

(continued after image) screenshot

The ggpubr package also has a balloon plot function:

library(ggpubr)

ggballoonplot(m, show.label = TRUE, rotate.x.text = 0)

screenshot

like image 106
G. Grothendieck Avatar answered Apr 30 '26 20:04

G. Grothendieck


You can try the following options

  • Option 1: xtabs with base R
> `diag<-`(xtabs(count ~ ., rbind(dat, setNames(dat, names(dat)[c(2, 1, 3)]))), 1)
      target
source A B C D
     A 1 4 5 6
     B 4 1 3 3
     C 5 3 1 5
     D 6 3 5 1
  • Option 2: as_adjacency_matrix with igraph package
library(igraph)
dat %>%
    graph_from_data_frame(directed = FALSE) %>%
    as_adjacency_matrix(attr = "count", sparse = FALSE) %>%
    `diag<-`(1)

which gives

  A B C D
A 1 4 5 6
B 4 1 3 3
C 5 3 1 5
D 6 3 5 1
like image 40
ThomasIsCoding Avatar answered Apr 30 '26 20:04

ThomasIsCoding



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!