
scatter plot against all groups for a long data frame

I am pretty sure something like this has already been asked, but I don't know how to search for it.

I often get data in a wide format like in my little example with 3 experiments (a-c). I normally convert to long format and convert the values by some function (here log2 as an example).

What I often want to do is to plot all experiments against each other and here I am looking for a handy solution. How can I convert my data frame to get facets for example with a~b, a~c and b~c...

So far I tidyr::spread the data again and run a ggplot command three times, with the individual column names as x and y. Afterwards I merge the individual graphs together.

Is there a more convenient way?

library(dplyr)
library(tidyr)
library(ggplot2)

df <- data.frame(
  names=letters,
  a=1:26,
  b=1:13,
  c=11:36
)

df %>%
  tidyr::gather(experiment, value, -names) %>%
  mutate(log2.value=log2(value)) 

EDIT
Since I got a very useful answer from @hdkrgr, I adapted my code a bit. The inner_join was a great trick that lets me automate my idea. What I still miss is a clever filter to get rid of the redundant data, since I don't want to plot c~c, or b~a if I already plot a~b. For now I solve this by listing the pairings I want, but can anyone think of a straightforward solution? I couldn't come up with anything that gives me only the unique pairings.

my_pairs <- c('a vs. b', 'a vs. c', 'b vs. c')

df %>%
  as_tibble() %>%
  tidyr::gather(experiment, value, -names) %>%
  mutate(log2.value=log2(value))  %>%
  inner_join(., ., by=c("names")) %>%
  mutate(pairing=sprintf('%s vs. %s', experiment.x, experiment.y)) %>%
  filter(pairing %in% my_pairs) %>% 
  ggplot(aes(log2.value.x, log2.value.y)) + 
  geom_point() + 
  facet_wrap( ~ pairing, labeller=label_both)

drmariod asked Oct 17 '25 09:10


2 Answers

One way, starting from long format, would be to do a self-join on the long data so that each row holds a combination of two experiments:

df %>%
    tidyr::gather(experiment, value, -names) %>%
    mutate(log2.value=log2(value)) %>%
    inner_join(., ., by=c("names")) %>% 
    ggplot(aes(log2.value.x, log2.value.y)) + geom_point() + facet_grid(experiment.y ~ experiment.x)

Edit: To avoid plotting redundant experiment-pairs, you can do:

df %>%
    tidyr::gather(experiment, value, -names) %>%
    mutate(log2.value=log2(value)) %>%
    inner_join(., ., by=c("names")) %>% 
    filter(experiment.x < experiment.y) %>% 
    ggplot(aes(log2.value.x, log2.value.y)) +
    geom_point() +
    facet_wrap(~experiment.y + experiment.x)
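
For what it's worth, here is a sketch of the same idea with `tidyr::pivot_longer()`, which has superseded `gather()` in current tidyr. The `experiment.x < experiment.y` filter compares the experiment names alphabetically, so for each pair only the ordering where the first name sorts before the second survives (a/b, but not b/a or a/a). The names `long` and `pairs_long` are only mine for illustration. Note that dplyr 1.1.0 and later warn about a many-to-many relationship on this self-join; that is expected here, and passing `relationship = "many-to-many"` to `inner_join()` silences it.

```r
library(dplyr)
library(tidyr)

df <- data.frame(names = letters, a = 1:26, b = 1:13, c = 11:36)

# Long format: one row per (name, experiment)
long <- df %>%
  pivot_longer(-names, names_to = "experiment", values_to = "value") %>%
  mutate(log2.value = log2(value))

# Self-join on names gives every ordered pair of experiments per name;
# the filter keeps each unordered pair exactly once.
pairs_long <- long %>%
  inner_join(long, by = "names") %>%
  filter(experiment.x < experiment.y)

nrow(pairs_long)  # 78 rows: 26 names x 3 pairs
```

Piping `pairs_long` into the same `ggplot()` call as above should give the three panels.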

hdkrgr answered Oct 19 '25 00:10


This is really interesting because it's actually more complex than it first seems. One thing that sticks out is getting unique pairs of experiments—it seems like you'd want a vs b but not necessarily b vs a as well. To do that, you need the unique set of experiment pairs.

Initially, I tried to work from your gathered data, but realized it might be simpler to start from the wide version. Take the names of the experiments from the column names—you can do this multiple ways, but I just took the strings that aren't "names"—and get the combinations of them. I pasted them together to make them a little easier to work with.

library(dplyr)
library(tidyr)
library(ggplot2)

df <- data.frame(
  names=letters,
  a=1:26,
  b=1:13,
  c=11:36
) %>%
  as_tibble()

exp <- stringr::str_subset(names(df), "names", negate = TRUE)

pairs <- combn(exp, 2, paste, simplify = FALSE, collapse = ",") %>%
  unlist()
pairs
#> [1] "a,b" "a,c" "b,c"

Then, for each pair, extract the associated column names, do a little tidyeval to select those columns, do the log2 transform that you had. I had to detour here to rename the columns with something I could refer back to—I think this isn't necessary, but I couldn't get my tidyeval working inside the ggplot aes. Someone else might have an idea on that. Then make your plot, and label the axes and title accordingly. That leaves you with a list of 3 plots.

plots <- purrr::map(pairs, function(pair) {
  # split "a,b" back into the two column names
  cols <- strsplit(pair, split = ",", fixed = TRUE)[[1]]
  df %>%
    select(names, !!cols[1], !!cols[2]) %>%
    mutate_at(vars(-names), log2) %>%
    # rename to fixed names so aes() can refer to them
    rename(exp1 = !!cols[1], exp2 = !!cols[2]) %>%
    ggplot(aes(x = exp1, y = exp2)) +
      geom_point() +
      labs(x = cols[1], y = cols[2], title = pair)
})
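
On the tidyeval question: one option that should avoid the rename step is the `.data` pronoun, which ggplot2 (3.0.0 and later) accepts inside `aes()` for looking up a column by a string name. A minimal sketch, assuming the same `df` as above; I pass each pair as a character vector straight from `combn()` instead of pasting and splitting, which is my own simplification.

```r
library(ggplot2)

df <- data.frame(names = letters, a = 1:26, b = 1:13, c = 11:36)

# Each list element is a character vector of two column names, e.g. c("a", "b")
pairs <- combn(c("a", "b", "c"), 2, simplify = FALSE)

plots <- lapply(pairs, function(cols) {
  # .data[[...]] looks the column up by its string name at plot-build time
  ggplot(df, aes(x = log2(.data[[cols[1]]]), y = log2(.data[[cols[2]]]))) +
    geom_point() +
    labs(x = cols[1], y = cols[2], title = paste(cols, collapse = " vs. "))
})
```

This also leaves you with a list of 3 plots, so the step of combining them stays the same.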

Use your method of choice to put the plots together however you want. I went with cowplot, but I also like the patchwork package.

cowplot::plot_grid(plotlist = plots, nrow = 1)

camille answered Oct 18 '25 22:10


