
Using ggplot2 and facet_grid for continuous and categorical variables together (R)

I am trying to make a series of graphs like this:

enter image description here

I have some mixed categorical and continuous data. I am able to make this series of graphs when there are only categorical variables or when there are only continuous variables. But I am unable to produce this series of graphs when there are both types of variables.

I have created some data below. Is there a way to debug this code so that it produces a series of graphs?

library(ggplot2) 
library(gridExtra)
library(tidyr)

# create some data

var_1 <- rnorm(100,1,4)
var_2 <- sample( LETTERS[1:2], 100, replace=TRUE, prob=c(0.3, 0.7) )
var_3 <- sample( LETTERS[1:5], 100, replace=TRUE, prob=c(0.2, 0.2,0.2,0.2, 0.1) )
cluster <- sample( LETTERS[1:4], 100, replace=TRUE, prob=c(2.5, 2.5, 2.5, 2.5) )

# put in a data frame

f <- data.frame(var_1, var_2, var_3, cluster)

# convert to factors

f$var_2 = as.factor(f$var_2)
f$var_3 = as.factor(f$var_3)
f$cluster = as.factor(f$cluster)

# create graphs

f %>% pivot_longer(cols = contains("var"), names_to = "variable") %>% 
    ggplot(aes(x = value, fill = value)) + 
    geom_bar() + geom_density() +
    facet_grid(rows = vars(cluster), 
               cols = vars(variable), 
               scales = "free") + 
    labs(y = "freq", fill = "Var")

When I only have categorical variables, the following code works:

var_2 <- sample( LETTERS[1:2], 100, replace=TRUE, prob=c(0.3, 0.7) )

var_3 <- sample( LETTERS[1:5], 100, replace=TRUE, prob=c(0.2, 0.2,0.2,0.2, 0.1) )

cluster <- sample( LETTERS[1:4], 100, replace=TRUE, prob=c(2.5, 2.5, 2.5, 2.5) )

f <- data.frame(var_2, var_3, cluster)
f$var_2 = as.factor(f$var_2)
f$var_3 = as.factor(f$var_3)
f$cluster = as.factor(f$cluster)

f %>% pivot_longer(cols = contains("var"), names_to = "variable") %>% 
    ggplot(aes(x = value, fill = value)) + 
    geom_bar() + geom_density() +
    facet_grid(rows = vars(cluster), 
               cols = vars(variable), 
               scales = "free") + 
    labs(y = "freq", fill = "Var")
stats_noob asked Sep 12 '25 04:09


1 Answer

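I do not think ggplot2 can handle both continuous and categorical variables on the same x or y aesthetic within a single plot, because each scale is either continuous or discrete. The code also fails before it reaches ggplot: pivot_longer() cannot stack a numeric column and a character or factor column into one value column.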

Error: Can't combine `var_1` <double> and `var_2` <character>.
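
To illustrate where that error comes from, here is a minimal sketch on a small made-up data frame (the values in `d` are hypothetical and not the question's data). It assumes a tidyr version that supports the `values_transform` argument of pivot_longer(). The first call reproduces the type conflict; the second coerces every value to character so the columns can be stacked.

```r
library(tidyr)

# Small illustrative data frame (hypothetical values)
d <- data.frame(
  var_1 = c(0.5, 1.2, -3.1),
  var_2 = c("A", "B", "B"),
  cluster = c("A", "B", "C")
)

# Fails: var_1 is double and var_2 is character, so they cannot share one column
res <- try(
  pivot_longer(d, cols = contains("var"), names_to = "variable"),
  silent = TRUE
)
inherits(res, "try-error")

# Runs: every value is converted to character before the columns are combined
long <- pivot_longer(
  d,
  cols = contains("var"),
  names_to = "variable",
  values_transform = list(value = as.character)
)
long
```

Note that the coercion only removes the error. The numeric values become text, so a density can no longer be drawn from the `value` column, which is why separate plots per variable are the more practical route.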

My recommendation would be to create a separate plot for each variable and then combine the plots. This also gives you greater control over each plot. Here is an example using GGally's ggmatrix(). I am sure this is also possible with gridExtra.

library(ggplot2)
library(gridExtra)
library(tidyr)
library(GGally)

# Generate data
var_1 <- rnorm(100, 1, 4)
var_2 <- sample(LETTERS[1:2], 100, replace = TRUE, prob = c(0.3, 0.7))
var_3 <- sample(LETTERS[1:5], 100, replace = TRUE, prob = c(0.2, 0.2, 0.2, 0.2, 0.1))
cluster <- sample(LETTERS[1:4], 100, replace = TRUE, prob = c(2.5, 2.5, 2.5, 2.5))

f <- data.frame(var_1, var_2, var_3, cluster)

f$var_2 = as.factor(f$var_2)
f$var_3 = as.factor(f$var_3)
f$cluster = as.factor(f$cluster)

# Create plots for each var
var_1_plot <- f %>%
  ggplot(aes(x = var_1,
             fill = cluster)) +
  geom_density() +
  facet_grid(cluster ~ .,
             scales = "free")
var_2_plot <- f %>%
  ggplot(aes(x = var_2,
             fill = cluster)) +
  geom_bar() +
  facet_grid(cluster ~ .,
             scales = "free")

var_3_plot <- f %>%
  ggplot(aes(x = var_3,
             fill = cluster)) +
  geom_bar() +
  facet_grid(cluster ~ .,
             scales = "free")

# Combine all plots
plot_list <- list(var_1_plot, var_2_plot, var_3_plot)
GGally::ggmatrix(
  plots = plot_list,
  nrow = 1,
  ncol = 3,
  xAxisLabels = c("Var 1", "Var 2", "Var 3")
)

enter image description here
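
The answer mentions gridExtra as an alternative without showing it. A minimal sketch of that route, under a few assumptions: `make_plot()` is a hypothetical helper introduced here for brevity, `set.seed()` is added only for reproducibility, and the cluster weights are left at their equal default. This is one possible arrangement, not necessarily the layout the answerer had in mind.

```r
library(ggplot2)
library(gridExtra)

set.seed(1)  # assumption: added only so the sketch is reproducible

# Same kind of data as in the question, built in one step
f <- data.frame(
  var_1 = rnorm(100, 1, 4),
  var_2 = factor(sample(LETTERS[1:2], 100, replace = TRUE, prob = c(0.3, 0.7))),
  var_3 = factor(sample(LETTERS[1:5], 100, replace = TRUE,
                        prob = c(0.2, 0.2, 0.2, 0.2, 0.1))),
  cluster = factor(sample(LETTERS[1:4], 100, replace = TRUE))
)

# Hypothetical helper: one column of panels for `var`, one row per cluster
make_plot <- function(data, var, geom) {
  ggplot(data, aes(x = .data[[var]], fill = cluster)) +
    geom +
    facet_grid(cluster ~ ., scales = "free") +
    labs(x = var)
}

# Density for the continuous variable, bars for the categorical ones
plots <- list(
  make_plot(f, "var_1", geom_density()),
  make_plot(f, "var_2", geom_bar()),
  make_plot(f, "var_3", geom_bar())
)

# Draw the three plots side by side; the layout is also returned invisibly
combined <- grid.arrange(grobs = plots, ncol = 3)
```

Each plot keeps its own legend here. Adding `theme(legend.position = "none")` to two of them is one way to avoid the repetition.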

David Gibson answered Sep 14 '25 17:09