I am currently analysing the `Auto` data from the ISLR package. I want to produce a parallel coordinate plot of the variables `mpg`, `cylinders`, `displacement`, `horsepower`, `weight`, `acceleration`, and `year`. My plot is as follows:
```r
library(GGally)
parcoord = ggparcoord(Auto.df, columns = 1:7,
                      mapping = aes(color = as.factor(origin)),
                      title = "Complete Auto Data") +
  scale_color_discrete("origin", labels = levels(Auto.df$origin))
print(parcoord)
```
Notice that I have set `columns = 1:7`. It just so happens that the variables I want are in consecutive columns of the `Auto` dataset. But what if they weren't, and I wanted to pick out arbitrary, non-consecutive variables/columns?
Furthermore, notice that I have converted the variable `origin` to a factor and placed it as a legend on the side. As you can see, the three values of `origin` are drawn in different colours. However, the actual values of `origin` (`1`, `2`, `3`) are not displayed next to the colours, so we can't tell which colour is associated with which value. How do I make the legend also display the actual values?
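For selecting the columns, pass a vector of column indices, such as `columns = c(1, 5, 7)`; the indices do not need to be consecutive. To display the values in the legend, just remove `labels = levels(Auto.df$origin)` from `scale_color_discrete`. If `origin` is still numeric in `Auto.df` (as it is in `ISLR::Auto`), `levels(Auto.df$origin)` returns `NULL`, and `labels = NULL` tells a ggplot2 scale to draw no labels at all.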
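The `NULL` behaviour can be checked in base R alone. This is a minimal sketch using a hypothetical vector of origin codes stored as numbers, the same way `ISLR::Auto$origin` is stored: a numeric vector has no levels, while the factor version does.

```r
# Hypothetical sample of origin codes, stored as plain numbers
origin <- c(1, 3, 2, 1)

# A numeric vector has no levels, so this returns NULL
print(levels(origin))

# After conversion to a factor, the levels are the sorted codes "1" "2" "3"
print(levels(as.factor(origin)))
```

Passing the first result to `labels =` is what leaves the legend blank; leaving `labels` unset lets the scale fall back to the factor levels.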
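Here is the new code:

```r
library(ISLR)   # provides the Auto data set
library(GGally) # provides ggparcoord(); also attaches ggplot2
data(Auto)
parcoord <- ggparcoord(Auto, columns = c(1, 5, 7),
                       mapping = aes(color = as.factor(origin)),
                       title = "Complete Auto Data") +
  scale_color_discrete("origin")
print(parcoord)
```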
I suggest converting the variable `origin` to a factor right at the beginning, before using the data to prepare the plot:
```r
library(ISLR)
library(tidyverse)
library(GGally)
data(Auto)
# Convert the numeric origin codes to a factor (levels in order of first appearance)
Auto.df = Auto %>% as_tibble() %>%
  mutate(origin = origin %>% paste %>% fct_inorder)
```
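One detail of this conversion affects the legend: a discrete colour legend lists the factor levels in level order, and `fct_inorder()` orders levels by first appearance in the data, so the colours are not guaranteed to be listed as 1, 2, 3. The sketch below uses a hypothetical vector of codes rather than the real data. `factor()` sorts the levels instead, and its `labels` argument can attach descriptive names; the names used here follow the coding in the ISLR documentation for `Auto` (1 = American, 2 = European, 3 = Japanese) and are an optional alternative to showing the raw codes.

```r
library(forcats)  # fct_inorder(); also attached by library(tidyverse)

# Hypothetical origin codes in the order they might appear in the data
codes <- c(1, 3, 3, 2, 1)

# fct_inorder(): levels in order of first appearance, here "1" "3" "2"
print(levels(fct_inorder(paste(codes))))

# factor(): levels sorted, here "1" "2" "3", so the legend reads 1, 2, 3
print(levels(factor(codes)))

# Optional: descriptive labels mapped onto the sorted codes 1, 2, 3
origin_lab <- factor(codes, levels = 1:3,
                     labels = c("American", "European", "Japanese"))
print(levels(origin_lab))
```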
Now you can prepare the chart like this:

```r
Auto.df %>%
  ggparcoord(columns = 1:7,
             groupColumn = "origin",
             mapping = aes(color = origin),
             title = "Complete Auto Data")
```
To analyze only selected columns (e.g. 2, 5 and 7), pass their indices:

```r
Auto.df %>%
  ggparcoord(columns = c(2, 5, 7),
             groupColumn = "origin",
             mapping = aes(color = origin),
             title = "Complete Auto Data")
```
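If counting column positions is error-prone, the indices can be computed from the variable names with base R's `match()`. This is a minimal sketch, assuming the ISLR and GGally packages are installed; the variable list mirrors the 2, 5, 7 example, and `vars`, `idx` and `p` are illustrative names only.

```r
library(ISLR)   # Auto data set
library(GGally) # ggparcoord(); also attaches ggplot2

data(Auto)
Auto.df <- Auto
# Numeric origin codes -> factor with sorted levels "1", "2", "3"
Auto.df$origin <- factor(Auto.df$origin)

# Name the axes you want, in the order they should be drawn
vars <- c("cylinders", "weight", "year")
# Convert the names to the positional indices that `columns` accepts
idx <- match(vars, names(Auto.df))

p <- ggparcoord(Auto.df, columns = idx, groupColumn = "origin",
                title = "Selected Auto variables")
print(p)
```

Reordering `vars` reorders the axes, and a misspelled name shows up as `NA` in `idx` rather than silently selecting the wrong column.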
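Finally, a way to choose both the variables and their order that is perhaps more readable (at least to me) is to `select()` them first. After `select()`, the chosen variables become columns 1 to 3 and `origin` becomes column 4:

```r
Auto.df %>% select(displacement, mpg, weight, origin) %>%
  ggparcoord(columns = 1:3,
             groupColumn = "origin",
             mapping = aes(color = origin),
             title = "Complete Auto Data")
```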
This approach is simpler and does not require the `scale_color_discrete` function. I hope this gives the effect you wanted. If it does not fully suit your needs, please leave a comment.