I have a huge data frame and I would like to make some plots to get an idea of the associations among different variables. I cannot use pairs(data), because that would give me 400+ plots. However, there's one response variable y I'm particularly interested in. Thus, I'd like to plot y against all variables, which would reduce the number of plots from n^2 to n. How can I do it?
EDIT: I've added an example for the sake of clarity. Let's say I have the data frame
foo=data.frame(x1=1:10,x2=seq(0.1,1,0.1),x3=-7:2,x4=runif(10,0,1))
and my response variable is x3. Then I'd like to generate four plots arranged in a row: x1 vs x3, x2 vs x3, a histogram of x3, and finally x4 vs x3. I know how to make each plot:
plot(foo$x1,foo$x3)
plot(foo$x2,foo$x3)
hist(foo$x3)
plot(foo$x4,foo$x3)
However, I have no idea how to arrange them in a row. Also, it would be great if there were a way to make all n plots automatically, without having to call plot (or hist) each time. When n=4 it's not a big issue, but I usually deal with n=20+ variables, so it can be a drag.
The tidyr package helps to do this efficiently. Please refer here for more options.
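library(tidyr)    # gather(); also re-exports %>%
library(ggplot2)  # ggplot(), geom_point(), facet_wrap()

data %>%
  # reshape to long format: every column except y_value becomes a (name, value) pair
  gather(-y_value, key = "some_var_name", value = "some_value_name") %>%
  ggplot(aes(x = some_value_name, y = y_value)) +
  geom_point() +
  # one panel per original variable, each with its own axis scales
  facet_wrap(~ some_var_name, scales = "free")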
You would get one scatter-plot panel per variable, each showing that variable against y_value on its own axis scales.
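As a rough sketch of how this could look on the foo data frame from the question (with x3 as the response), here is a version using pivot_longer(), which superseded gather() as of tidyr 1.0.0. The names foo_long, variable and value, and the set.seed() call, are my own choices for illustration and are not part of the original answer; nrow = 1 is what puts the panels in a single row.

```r
library(tidyr)
library(ggplot2)

set.seed(1)  # assumption: only here so the random x4 column is reproducible
foo <- data.frame(x1 = 1:10, x2 = seq(0.1, 1, 0.1), x3 = -7:2, x4 = runif(10, 0, 1))

# Long format: one row per (predictor, value) pair; x3 stays as its own column
foo_long <- pivot_longer(foo, cols = -x3, names_to = "variable", values_to = "value")

# One panel per predictor, all in a single row; x axes are free, y (x3) is shared
p <- ggplot(foo_long, aes(x = value, y = x3)) +
  geom_point() +
  facet_wrap(~ variable, nrow = 1, scales = "free_x")
print(p)
```

Note that this draws only the scatter plots of the predictors against x3. A histogram of x3 itself would have to be built as a separate plot.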
If your goal is only to get an idea of the associations among different variables, you can also use:
plot(x3 ~ ., data = foo)
It is not as nice as using ggplot, and it doesn't automatically put all the graphs in one window (although you can change that using par(mfrow = c(a, b))), but it is a quick way to get what you want.
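For the exact layout asked for in the question (all plots in one row, with a histogram in place of the response-vs-itself panel), a base R loop combined with par(mfrow = ...) might look like the sketch below. The variable names response and old_par are hypothetical, and the set.seed() call is only there to make the example reproducible.

```r
set.seed(1)  # assumption: only here so the random x4 column is reproducible
foo <- data.frame(x1 = 1:10, x2 = seq(0.1, 1, 0.1), x3 = -7:2, x4 = runif(10, 0, 1))

response <- "x3"  # name of the response column

# One row with one panel per column; par() returns the previous settings
old_par <- par(mfrow = c(1, ncol(foo)))

for (v in names(foo)) {
  if (v == response) {
    # The response itself gets a histogram instead of a scatter plot
    hist(foo[[v]], main = v, xlab = v)
  } else {
    plot(foo[[v]], foo[[response]], xlab = v, ylab = response, main = v)
  }
}

par(old_par)  # restore the previous layout
```

With 20+ variables a single row gets cramped, so a grid such as par(mfrow = c(4, 5)) is probably a better fit there.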
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With