 

Correlation Scatter-matrix plot with different point size (in R)

I just came across this nice code that makes this scatter-matrix plot:

[Image: scatter-matrix plot (source: free.fr)]

And I wanted to apply it to Likert-scale variables (integers from 1 to 5), making the size/color of the dots in the lower triangle depend on how many times each combination of values occurs (the effect jitter might have given me).

Any idea how to do this with the base plotting system?

Update:

I wrote the following function, but I don't know how to make the dot sizes always scale sensibly. What do you think?

panel.smooth2 <- function(x, y, col = par("col"), bg = NA, pch = par("pch"),
                          cex = 1, col.smooth = "red", span = 2/3, iter = 3, ...)
{
    # drop missing/infinite values first so the counts and the smoother agree
    ok <- is.finite(x) & is.finite(y)
    x <- x[ok]
    y <- y[ok]
    if (!length(x)) return(invisible())

    # number of observations sharing each (x, y) pair, in the original row order
    # (merge() does not preserve row order, so its counts can land on the wrong points)
    n <- ave(x, x, y, FUN = length)

    # radius in x-axis units: the most frequent pair gets 0.25, a quarter of the
    # 1-unit spacing between Likert levels, so neighbouring circles never touch
    symbols(x, y, circles = n / (4 * max(n)), inches = FALSE,
            bg = "blue", fg = bg, add = TRUE)

    lines(stats::lowess(x, y, f = span, iter = iter), col = col.smooth, ...)
}



a1 <- sample(1:5, 100, replace = TRUE)
a2 <- sample(1:5, 100, replace = TRUE)
a3 <- sample(1:5, 100, replace = TRUE)
aa <- data.frame(a1, a2, a3)

pairs(aa, lower.panel = panel.smooth2)
asked Apr 07 '10 15:04 by Tal Galili




2 Answers

You can use 'symbols' (it can add to an existing plot, much like 'lines', 'abline', et al.).

It gives you fine-grained control over both symbol size and color in a single line of code.

Using 'symbols' you can set the symbol size, color, and shape. Shape and size are set by passing the sizes to one of the shape arguments: 'circles' and 'squares' take a vector with one value per symbol (e.g., circles = c(4, 3, 5, 1)), 'rectangles' takes a two-column matrix of widths and heights, and 'stars' takes a matrix with at least three columns. Color is set with 'bg' (fill) and/or 'fg' (border).
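To illustrate the shape arguments, here is a small sketch with made-up data (x, y, z and grp are hypothetical): the same sizes drawn once as circles (a vector of radii) and once as rectangles (a two-column matrix), with a discrete variable mapped to the fill color via 'bg'.

```r
# Sketch: per-symbol size, shape and colour with base-graphics symbols().
# x, y, z and grp are made-up example data.
pdf(NULL)                                   # null device: nothing is written to disk

x    <- c(1, 2, 3, 4)
y    <- c(2, 4, 1, 3)
z    <- c(4, 3, 5, 1)                       # one size value per symbol
grp  <- c("a", "b", "a", "b")               # a discrete variable mapped to colour
fill <- unname(c(a = "steelblue", b = "orange")[grp])

par(mfrow = c(1, 2))
# circles: a vector of radii; inches = 0.3 makes the largest radius 0.3 inch
symbols(x, y, circles = z, inches = 0.3, bg = fill, fg = "black",
        xlim = c(0, 5), ylim = c(0, 5), main = "circles")
# rectangles: a two-column matrix of widths and heights
symbols(x, y, rectangles = cbind(z, z / 2), inches = 0.3, bg = fill,
        xlim = c(0, 5), ylim = c(0, 5), main = "rectangles")
dev.off()
```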

symbols( x, y, circles = circle_radii, inches=1/3, bg="blue", fg=NULL) 

If I understand the second part of your question, you want to be reasonably sure that the function you use to scale the symbols in your plot does so in a meaningful way. With 'circles', 'symbols' scales the radii in proportion to the values you pass in (for instance, a 'z' variable or data.frame column). In the line above, I set the maximum symbol size (radius) to 1/3 inch; every other symbol has a smaller radius, scaled by the ratio of its value to the largest value.

Is this a good choice? I don't know. Scaling the diameter or circumference in proportion to the value gives exactly the same picture as scaling the radius, so the real alternative is scaling the area (radius proportional to the square root of the value), which might be better. In any event, that's a trivial change. In sum, 'symbols' with 'circles' passed in will scale the radii of the symbols in proportion to the 'z' values, which is probably best suited to continuous variables. I would use color ('bg') for discrete variables/factors.
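A sketch of the two scalings side by side, using simulated Likert data like the question's. Counting how many observations share each (x, y) pair with `ave()` is my choice here, not part of the original answer; `n` is the per-point count.

```r
# Sketch: radius proportional to the count vs. area proportional to the count.
pdf(NULL)                                   # null device: nothing written to disk
set.seed(1)
x <- sample(1:5, 100, replace = TRUE)
y <- sample(1:5, 100, replace = TRUE)

# n[i] = how many observations share the pair (x[i], y[i])
n <- ave(x, x, y, FUN = length)

r_lin  <- n          # radius ~ n: area grows with n^2, so big counts dominate
r_area <- sqrt(n)    # radius ~ sqrt(n): area grows with n, so area encodes the count

par(mfrow = c(1, 2))
symbols(x, y, circles = r_lin,  inches = 0.2, bg = "blue", main = "radius ~ n")
symbols(x, y, circles = r_area, inches = 0.2, bg = "blue", main = "area ~ n")
dev.off()
```

Because 'inches' fixes the size of the largest circle, only the relative sizes change between the two panels.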

One way to use 'symbols' is to call 'plot' with type = 'n', which sets up the plot region and axes but suppresses drawing the points, so that you can then draw them with 'symbols' (add = TRUE).
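A minimal sketch of that two-step pattern; the data frame `df` and its `z` column are made up for illustration.

```r
# Sketch: set up the plot with type = "n", then draw sized symbols with add = TRUE.
# df and its z column are made-up example data.
pdf(NULL)                                    # null device: nothing written to disk

df <- data.frame(x = c(1, 2, 3, 4, 5),
                 y = c(3, 1, 4, 1, 5),
                 z = c(2, 7, 1, 8, 2))

# axes, labels and coordinate ranges are set up, but no points are drawn
plot(df$x, df$y, type = "n", xlab = "x", ylab = "y")

# draw the circles onto the existing plot; largest radius = 0.25 inch
with(df, symbols(x, y, circles = z, inches = 0.25, bg = "blue", add = TRUE))

usr <- par("usr")                            # plot region chosen by plot()
dev.off()
```

Letting 'plot' choose the coordinate ranges keeps the usual axes and labels, while 'symbols' only adds the sized circles.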

I would not recommend 'cex' for this purpose. 'cex' is a scaling factor for both text size and symbol size, but which of those two plot elements it affects depends on where you pass it in: if you set it via 'par', it acts on most of the text appearing on the plot; if you set it within the 'plot' function, it affects symbol size.

answered Sep 23 '22 02:09 by doug


Sure, just use cex:

set.seed(42)
DF <- data.frame(x=1:10, y=rnorm(10)*10, z=runif(10)*3) 
with(DF, plot(x, y, cex=z))

which gives you varying circle sizes. Color can simply be a fourth dimension.
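Applied to the question's Likert data, the same idea fits in a pairs() panel. This is a sketch, not part of the original answer: `panel.count` is a hypothetical name, counts come from `ave()`, and both size (via 'cex') and color encode how often each (x, y) pair occurs.

```r
# Hypothetical panel function: one point per distinct (x, y) pair, with size and
# colour both driven by how often that pair occurs.
panel.count <- function(x, y, ...) {
    ok <- is.finite(x) & is.finite(y)
    x <- x[ok]
    y <- y[ok]
    if (!length(x)) return(invisible())

    n    <- ave(x, x, y, FUN = length)        # count of each (x, y) pair
    keep <- !duplicated(cbind(x, y))          # draw each distinct pair once
    pal  <- colorRampPalette(c("grey80", "darkred"))(max(n))

    points(x[keep], y[keep], pch = 19,
           cex = 3 * sqrt(n[keep] / max(n)),  # cex scales diameter, so area ~ count
           col = pal[n[keep]])
}

# Usage with Likert-style data (simulated, as in the question)
pdf(NULL)                                     # null device: nothing written to disk
set.seed(42)
aa <- data.frame(a1 = sample(1:5, 100, replace = TRUE),
                 a2 = sample(1:5, 100, replace = TRUE),
                 a3 = sample(1:5, 100, replace = TRUE))
pairs(aa, lower.panel = panel.count)
dev.off()
```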

answered Sep 22 '22 02:09 by Dirk Eddelbuettel