R: plotting posterior classification probabilities of a linear discriminant analysis in ggplot2

Tags:

Using ggord one can make nice linear discriminant analysis ggplot2 biplots (cf chapter 11, Fig 11.5 in "Biplots in practice" by M. Greenacre), as in

library(MASS) install.packages("devtools") library(devtools) install_github("fawda123/ggord") library(ggord) data(iris) ord <- lda(Species ~ ., iris, prior = rep(1, 3)/3) ggord(ord, iris$Species)

enter image description here

I would also like to add the classification regions (shown as solid regions of the same colour as their respective group with say alpha=0.5) or the posterior probabilities of class membership (with alpha then varying according to this posterior probability and the same colour as used for each group) (as can be done in BiplotGUI, but I am looking for a ggplot2 solution). Would anyone know how to do this with ggplot2, perhaps using geom_tile ?

EDIT: below someone asks how to calculate the posterior classification probabilities & predicted classes. This goes like this:

library(MASS) library(ggplot2) library(scales) fit <- lda(Species ~ ., data = iris, prior = rep(1, 3)/3) datPred <- data.frame(Species=predict(fit)$class,predict(fit)$x) #Create decision boundaries fit2 <- lda(Species ~ LD1 + LD2, data=datPred, prior = rep(1, 3)/3) ld1lim <- expand_range(c(min(datPred$LD1),max(datPred$LD1)),mul=0.05) ld2lim <- expand_range(c(min(datPred$LD2),max(datPred$LD2)),mul=0.05) ld1 <- seq(ld1lim[[1]], ld1lim[[2]], length.out=300) ld2 <- seq(ld2lim[[1]], ld1lim[[2]], length.out=300) newdat <- expand.grid(list(LD1=ld1,LD2=ld2)) preds <-predict(fit2,newdata=newdat) predclass <- preds$class postprob <- preds$posterior df <- data.frame(x=newdat$LD1, y=newdat$LD2, class=predclass) df$classnum <- as.numeric(df$class) df <- cbind(df,postprob) head(df)             x        y     class classnum       setosa   versicolor virginica 1 -10.122541 -2.91246 virginica        3 5.417906e-66 1.805470e-10         1 2 -10.052563 -2.91246 virginica        3 1.428691e-65 2.418658e-10         1 3  -9.982585 -2.91246 virginica        3 3.767428e-65 3.240102e-10         1 4  -9.912606 -2.91246 virginica        3 9.934630e-65 4.340531e-10         1 5  -9.842628 -2.91246 virginica        3 2.619741e-64 5.814697e-10         1 6  -9.772650 -2.91246 virginica        3 6.908204e-64 7.789531e-10         1  colorfun <- function(n,l=65,c=100) { hues = seq(15, 375, length=n+1); hcl(h=hues, l=l, c=c)[1:n] } # default ggplot2 colours colors <- colorfun(3) colorslight <- colorfun(3,l=90,c=50) ggplot(datPred, aes(x=LD1, y=LD2) ) +     geom_raster(data=df, aes(x=x, y=y, fill = factor(class)),alpha=0.7,show_guide=FALSE) +     geom_contour(data=df, aes(x=x, y=y, z=classnum), colour="red2", alpha=0.5, breaks=c(1.5,2.5)) +     geom_point(data = datPred, size = 3, aes(pch = Species,  colour=Species)) +     scale_x_continuous(limits = ld1lim, expand=c(0,0)) +     scale_y_continuous(limits = ld2lim, expand=c(0,0)) +     scale_fill_manual(values=colorslight,guide=F)

enter image description here

(well not totally sure this approach for showing classification boundaries using contours/breaks at 1.5 and 2.5 is always correct - it is correct for the boundary between species 1 and 2 and species 2 and 3, but not if the region of species 1 would be next to species 3, as I would get two boundaries there then - maybe I would have to use the approach used here where each boundary between each species pair is considered separately)

This gets me as far as plotting the classification regions. I am looking for a solution though to also plot the actual posterior classification probabilities for each species at each coordinate, using alpha (opaqueness) proportional to the posterior classification probability for each species, and a species-specific colour. In other words, with a stack of three images superimposed. As alpha blending in ggplot2 is known to be order-dependent, I think the colours of this stack would have to calculated beforehand though, and plotted using something like

qplot(x, y, data=mydata, fill=rgb, geom="raster") + scale_fill_identity()

Here is a SAS example of what I am after:

enter image description here

Would anyone know how to do this perhaps? Or does anyone have any thoughts on how to best represent these posterior classification probabilities?

Note that the method should work for any number of groups, not just for this specific example.

596

asked Aug 08 '15 13:08

Tom Wenseleers

Video Answer

1 Answers

I suppose the easiest way will be to show the posterior probabilities. It is pretty straightforward for your case:

datPred$maxProb <- apply(predict(fit)$posterior, 1, max) ggplot(datPred, aes(x=LD1, y=LD2) ) +   geom_raster(data=df, aes(x=x, y=y, fill = factor(class)),alpha=0.7,show_guide=FALSE) +   geom_contour(data=df, aes(x=x, y=y, z=classnum), colour="red2", alpha=0.5, breaks=c(1.5,2.5)) +   geom_point(data = datPred, size = 3, aes(pch = Species,  colour=Species, alpha = maxProb)) +   scale_x_continuous(limits = ld1lim, expand=c(0,0)) +   scale_y_continuous(limits = ld2lim, expand=c(0,0)) +   scale_fill_manual(values=colorslight, guide=F)

enter image description here

You can see the points blend in at blue-green border.

126

answered Sep 29 '22 00:09

tonytonov

Related questions
                            
                                How do I set ulimit for containers in Kubernetes?
                            
                                Are end+1 iterators for std::string allowed?
                            
                                Enum, interfaces and (Java 8) lambdas: code compiles but fails at runtime; is this expected?
                            
                                Selecting one RadioButton value and scrolling back removing the selected one in RecyclerView
                            
                                List file that have changed since last commit with GitPython
                            
                                Year 2038 solution for embedded Linux (32 bit)? [duplicate]
                            
                                Is there an efficient way to share structure between golang packages?
                            
                                Asserting that __init__ was called with right arguments
                            
                                What's difference between tf.sub and just minus operation in tensorflow?
                            
                                RecyclerView and Data Binding not working
                            
                                Collectors.summingInt() vs mapToInt().sum()
                            
                                d3.js : d3.min.js:1 Error: <path> attribute d: Expected number, "MNaN,NaNLNaN,NaN"

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

R: plotting posterior classification probabilities of a linear discriminant analysis in ggplot2

Tags:

Tom Wenseleers

People also ask

Video Answer

1 Answers

tonytonov

Recent Activity

Donate For Us