 

Plotting a kde result in ggtern

I'm using ggtern to plot a large dataset as a ternary plot (see the example below).

[Image: example ternary plot]

Up to a certain data size everything worked perfectly with geom_density_tern(). However, I now want to visualize a far larger and more complicated dataset, and loading all of it and rendering it with ggplot has become impossible because of memory limits. I thought there might be a workaround: compute the kde2d density matrix separately and feed that result into the plot. That's where I'm stuck. Is this possible in ggtern at all?

In any case, here is a minimal example of the data structure and the plotting code I currently use.

require(ggplot2)
require(ggtern) 

set.seed(1) 

mydata <- data.frame(
        x = runif(100, min = 0.25, max = 0.5),
        y = runif(100, min = 0.1, max = 0.4),
        z = runif(100, min = 0.5, max = 0.7))   

plot <- ggtern() + 
        theme_bw() +
        theme_hidetitles() +
        geom_density_tern(data = mydata,
            aes(x = x, y = y, z = z, alpha = ..level.. ), 
            size = 0.1, linetype = "solid", fill = "blue")+
        geom_point(data = mydata, 
            aes(x = x, y = y, z = z), alpha = 0.8, size = 1)
plot

The following extra lines reproduce the density plot of the ternary coordinate system, drawn in Cartesian coordinates with base graphics:

library(MASS)
dataTern = transform_tern_to_cart(mydata$x,mydata$y,mydata$z)
dataTernDensity <- kde2d(x=dataTern$x, y=dataTern$y, lims = c(range(0,1), range(0,1)), n = 400) 

image(dataTernDensity$x, dataTernDensity$y, dataTernDensity$z)
points(dataTern$x, dataTern$y, pch = 20, cex = 0.1)
segments(x0 = 0, y0 = 0, x1 = 0.5, y1 = 1, col= "white")
segments(x0 = 0, y0 = 0, x1 = 1, y1 = 0, col= "white")
segments(x0 = 0.5, y0 = 1, x1 = 1, y1 = 0, col= "white")

This produces the following graph:

[Image: kde2d density image in Cartesian coordinates, with the data points and the ternary triangle outlined in white]
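One piece that sits between a precomputed kde2d() result and ggplot2 is a long-format data frame. kde2d() returns a list with grid vectors x and y and a length(x) by length(y) matrix z (as in dataTernDensity above). A minimal sketch of flattening it is below; kde_to_long() is a hypothetical helper name, not a ggtern or ggplot2 function. The flattened data has length(x) * length(y) rows no matter how many raw observations went into the estimate, so this step no longer scales with the raw data size.

```r
# Hypothetical helper: flatten a kde2d()-style result into long format,
# one row per grid cell.
# ASSUMPTION: 'dens' is a list with x, y and a length(x) x length(y) matrix z,
# which is the structure MASS::kde2d() returns.
kde_to_long <- function(dens) {
  # expand.grid() varies x fastest, matching the column-major order of as.vector(z)
  data.frame(expand.grid(x = dens$x, y = dens$y), z = as.vector(dens$z))
}

# Toy grid standing in for a real kde2d() result: z[i, j] = x[i] + y[j]
gx <- seq(0, 1, length.out = 4)
gy <- seq(0, 1, length.out = 3)
dens <- list(x = gx, y = gy, z = outer(gx, gy, "+"))
long <- kde_to_long(dens)
head(long)
```

Because z was built as x + y, every row of the flattened data should satisfy z == x + y, which confirms the grid and the matrix line up.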

Thanks in advance for any help!

asked Jan 15 '16 at 12:01 by Ludwik




1 Answer

We can solve this using the code that normally runs behind the scenes in the Stat. Having just released ggtern 2.0.1 on CRAN a couple of days ago, after completely rewriting the package to be compatible with ggplot2 2.0.0, I am familiar with an approach that may suit your needs. Incidentally, a summary of the new functionality in ggtern 2.0.X can be found here:

Below is a solution with working code for your problem, in which the density estimate is calculated in isometric log-ratio (ilr) space.
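For readers unfamiliar with ilr space: the ilr transform maps a composition (parts that sum to a constant) to unconstrained Euclidean coordinates. An ordinary 2-D kernel density estimate such as kde2d() can then be applied there, and the result mapped back onto the simplex. A minimal hand-rolled sketch for three parts is below. ilr3() and ilr3_inv() are hypothetical names, and the orthonormal basis is an assumption chosen for illustration. compositions::ilr() may use a different but equivalent basis, so this is for intuition, not a drop-in replacement for ilr()/ilrInv().

```r
# Minimal sketch of the isometric log-ratio (ilr) transform for a 3-part
# composition and its inverse. ASSUMPTION: uses one Helmert-type orthonormal
# basis chosen for illustration; compositions::ilr() may use a different
# (equivalent) basis, so its coordinates can differ by a rotation or reflection.
ilr3 <- function(x, y, z) {
  s <- x + y + z                      # close the composition so parts sum to 1
  x <- x / s; y <- y / s; z <- z / s
  data.frame(u = sqrt(1 / 2) * log(x / y),
             v = sqrt(2 / 3) * log(sqrt(x * y) / z))
}

ilr3_inv <- function(u, v) {
  # Recover centred log-ratio (clr) coordinates, which sum to zero...
  c1 <-  u / sqrt(2) + v / sqrt(6)
  c2 <- -u / sqrt(2) + v / sqrt(6)
  c3 <- -2 * v / sqrt(6)
  # ...then exponentiate and re-close so each row sums to 1
  e <- cbind(exp(c1), exp(c2), exp(c3))
  e <- e / rowSums(e)
  data.frame(x = e[, 1], y = e[, 2], z = e[, 3])
}

set.seed(1)
comp <- data.frame(x = runif(5, 0.25, 0.5),
                   y = runif(5, 0.1, 0.4),
                   z = runif(5, 0.5, 0.7))
coords <- ilr3(comp$x, comp$y, comp$z)   # unconstrained 2-D coordinates
back   <- ilr3_inv(coords$u, coords$v)   # mapped back onto the simplex
```

The round trip returns the closed composition (each row rescaled to sum to 1), and the barycentre (1/3, 1/3, 1/3) maps to the origin of ilr space.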

[Image: solution]

#Required Libraries
library(ggtern)
library(ggplot2)
library(compositions)
library(MASS)
library(scales)

set.seed(1) #For reproducibility
mydata <- data.frame(
  x = runif(100, min = 0.25, max = 0.5),
  y = runif(100, min = 0.1, max = 0.4),
  z = runif(100, min = 0.5, max = 0.7)) 

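#VARIABLES
nlevels  = 7    # approximate number of contour levels requested from pretty()
npoints  = 200  # kde2d grid size (npoints x npoints)
expand   = 0.5  # widen each axis range by 50% on both sides so contours can close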

#Prepare the data, put on isometric logratio basis
df     = data.frame(acomp(mydata)); colnames(df) = colnames(mydata)
data   = data.frame(ilr(df)); colnames(data) = c('x','y')

#Prepare the Density Estimate Data
h.est  = c(MASS::bandwidth.nrd(data$x), MASS::bandwidth.nrd(data$y))
lims   = c(expand_range(range(data$x),expand),expand_range(range(data$y),expand))
dens   = MASS::kde2d(data$x,data$y,h=h.est,n=npoints,lims=lims)

#-------------------------------------------------------------
#<<<<< Presumably the OP already has a kde2d() result (dens,
#      computed on the ilr-transformed data) at this point, so
#      the following steps should be all that is needed
#-------------------------------------------------------------

#Generate the contours via ggplot2's non-exported function contour_lines()
#(an internal of ggplot2 2.0.x; internals like this may change between versions)
lines  = ggplot2:::contour_lines(data.frame(expand.grid(x = dens$x, y = dens$y),
                                            z=as.vector(dens$z),group=1),
                                 breaks=pretty(dens$z,n=nlevels))

#Transform back to ternary space
lines[,names(mydata)] = data.frame(ilrInv(lines[,names(data)]))

#Render the plot
ggtern(data=lines,aes(x,y,z)) +
  theme_dark() + 
  theme_legend_position('topleft') + 
  geom_polygon(aes(group=group,fill=level),colour='grey50') +
  scale_fill_gradient(low='green',high='red') + 
  labs(fill  = "Density",
       title = "Example Manual Contours from Density Estimate Data")
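Because ggplot2:::contour_lines() is a non-exported internal, it may not be available in every ggplot2 version. One possible substitute is base R's grDevices::contourLines(), which takes the same x, y, z grid that kde2d() returns. Below is a sketch under that assumption. kde_to_contours() is a hypothetical helper, not part of ggtern or ggplot2, and a toy Gaussian grid stands in for the real ilr-space estimate.

```r
# Hypothetical helper (not part of ggtern or ggplot2): turn a kde2d()-style
# result (a list with grid vectors x, y and a length(x) x length(y) matrix z)
# into a long data frame of contour paths using base R's grDevices::contourLines().
kde_to_contours <- function(dens, nlevels = 7) {
  breaks <- pretty(range(dens$z), n = nlevels)
  cl <- grDevices::contourLines(dens$x, dens$y, dens$z, levels = breaks)
  # One row per vertex; 'group' numbers each separate contour piece so it can
  # be mapped to aes(group = group) in a polygon or path layer.
  do.call(rbind, lapply(seq_along(cl), function(i) {
    data.frame(x = cl[[i]]$x, y = cl[[i]]$y,
               level = cl[[i]]$level, group = i)
  }))
}

# Toy example: a Gaussian bump on a 100 x 100 grid stands in for a real kde2d() result
gx <- seq(-3, 3, length.out = 100)
gy <- seq(-3, 3, length.out = 100)
dens <- list(x = gx, y = gy,
             z = outer(gx, gy, function(a, b) exp(-(a^2 + b^2) / 2)))
contours <- kde_to_contours(dens)
head(contours)
```

In the workflow above, the x and y columns of such a result would then go through the same ilrInv() back-transform before plotting. Contours that run into the edge of the grid come back as open paths, so a generous expand setting matters if they are filled with geom_polygon().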
answered Nov 02 '22 at 07:11 by Nicholas Hamilton