
What are quantiles in ggplot stat_quantile?

Here is my reproducible data:

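library("ggplot2")
library("ggplot2movies")  # provides the movies data set
library("quantreg")       # fitting backend used by stat_quantile
set.seed(2154)
# Draw a random sample of 1000 films
msamp <- movies[sample(nrow(movies), 1000), ]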

I am trying to become acquainted with stat_quantile but the example from the documentation raises a couple of questions.

mggp <- ggplot(data=msamp, mapping=aes(x=year, y=rating)) + 
    geom_point() + 
    stat_quantile(formula=y~x, quantiles=c(0, 0.25, 0.50, 0.75, 1)) + 
    theme_classic(base_size = 12) + 
    ylim(c(0,10))
mggp

  1. To my understanding, quantiles split the data into parts that lie below the defined cut-off values, correct? If I define quantiles as in the code above, I get five lines. Why? What do they represent?

  2. It seems that the quantiles are calculated from the dependent variable on the y-axis (rating). Is it possible to reverse this, that is, to split the data based on quantiles of 'year'?

vanao veneri asked Oct 18 '22 19:10


1 Answer

This function performs quantile regression, and each line is an estimate of one of the requested quantiles of the response.

From Wikipedia:

Quantile regression is a type of regression analysis used in statistics and econometrics. Whereas the method of least squares results in estimates that approximate the conditional mean of the response variable given certain values of the predictor variables, quantile regression aims at estimating either the conditional median or other quantiles of the response variable.

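Thus each line in the plot is the fitted estimate of one conditional quantile of rating, modelled as a linear function of year. The five values passed to quantiles give five lines: the 0th, 25th, 50th (median), 75th and 100th percentiles.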
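To see where the lines come from, you can fit the same kind of model directly with quantreg::rq, which is what stat_quantile uses for fitting. The following is a minimal sketch, assuming the sample from the question; the names taus, fit, cf and below are only illustrative, and the extreme quantiles 0 and 1 are left out to keep the output short.

```r
library("ggplot2movies")
library("quantreg")

set.seed(2154)
msamp <- movies[sample(nrow(movies), 1000), ]

# Fit one linear quantile regression per requested quantile (tau).
taus <- c(0.25, 0.50, 0.75)
fit <- rq(rating ~ year, tau = taus, data = msamp)

# coef() returns a matrix: rows are (Intercept) and year, one column per tau.
# Each column holds the intercept and slope of one line in the plot.
cf <- coef(fit)
print(cf)

# Share of points lying on or below each fitted line.
# For a quantile regression fit this should be roughly equal to tau.
below <- sapply(seq_along(taus), function(i) {
  fitted_i <- cf[1, i] + cf[2, i] * msamp$year
  mean(msamp$rating <= fitted_i)
})
print(round(below, 2))
```

Each column of cf describes one line, and the shares in below should come out close to 0.25, 0.50 and 0.75. In other words, the lines do not split 'year' into groups; each one tracks how a given quantile of rating changes with year.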

You can find a detailed technical discussion in the vignette of the quantreg package.


Andrie answered Oct 21 '22 15:10