Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Getting values from kernel density estimation in R

I am trying to get density estimates for the log of stock prices in R. I know I can plot it using plot(density(x)). However, I actually want values for the function.

I'm trying to implement the kernel density estimation formula. Here's what I have so far:

a <- read.csv("boi_new.csv", header=FALSE)
S = a[,3] # takes column of increments in stock prices
dS=S[!is.na(S)] # omits first empty field

N = length(dS)                  # Sample size
rseed = 0                       # Random seed
x = rep(c(1:5),N/5)             # Inputted data

set.seed(rseed)   # Sets random seed for reproducibility

QL <- function(dS){
    h = density(dS)$bandwidth
    r = log(dS^2)
    f = 0*x
    for(i in 1:N){
        f[i] = 1/(N*h) * sum(dnorm((x-r[i])/h))
    }
    return(f)
}

QL(dS)

Any help would be much appreciated. Been at this for days!

like image 871
Ruth O'Brien Avatar asked Jan 27 '13 20:01

Ruth O'Brien


People also ask

How do you use kernel density estimation in R?

The density() function in R computes the values of the kernel density estimate. Applying the plot() function to an object created by density() will plot the estimate. Applying the summary() function to the object will reveal useful statistics about the estimate.

How do you evaluate kernel density?

The KDE is calculated by weighting the distances of all the data points we've seen for each location on the blue line. If we've seen more points nearby, the estimate is higher, indicating that probability of seeing a point at that location.

What does a kernel density plot tell you?

Description. As known as Kernel Density Plots, Density Trace Graph. A Density Plot visualises the distribution of data over a continuous interval or time period. This chart is a variation of a Histogram that uses kernel smoothing to plot values, allowing for smoother distributions by smoothing out the noise.

What is the drawback of using kernel density estimation?

The drawback of using kernal density estimation histogram method are- it results in discontinuous shape of the histogram. The data representation is poor. The data is represented vaguely and causes disruptions.


1 Answers

You can pull the values directly from the density function:

x = rnorm(100)
d = density(x, from=-5, to = 5, n = 1000)
d$x
d$y

Alternatively, if you really want to write your own kernel density function, here's some code to get you started:

  1. Set the points z and x range:

    z = c(-2, -1, 2)
    x = seq(-5, 5, 0.01)
    
  2. Now we'll add the points to a graph

    plot(0, 0, xlim=c(-5, 5), ylim=c(-0.02, 0.8), 
         pch=NA, ylab="", xlab="z")
    for(i in 1:length(z)) {
       points(z[i], 0, pch="X", col=2)
    }
     abline(h=0)
    
  3. Put Normal density's around each point:

    ## Now we combine the kernels,
    x_total = numeric(length(x))
    for(i in 1:length(x_total)) {
      for(j in 1:length(z)) {
        x_total[i] = x_total[i] + 
          dnorm(x[i], z[j], sd=1)
      }
    }
    

    and add the curves to the plot:

    lines(x, x_total, col=4, lty=2)
    
  4. Finally, calculate the complete estimate:

    ## Just as a histogram is the sum of the boxes, 
    ## the kernel density estimate is just the sum of the bumps. 
    ## All that's left to do, is ensure that the estimate has the
    ## correct area, i.e. in this case we divide by $n=3$:
    
    plot(x, x_total/3, 
           xlim=c(-5, 5), ylim=c(-0.02, 0.8), 
           ylab="", xlab="z", type="l")
    abline(h=0)
    

    This corresponds to

    density(z, adjust=1, bw=1)
    

The plots above give:

enter image description here

like image 131
csgillespie Avatar answered Sep 19 '22 09:09

csgillespie