
How to identify the distribution of the given data using R

Tags:

r

I have the data below and I need to identify its distribution. Please help.

 x <-  c(37.50,46.79,48.30,46.04,43.40,39.25,38.49,49.51,40.38,36.98,40.00,38.49,37.74,47.92,44.53,44.91,44.91,40.00,41.51,47.92,36.98,43.40)
Vanathaiyan S asked Jul 31 '15 08:07





2 Answers

A neat approach is to use the fitdistrplus package, which provides tools for distribution fitting. Using your data as an example:

library(fitdistrplus)
descdist(x, discrete = FALSE)

[Figure: Cullen and Frey graph produced by descdist]

Now you can attempt to fit different distributions. For example:

normal_dist <- fitdist(x, "norm")

and subsequently inspect the fit:

plot(normal_dist)

[Figure: diagnostic plots for the fitted normal distribution]
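To compare several candidates rather than judging one fit in isolation, the same package offers gofstat. The following is a minimal sketch, not part of the original answer: the choice of log-normal and Weibull as alternatives is an assumption made for illustration, and with only 22 observations the differences in AIC may be small.

```r
library(fitdistrplus)

x <- c(37.50, 46.79, 48.30, 46.04, 43.40, 39.25, 38.49, 49.51, 40.38, 36.98,
       40.00, 38.49, 37.74, 47.92, 44.53, 44.91, 44.91, 40.00, 41.51, 47.92,
       36.98, 43.40)

# Fit three candidate distributions by maximum likelihood
# (the candidates are illustrative assumptions, not a recommendation)
fit_norm    <- fitdist(x, "norm")
fit_lnorm   <- fitdist(x, "lnorm")
fit_weibull <- fitdist(x, "weibull")

# Goodness-of-fit statistics and information criteria side by side
gof <- gofstat(list(fit_norm, fit_lnorm, fit_weibull),
               fitnames = c("norm", "lnorm", "weibull"))
print(gof)

# Candidate with the lowest AIC (lower is better)
best <- names(which.min(gof$aic))
print(best)
```

A lower AIC only ranks the candidates against each other; it does not show that any of them describes the data well, so the diagnostic plots remain worth checking.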


As a general point, I would suggest having a look at the related discussion on Cross Validated, where the subject is discussed at length. If you are interested in a more detailed explanation of how to use the Cullen and Frey graph, see the paper by Delignette-Muller and Dutang, "fitdistrplus: An R Package for Fitting Distributions".

Konrad answered Nov 16 '22 03:11



The first thing you can do is plot the histogram and overlay the density:

hist(x, freq = FALSE)
lines(density(x))

You can then see that the distribution is bimodal, so it could be a mixture of two distributions, or something else entirely.
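The visual impression of two modes can be checked by locating the local maxima of the kernel density estimate. This is a rough sketch using base R only; the number of peaks it reports depends on the bandwidth chosen by density(), so treat the result as a hint rather than a test.

```r
x <- c(37.50, 46.79, 48.30, 46.04, 43.40, 39.25, 38.49, 49.51, 40.38, 36.98,
       40.00, 38.49, 37.74, 47.92, 44.53, 44.91, 44.91, 40.00, 41.51, 47.92,
       36.98, 43.40)

# Kernel density estimate with the default bandwidth
d <- density(x)

# A local maximum is where the slope changes from positive to negative
peak_idx <- which(diff(sign(diff(d$y))) == -2) + 1
modes <- d$x[peak_idx]
print(modes)

# Mark the detected modes on the histogram
hist(x, freq = FALSE)
lines(d)
abline(v = modes, lty = 2)
```

Passing a larger or smaller `adjust` value to density() shows how sensitive the mode count is to smoothing.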

Once you have identified a candidate distribution, a Q-Q plot ('qqplot') can help you visually compare the sample quantiles with the theoretical ones.
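As an illustration, here is a minimal Q-Q plot sketch in base R. It assumes a normal distribution as the candidate, with mean and standard deviation estimated from the sample; for a different candidate, swap qnorm for the matching quantile function.

```r
x <- c(37.50, 46.79, 48.30, 46.04, 43.40, 39.25, 38.49, 49.51, 40.38, 36.98,
       40.00, 38.49, 37.74, 47.92, 44.53, 44.91, 44.91, 40.00, 41.51, 47.92,
       36.98, 43.40)

# Theoretical quantiles of the candidate (normal is an assumption here)
probs <- ppoints(length(x))
theoretical <- qnorm(probs, mean = mean(x), sd = sd(x))

# Sample quantiles against theoretical quantiles
qqplot(theoretical, x,
       xlab = "Theoretical quantiles", ylab = "Sample quantiles")
abline(0, 1)  # points close to this line indicate a good fit
```

For the normal case specifically, qqnorm(x) followed by qqline(x) gives a similar picture with less code.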

thothal answered Nov 16 '22 02:11
