Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Calculating the Mode for Nominal as well as Continuous variables in [R]

Tags:

r

Can anyone help me with this?

If I run:

> mode(iris$Species)
[1] "numeric"
> mode(iris$Sepal.Width)
[1] "numeric"

Then I get "numeric" as answer

Cheers

M

like image 236
Markus Avatar asked Nov 02 '25 16:11

Markus


1 Answers

The function mode() is used to find out the storage mode of the the object, in this case is is stored as mode "numeric". This function is not used to find the most "frequent" observed value in a data set, i.e. it is not used to find the statistical mode. See ?mode for more on what this function does in R and why it isn't useful for your problem.

For discrete data, the mode is the most frequent observed value among the set:

> set.seed(1) ## reproducible example
> dat <- sample(1:5, 100, replace = TRUE) ## dummy data
> (tab <- table(dat)) ## tabulate the frequencies
dat
 1  2  3  4  5 
13 25 19 26 17 
> which.max(tab) ## which is the mode?
4 
4 
> tab[which.max(tab)] ## what is the frequency of the mode?
 4 
26

For continuous data, the mode is the value of the data at which the probability density function (PDF) reaches a maximum. As your data are generally a sample from some continuous probability distribution, we don't know the PDF but we can estimate it through a histogram or better through a kernel density estimate.

Returning to the iris data, here is an example of determining the mode from continuous data:

> sepalwd <- with(iris, density(Sepal.Width)) ## kernel density estimate
> plot(sepalwd)
> str(sepalwd)
List of 7
 $ x        : num [1:512] 1.63 1.64 1.64 1.65 1.65 ...
 $ y        : num [1:512] 0.000244 0.000283 0.000329 0.000379 0.000436 ...
 $ bw       : num 0.123
 $ n        : int 150
 $ call     : language density.default(x = Sepal.Width)
 $ data.name: chr "Sepal.Width"
 $ has.na   : logi FALSE
 - attr(*, "class")= chr "density"
> with(sepalwd, which.max(y)) ## which value has maximal density?
[1] 224
> with(sepalwd, x[which.max(y)]) ## use the above to find the mode
[1] 3.000314

See ?density for more info. By default, density() evaluates the kernel density estimate at n = 512 equally spaced locations. If this is too crude for you, increase the number of locations evaluated and returned:

> sepalwd2 <- with(iris, density(Sepal.Width, n = 2048))
> with(sepalwd, x[which.max(y)])
[1] 3.000314

In this case it doesn't alter the result.

like image 50
Gavin Simpson Avatar answered Nov 04 '25 07:11

Gavin Simpson