
Getting an actual boundary equation of LDA with two classes and two variables

I have a two-dimensional data plot that shows the morphological difference between the sexes of an insect species (data plot). My goal is to obtain the estimated decision boundary of a linear discriminant analysis (LDA) as an equation of the form "Y = bX + c", so that it can be reused elsewhere, reported in a manuscript, and compared with other species.

Here is the R code with a truncated version of the data set:

library(MASS)

## 2-dimensional data with 2 classes
df <- structure(list(
group = c("F", "F", "F", "F", "F", "F", "F", "F", "M", "M", "M", "M", "M", "M", "M", "M"), 
var1 = c(4.77, 4.08, 4.25, 4.72, 3.83, 5.23, 4.31, 4.67, 5.23, 4.95, 4.87, 5.06, 4.36, 4.51, 4.61, 4.49),
var2 = c(7.41, 6.87, 6.95, 7.72, 6.11, 7.96, 7.05, 7.69, 7.10, 6.55, 6.61, 6.86, 5.97, 5.99, 6.26, 6.40)),
row.names = c(NA, -16L), 
class = "data.frame")

## LDA 
fit <- lda(group~., data=df)
plot(fit)

The contents of 'fit' look like this:

> fit
Call:
lda(group ~ ., data = df)

Prior probabilities of groups:
  F   M 
0.5 0.5 

Group means:
    var1   var2
F 4.4825 7.2200
M 4.7600 6.4675

Coefficients of linear discriminants:
           LD1
var1  7.567377
var2 -5.860438

There is only LD1 because I have only two classes (female and male). But how can I obtain the equation of the actual boundary line in the original data plane, rather than just draw it?

In short, I want the formula of the "estimated boundary" shown in this article: Linear Discriminant Analysis (LDA) Can Be So Easy https://towardsdatascience.com/linear-discriminant-analysis-lda-can-be-so-easy-b3f46e32f982/

Alternatively, I would like a way to extract the coefficients β_0 and β_1 from the lda object in R.

Asked by Xun, Dec 19 '25 15:12


1 Answer

Here is a way to compute the slope and intercept of the boundary line from the lda object and to plot that line with ggplot2.

library(MASS)
library(ggplot2)

## 2-dimensional data with 2 classes
df <- structure(list(
  group = c("F", "F", "F", "F", "F", "F", "F", "F", "M", "M", "M", "M", "M", "M", "M", "M"), 
  var1 = c(4.77, 4.08, 4.25, 4.72, 3.83, 5.23, 4.31, 4.67, 5.23, 4.95, 4.87, 5.06, 4.36, 4.51, 4.61, 4.49),
  var2 = c(7.41, 6.87, 6.95, 7.72, 6.11, 7.96, 7.05, 7.69, 7.10, 6.55, 6.61, 6.86, 5.97, 5.99, 6.26, 6.40)),
  row.names = c(NA, -16L), 
  class = "data.frame")

## LDA 
fit <- lda(group~., data=df)
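## LD1 coefficients for var1 and var2
beta <- fit$scaling
## With equal priors, the boundary is beta[1]*var1 + beta[2]*var2 = mean LD score of the two group means
intercept <- mean(fit$means %*% beta) / beta[2]
slope <- -beta[1] / beta[2]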

ggplot(df, aes(var1, var2, color = group)) +
  geom_point(size = 5) +
  geom_abline(slope = slope, intercept = intercept) +
  theme_bw()

Created on 2025-07-15 with reprex v2.1.1
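
Since the question asks for the equation rather than the plot, the same slope and intercept can also be printed as "var2 = b * var1 + c" and checked against predict(). The following is only a minimal sketch, assuming equal prior probabilities (0.5/0.5, as in the fit shown in the question); with unequal priors the boundary no longer passes through the midpoint of the group means and the intercept would need an extra term. The name on_line is a hypothetical helper introduced here for illustration.

```r
library(MASS)

## Same truncated data set as in the question
df <- data.frame(
  group = rep(c("F", "M"), each = 8),
  var1 = c(4.77, 4.08, 4.25, 4.72, 3.83, 5.23, 4.31, 4.67,
           5.23, 4.95, 4.87, 5.06, 4.36, 4.51, 4.61, 4.49),
  var2 = c(7.41, 6.87, 6.95, 7.72, 6.11, 7.96, 7.05, 7.69,
           7.10, 6.55, 6.61, 6.86, 5.97, 5.99, 6.26, 6.40)
)

fit <- lda(group ~ ., data = df)
beta <- fit$scaling  # LD1 coefficients for var1 and var2

## Assumption: equal priors, so the boundary is the set of points whose
## LD1 score equals the average LD1 score of the two group means
slope <- -beta[1] / beta[2]
intercept <- mean(fit$means %*% beta) / beta[2]
cat(sprintf("var2 = %.4f * var1 + %.4f\n", slope, intercept))

## Sanity check: points on the line should get posteriors of about 0.5
on_line <- data.frame(var1 = c(4, 4.5, 5))  # hypothetical test points
on_line$var2 <- slope * on_line$var1 + intercept
print(round(predict(fit, on_line)$posterior, 3))
```

If the posterior columns printed at the end are all close to 0.5, the line is the boundary that predict() itself uses for this fit, so the printed slope and intercept are the b and c of "Y = bX + c".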

Answered by Rui Barradas, Dec 21 '25 08:12


