Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Cumulative variance explained for NMDS in R

Tags:

r

variance

Is there a way to determine the cumulative variance explained (metric fit or R^2m) from an NMDS object with the function metaMDS? The object returns values for stress, scores, points, but I do not see variance. The function comes from the vegan package and performs Non-Metric Multidimensional Scaling.

metaMDS(comm, distance = "bray", k = 2, try = 20, trymax = 20, 
engine = c("monoMDS", "isoMDS"), autotransform =TRUE,
noshare = (engine == "isoMDS"), wascores = TRUE, expand = TRUE, 
trace = 1, plot = FALSE, previous.best,  ...)

I read that the R^2 is 1-total stress?

Appreciate any suggestions.

like image 395
Becca Avatar asked Mar 11 '18 18:03

Becca


1 Answers

There is no % of variance associated with each axis in nMDS in contrast with other Principal Component Methods like PCA, CA, PCoA (= MDS).

I quote Legendre & Legendre 2012 :

Contrary to PCA, PCoA, or CA, which are eigenvector-based methods, nMDS calculations do not maximize the variability associated with individual axes of the ordination

The aim of NMDS is to preserve and represent the ordering of the observations in few dimensions while the objective of the other methods is more to preserve the exact distance between observations, and to find the combination of originial axes that maximise the explained variance.

You can check and visualize the quality of the representation in 2 dimensions with a "Shepard" diagram that represent the distance in the 2 dimensions of an ordination with the original distance in the k dimensional space.
Here is an example to compare the quality of the representation in 2 dimensions of a nMDS and a MDS (PCoA) based on the Bray-Curtis distance.

library(vegan)
data(dune)

nMDS <- metaMDS(dune, distance = "bray", k = 2)
MDS <- cmdscale(vegdist(dune, method = "bray"), k = 2, eig = T, add = T )

% of variance explained by the MDS axes

round(MDS$eig*100/sum(MDS$eig),1)
#>  [1] 30.3 18.7  9.5  8.0  6.6  5.5  4.3  3.0  2.7  2.4  2.2  1.7  1.6  1.5
#> [15]  0.8  0.6  0.4  0.1  0.0  0.0

Shepard diagrams

# x11(width = 18/2.54, height = 9/2.54)

par(mfrow = c(1,2), mar = c(3.5,3.5,3,1), mgp = c(2, 0.6, 0), cex = 0.8, las = 1)
spear <- round(cor(vegdist(dune, method = "bray"), dist(nMDS$points), method = "spearman"),3)
plot(vegdist(dune, method = "bray"), dist(nMDS$points), main = "Shepard diagram of nMDS", 
     xlab = "True Bray-Curtis distance", ylab = "Distance in the reduced space")
mtext(line = 0.1, text = paste0("Spearman correlation = ", spear), cex = 0.7)

spear <- round(cor(vegdist(dune, method = "bray"), dist(MDS$points), method = "spearman"),3)
plot(vegdist(dune, method = "bray"), dist(MDS$points), main = "Shepard diagram of MDS", 
     xlab = "True Bray-Curtis distance", ylab = "Distance in the reduced space")
mtext(line = 0.1, text = paste0("Spearman correlation = ", spear), cex = 0.7)

To decide how much dimensions you need you might plot the stress as a function of the number of dimensions. Note that, in contrast with a traditional scree plot, each bar does not represent the variance associated with each axis but the total stress (a function of the squared difference between d and d_hat) for all the dimensions. For example, the "3Dim" bar represent the stress of a solution in 3 dimensions, not the stress associated with the 3rd axis...
Here the improvements of the representation for dimensions > than 2 dimensions are low.

n = 10
stress <- vector(length = n)
for (i in 1:n) {
    stress[i] <- metaMDS(dune, distance = "bray", k = i)$stress
}
names(stress) <- paste0(1:n, "Dim")
# x11(width = 10/2.54, height = 7/2.54)
par(mar = c(3.5,3.5,1,1), mgp = c(2, 0.6, 0), cex = 0.8, las = 2)
barplot(stress, ylab = "stress")

Created on 2018-03-11 by the reprex package (v0.2.0).

like image 104
Gilles Avatar answered Nov 03 '22 02:11

Gilles