I'm new to this language. I'm running a cluster analysis on a data frame, but when I calculate the distance I get the warning "NAs introduced by coercion". What does this mean?
d <- dist(as.matrix(mydata1))
Warning message:
In dist(as.matrix(mydata1)) : NAs introduced by coercion
My data sample is
Metafamily Total July cpc July cse_pla July offline July organic
xerox 8560 275.829417 0.20943223 0.032628862 0.169210813 0.1130048
office-supplie 246.9125664 0.057833047 0.020209909 0.535358617 0.136165617
Apart from the Metafamily column, all columns are of class numeric.
Please help me out with this issue.
It's that first column that creates the issue:
> a <- c("1", "2",letters[1:5], "3")
> as.numeric(a)
[1] 1 2 NA NA NA NA NA 3
Warning message:
NAs introduced by coercion
Inside dist there must be a coercion to numeric, which generates the NAs as above.
I'd suggest applying dist without the first column or, better, moving that column to rownames if possible, because leaving it in changes the result:
> dist(df)
1 2 3 4
2 1.8842186
3 1.9262360 1.2856110
4 3.2137871 1.7322788 2.9838920
5 1.3299455 0.9872963 1.9158079 1.8889050
Warning message:
In dist(df) : NAs introduced by coercion
> dist(df[-1])
1 2 3 4
2 1.538458
3 1.572765 1.049697
4 2.624046 1.414400 2.436338
5 1.085896 0.806124 1.564251 1.542284
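The gap between the two outputs matches how ?dist documents missing values: columns with NA are left out of a pair's distance, and the sum is scaled up in proportion to the number of columns used. Here is a minimal sketch of that, using the var1/var2 values from the df printed in this answer and a hand-made all-NA numeric column (the name id_as_number is my own) standing in for the coerced id column:

```r
# Values copied from the df printed in this answer.
df <- data.frame(
  id   = c("A", "B", "C", "D", "E"),
  var1 = c(-0.6264538, 0.1836433, -0.8356286, 1.5952808, 0.3295078),
  var2 = c(-0.8204684, 0.4874291, 0.7383247, 0.5757814, -0.3053884)
)

# Stand-in for what coercing the character id column produces: all NA.
m_with_na <- cbind(id_as_number = NA_real_, as.matrix(df[-1]))

d_with_na <- dist(m_with_na)   # 3 columns, only 2 usable per pair
d_numeric <- dist(df[-1])      # 2 numeric columns, nothing missing

# Squared sums are scaled by 3/2, so each distance grows by sqrt(3/2).
ratio <- as.vector(d_with_na) / as.vector(d_numeric)
print(ratio)
print(sqrt(3 / 2))
```

So every distance computed with the id column left in should be inflated by the same constant factor, which is why the two tables above differ.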
btw: you don't need as.matrix when calling dist. It'll do that anyway internally.
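A quick way to check this yourself, as a sketch with the same var1/var2 values as the df in this answer. The distances are compared as plain vectors because the two dist objects store different call attributes:

```r
# Numeric columns only, values copied from the df printed in this answer.
num <- data.frame(
  var1 = c(-0.6264538, 0.1836433, -0.8356286, 1.5952808, 0.3295078),
  var2 = c(-0.8204684, 0.4874291, 0.7383247, 0.5757814, -0.3053884)
)

d_plain  <- dist(num)             # data frame passed directly
d_matrix <- dist(as.matrix(num))  # explicit conversion first

# Same distances either way.
same <- isTRUE(all.equal(as.vector(d_plain), as.vector(d_matrix)))
print(same)
```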
EDIT: using rownames
rownames(df) <- df$id
> df
id var1 var2
A A -0.6264538 -0.8204684
B B 0.1836433 0.4874291
C C -0.8356286 0.7383247
D D 1.5952808 0.5757814
E E 0.3295078 -0.3053884
> dist(df[-1]) # you could also remove the 1st col entirely, using df$id <- NULL.
A B C D
B 1.538458
C 1.572765 1.049697
D 2.624046 1.414400 2.436338
E 1.085896 0.806124 1.564251 1.542284
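Applied to the data in the question, the same idea might look like the sketch below. The data frame is a made-up stand-in for mydata1: only the Metafamily column name comes from the question, and the other column names and all values are hypothetical. The hclust call is just one possible next step for a cluster analysis.

```r
# Hypothetical stand-in for mydata1; values are illustrative only.
mydata1 <- data.frame(
  Metafamily = c("fam_a", "fam_b", "fam_c"),
  total      = c(10, 20, 40),
  cpc        = c(0.2, 0.1, 0.4),
  organic    = c(0.1, 0.3, 0.2)
)

# Move the label column to rownames, then drop it from the data.
rownames(mydata1) <- mydata1$Metafamily
mydata1$Metafamily <- NULL

# Every remaining column should now be numeric, so nothing gets coerced to NA.
stopifnot(all(sapply(mydata1, is.numeric)))

d  <- dist(mydata1)  # labels are taken from the rownames
hc <- hclust(d)      # hierarchical clustering on the distances
print(d)
```

Note that rownames have to be unique, so this only works if no Metafamily value is repeated.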