"NAs introduced by coercion" during Cluster Analysis in R

Question

I'm new to R. I'm running a cluster analysis on a data frame, but when I calculate the distance matrix I get the warning "NAs introduced by coercion". What does this mean?

d <- dist(as.matrix(mydata1))

  Warning message:
In dist(as.matrix(mydata1)) : NAs introduced by coercion

A sample of my data:

Metafamily      Total        July cpc     July cse_pla  July offline  July organic
xerox 8560      275.829417   0.20943223   0.032628862   0.169210813   0.1130048
office-supplie  246.9125664  0.057833047  0.020209909   0.535358617   0.136165617

Apart from the Metafamily column, all columns are of class numeric.

Please help me resolve this issue.

Michele · Accepted Answer

It's that first column that creates the issue:

> a <- c("1", "2",letters[1:5], "3")
> as.numeric(a)
[1]  1  2 NA NA NA NA NA  3
Warning message:
NAs introduced by coercion

Inside dist there must be a coercion to numeric, which generates the NAs just as in the example above.
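To see which column triggers this, here is a minimal sketch using two columns of the sample rows from the question. The data frame below is only a stand-in for mydata1, and the name July_cpc is an assumed R-friendly spelling of the "July cpc" column.

```r
# Stand-in for mydata1, built from the two sample rows in the question.
# July_cpc is an assumed column name, not taken from the original data.
mydata1 <- data.frame(
  Metafamily = c("xerox 8560", "office-supplie"),
  Total      = c(275.829417, 246.9125664),
  July_cpc   = c(0.20943223, 0.057833047),
  stringsAsFactors = FALSE
)

# With one character column present, as.matrix() returns a character matrix,
# so every cell has to be converted back to numeric later on.
m <- as.matrix(mydata1)
typeof(m)

# List the columns that are not numeric; these are the ones that turn into NA.
non_numeric <- names(mydata1)[!sapply(mydata1, is.numeric)]
non_numeric
```

Any column listed in non_numeric should be moved out of the data (or into the row names) before calling dist.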

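I'd suggest applying dist without the first column or, better, moving it to the rownames if possible, because the result will be different. dist excludes the all-NA column and scales the sum up in proportion to the number of columns actually used, which inflates every distance: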

> dist(df)
          1         2         3         4
2 1.8842186                              
3 1.9262360 1.2856110                    
4 3.2137871 1.7322788 2.9838920          
5 1.3299455 0.9872963 1.9158079 1.8889050
Warning message:
In dist(df) : NAs introduced by coercion
> dist(df[-1])
         1        2        3        4
2 1.538458                           
3 1.572765 1.049697                  
4 2.624046 1.414400 2.436338         
5 1.085896 0.806124 1.564251 1.542284
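The two outputs above differ by a constant factor. As far as I can tell from ?dist, when columns are excluded because of NAs the sum is scaled up proportionally to the number of columns used, so with 2 usable columns out of 3 each Euclidean distance should grow by sqrt(3/2). A rough check, with df rebuilt from the values printed in this answer:

```r
# df rebuilt from the values printed in this answer (id, var1, var2)
df <- data.frame(
  id   = c("A", "B", "C", "D", "E"),
  var1 = c(-0.6264538, 0.1836433, -0.8356286, 1.5952808, 0.3295078),
  var2 = c(-0.8204684, 0.4874291, 0.7383247, 0.5757814, -0.3053884),
  stringsAsFactors = FALSE
)

# The id column is coerced to NA; the warning is silenced here on purpose.
d_with_id    <- suppressWarnings(dist(df))
d_without_id <- dist(df[-1])

# 3 columns in total, 2 usable: squared sums are scaled by 3/2,
# so the distances themselves are scaled by sqrt(3/2).
same_up_to_scaling <- isTRUE(all.equal(
  as.vector(d_with_id),
  as.vector(d_without_id) * sqrt(3 / 2),
  tolerance = 1e-6
))
same_up_to_scaling
```

So the warning is not harmless: the distances are still computed, but on a rescaled basis that depends on how many non-numeric columns were left in.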

btw: you don't need as.matrix when calling dist. It'll do that anyway internally.

EDIT: using rownames

rownames(df) <- df$id

> df
  id       var1       var2
A  A -0.6264538 -0.8204684
B  B  0.1836433  0.4874291
C  C -0.8356286  0.7383247
D  D  1.5952808  0.5757814
E  E  0.3295078 -0.3053884

> dist(df[-1]) # you could also drop the 1st column entirely, using df$id <- NULL.
         A        B        C        D
B 1.538458                           
C 1.572765 1.049697                  
D 2.624046 1.414400 2.436338         
E 1.085896 0.806124 1.564251 1.542284
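Applied to the data in the question, the same idea might look like the sketch below. It uses only two columns of the two sample rows, and July_cpc is again an assumed column name.

```r
# Stand-in for mydata1, built from the two sample rows in the question.
# July_cpc is an assumed column name, not taken from the original data.
mydata1 <- data.frame(
  Metafamily = c("xerox 8560", "office-supplie"),
  Total      = c(275.829417, 246.9125664),
  July_cpc   = c(0.20943223, 0.057833047),
  stringsAsFactors = FALSE
)

# Keep the labels as row names, then drop the label column.
rownames(mydata1) <- mydata1$Metafamily
mydata1$Metafamily <- NULL

# Guard: every remaining column should be numeric before calling dist().
stopifnot(all(sapply(mydata1, is.numeric)))

d <- dist(mydata1)           # no coercion warning now
labels_used <- attr(d, "Labels")  # the row names are carried over as labels
labels_used
```

Row names must be unique, so this only works if each Metafamily value appears once.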

Tags:

r

cluster-analysis

Ravee
