NA in clustering functions (kmeans, pam, clara). How to associate clusters to original data?

I need to cluster some data and I tried kmeans, pam, and clara with R.

The problem is that my data are in a column of a data frame, and that column contains NAs.

I used na.omit() to get my clusters. But then how can I associate them with the original data? The functions return a vector of integers without the NAs and they don't retain any information about the original position.

Is there a clever way to associate the clusters with the original observations in the data frame? (Or a way to perform the clustering sensibly when NAs are present?)

Thanks

259

asked Dec 18 '14 11:12

Bakaburg

2 Answers

The output of kmeans corresponds to the elements of the object passed as argument x. In your case, you omit the NA elements, and so $cluster indicates the cluster that each element of na.omit(x) belongs to.

Here's a simple example:

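# Toy data: 100 values, 10 of them set to NA at random
d <- data.frame(x = runif(100), cluster = NA)
d$x[sample(100, 10)] <- NA
# Cluster only the non-missing values
clus <- kmeans(na.omit(d$x), 5)
# Write the labels back to the positions that were not NA
d$cluster[!is.na(d$x)] <- clus$cluster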

And in the plot below, colour indicates the cluster that each point belongs to.

plot(d$x, bg=d$cluster, pch=21)

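The same masking idea should extend to a data frame with several columns, where a row is skipped if any of its values is NA. Below is a minimal sketch under that assumption; the helper name cluster_complete_rows and the toy data frame d2 are made up for illustration and are not part of the original answer.

```r
# Hypothetical helper (not from the original answer): cluster only the
# complete rows of a data frame and pad the result back to nrow(data).
cluster_complete_rows <- function(data, centers, ...) {
  ok <- complete.cases(data)                # TRUE for rows without any NA
  fit <- kmeans(data[ok, , drop = FALSE], centers, ...)
  cluster <- rep(NA_integer_, nrow(data))   # skipped rows stay NA
  cluster[ok] <- fit$cluster
  cluster
}

set.seed(1)  # arbitrary seed, only to make the sketch reproducible
d2 <- data.frame(x = runif(100), y = runif(100))
d2$x[sample(100, 10)] <- NA
d2$y[sample(100, 10)] <- NA
d2$cluster <- cluster_complete_rows(d2, centers = 5)
table(d2$cluster, useNA = "ifany")
```

Because the result has one entry per row of the input, it can be assigned straight to a new column, and rows that were left out of the clustering are easy to spot as NA.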

138

answered Sep 27 '22 22:09

jbaums

This code works for me, starting with a matrix containing a whole row of NAs:

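# 10 x 10 matrix with row names; row 3 is entirely NA
DF <- matrix(rnorm(100), ncol = 10)
rownames(DF) <- paste0("r", 1:10)
DF[3, ] <- NA
# na.omit() keeps the row names, so the cluster vector is named by row
res <- kmeans(na.omit(DF), 3)$cluster
res
# Add an empty column and fill it by matching on row names
DF <- cbind(DF, clus = NA)
DF[names(res), "clus"] <- res
print(DF[, "clus"])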
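The question also mentions pam and clara. A rough sketch of how the same write-back could look for them, assuming the cluster package is installed; the column names pam_cluster and clara_cluster and the choice of k = 5 are arbitrary picks for this example:

```r
library(cluster)  # assumed to be installed; provides pam() and clara()

set.seed(1)  # arbitrary seed, only for reproducibility
d <- data.frame(x = runif(100))
d$x[sample(100, 10)] <- NA

ok <- !is.na(d$x)                    # positions that can be clustered
x_ok <- matrix(d$x[ok], ncol = 1)    # one-column matrix of the non-NA values

# Both functions return the labels in the $clustering component
d$pam_cluster <- NA_integer_
d$pam_cluster[ok] <- pam(x_ok, k = 5)$clustering

d$clara_cluster <- NA_integer_
d$clara_cluster[ok] <- clara(x_ok, k = 5)$clustering

head(d, 10)
```

Unlike kmeans, which returns the labels in $cluster, these two functions return them in $clustering; the logical mask ok is what ties the labels back to the original rows in both cases.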

answered Sep 27 '22 23:09

agenis


Tags: r, missing-data, na, cluster-analysis, k-means
