parallel k-means in R

I am trying to understand how to parallelize some of my code in R. In the following example I want to use k-means to cluster data with 2, 3, 4, 5, and 6 centers, using 20 iterations. Here is the code:

library(parallel)
library(BLR)

data(wheat)

parallel.function <- function(i) {
    kmeans( X[1:100,100], centers=?? , nstart=i )
}

out <- mclapply( c(5, 5, 5, 5), FUN=parallel.function )

How can I parallelize over the iterations and the centers at the same time? And how do I keep track of the outputs, assuming I want to keep every k-means result across all iterations and centers, just to learn how it is done?

asked Dec 06 '13 05:12

hema

1 Answer

This looked very simple to me at first ... and then I tried it. After a lot of monkey typing and face-palming during my lunch break, however, I arrived at this:

library(parallel)
library(BLR)

data(wheat)

# x = X is forwarded to every call; each element of 2:6 becomes `centers`
mc <- mclapply(2:6, function(x, centers) kmeans(x, centers), x = X)

It looks right, though I didn't check how sensible the clustering was.

> summary(mc)
     Length Class  Mode
[1,] 9      kmeans list
[2,] 9      kmeans list
[3,] 9      kmeans list
[4,] 9      kmeans list
[5,] 9      kmeans list

On reflection the command syntax seems sensible, although a lot of other things that failed seemed reasonable too. The examples in the help documentation are maybe not that great.
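For anyone puzzling over why the call works: mclapply hands each element of its first argument to the function as the first argument that is not already filled, and forwards any extra named arguments unchanged to every call. Here is a minimal sketch of that behaviour, assuming a small made-up matrix (`toy`, not part of the original example) in place of the wheat data, and a hypothetical `cores` variable to keep it runnable on Windows:

```r
library(parallel)

set.seed(1)
# Hypothetical stand-in for the wheat marker matrix: 100 rows, 5 columns
toy <- matrix(rnorm(500), nrow = 100, ncol = 5)

# Forking is unavailable on Windows, so fall back to a single core there
cores <- if (.Platform$OS.type == "windows") 1L else 2L

# `x = toy` is passed through to every call of the function;
# each element of 2:6 then fills the remaining argument, `centers`
fits <- mclapply(2:6, function(x, centers) kmeans(x, centers),
                 x = toy, mc.cores = cores)

# Number of cluster centres in each fit, in the same order as 2:6
sapply(fits, function(f) nrow(f$centers))
```

Results come back in the same order as the input vector, so the first element of `fits` is the two-centre solution, the second the three-centre solution, and so on.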

Hope it helps.

EDIT: As requested, here is the same approach varying two parameters, nstart and centers:

(pars = expand.grid(i=1:3, cent=2:4))

  i cent
1 1    2
2 2    2
3 3    2
4 1    3
5 2    3
6 3    3
7 1    4
8 2    4
9 3    4

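# Convert each row of pars into a list(i = ..., cent = ...), one per job
pars2 <- lapply(seq_len(nrow(pars)), function(r) as.list(pars[r, ]))
# x = X is forwarded to every call; each element of pars2 becomes `pars`
mc <- mclapply(pars2, function(x, pars) kmeans(x, centers = pars$cent, nstart = pars$i), x = X)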

> summary(mc)
      Length Class  Mode
 [1,] 9      kmeans list
 [2,] 9      kmeans list
 [3,] 9      kmeans list
 [4,] 9      kmeans list
 [5,] 9      kmeans list
 [6,] 9      kmeans list
 [7,] 9      kmeans list
 [8,] 9      kmeans list
 [9,] 9      kmeans list

How'd you like them apples?
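On the second part of the question, tracking which output belongs to which combination: one option is to keep the parameter grid and the list of fits side by side, since mclapply returns its results in input order. This is a rough sketch only, assuming a made-up matrix `toy` instead of the wheat data; the list names and the `tot_withinss` column are illustrative choices, not anything kmeans or mclapply requires:

```r
library(parallel)

set.seed(1)
# Hypothetical stand-in for the wheat marker matrix
toy <- matrix(rnorm(500), nrow = 100, ncol = 5)

# Forking is unavailable on Windows, so fall back to a single core there
cores <- if (.Platform$OS.type == "windows") 1L else 2L

pars <- expand.grid(i = 1:3, cent = 2:4)

# One job per row of the grid; results come back in row order
fits <- mclapply(seq_len(nrow(pars)), function(r) {
  kmeans(toy, centers = pars$cent[r], nstart = pars$i[r])
}, mc.cores = cores)

# Label each fit with the parameters that produced it
names(fits) <- paste0("cent", pars$cent, "_nstart", pars$i)

# Store a summary statistic next to its parameters
pars$tot_withinss <- unname(sapply(fits, function(f) f$tot.withinss))

pars                              # grid plus one quality measure per fit
fits[["cent3_nstart2"]]$size      # look up a single fit by name
```

Every full kmeans object stays available in `fits`, while `pars` gives a compact table for comparing the runs.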

answered Nov 06 '22 21:11

Stephen Henderson

Tags: r, parallel-processing, parallel-foreach
