I am trying to understand how to parallelize some of my code using R. In the following example I want to use k-means to cluster data using 2, 3, 4, 5 and 6 centers, with 20 iterations. Here is the code:
library(parallel)
library(BLR)
data(wheat)
parallel.function <- function(i) {
kmeans( X[1:100,100], centers=?? , nstart=i )
}
out <- mclapply( c(5, 5, 5, 5), FUN=parallel.function )
How can we parallelize over the iterations and the centers at the same time? And how do I track the outputs, assuming I want to keep all the outputs from k-means across all iterations and centers, just to learn how?
• k-means clustering is a method of clustering which aims to partition n data points into k clusters (n >> k), in which each observation belongs to the cluster with the nearest mean.
• Nearness is calculated by a distance function, which is mostly Euclidean distance or Manhattan distance.
Parallel processing (in the extreme) means that all the processes start simultaneously and run to completion on their own. If we have a single computer at our disposal and have to run n models one after another, each taking s seconds, the total running time will be n*s.
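To make the n*s point concrete, here is a minimal sketch that times the same tasks run serially and then in forked worker processes. The function slow_task is a hypothetical stand-in for fitting one model, and the choice of 2 cores is an assumption; mclapply() relies on forking, which is not available on Windows, so the sketch falls back to one core there.

```r
library(parallel)

# Hypothetical stand-in for fitting one model: it just sleeps for s seconds.
slow_task <- function(i, s = 0.5) {
  Sys.sleep(s)
  i^2
}

n <- 4

# Serial: tasks run one after another, so elapsed time is roughly n * s.
serial_time <- system.time(
  serial <- lapply(seq_len(n), slow_task)
)["elapsed"]

# Parallel: mclapply() forks worker processes (assumed 2 cores here;
# forking is unavailable on Windows, so use 1 core there).
cores <- if (.Platform$OS.type == "windows") 1L else 2L
forked_time <- system.time(
  forked <- mclapply(seq_len(n), slow_task, mc.cores = cores)
)["elapsed"]

# Same results either way; only the elapsed time should differ.
identical(serial, forked)
c(serial = serial_time, forked = forked_time)
```

With two workers the forked run should take roughly half as long as the serial one, though the exact timings depend on the machine.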
There are various packages in R which allow parallelization. One of them is the parallel package, which performs tasks in parallel by allocating cores to R. It works by finding the number of cores in the system and allocating all of them, or a subset, to make a cluster.
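The workflow described above (count the cores, allocate a subset, make a cluster) might look like the following sketch. Capping the cluster at 2 workers is an arbitrary assumption for illustration, and the squaring task is only a placeholder for real work.

```r
library(parallel)

# Number of cores the system reports; detectCores() can return NA.
n_cores <- detectCores()
if (is.na(n_cores)) n_cores <- 1L

# Allocate a subset of the cores (at most 2 here) to make a cluster.
cl <- makeCluster(min(2L, n_cores))

# Run a placeholder task on the workers, then release them.
squares <- parLapply(cl, 1:4, function(i) i^2)
stopCluster(cl)

unlist(squares)
```

Unlike mclapply(), a cluster made with makeCluster() also works on Windows, at the cost of having to export data and load packages on the workers explicitly.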
K-means clustering in R is an unsupervised, non-linear algorithm that clusters data based on similarity. It seeks to partition the observations into a pre-specified number of clusters. Segmentation assigns each training example to a segment called a cluster.
This looked very simple to me at first... and then I tried it. After a lot of monkey typing and face palming during my lunch break, however, I arrived at this:
library(parallel)
library(BLR)
data(wheat)
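# x = X is passed through to the function by name, so each value of 2:6
# fills the remaining argument, centers.
mc = mclapply(2:6, function(x, centers) kmeans(x, centers), x = X)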
It looks right though I didn't check how sensible the clustering was.
> summary(mc)
Length Class Mode
[1,] 9 kmeans list
[2,] 9 kmeans list
[3,] 9 kmeans list
[4,] 9 kmeans list
[5,] 9 kmeans list
On reflection the command syntax seems sensible, although a lot of other stuff that failed seemed reasonable too... The examples in the help documentation are maybe not that great.
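On the "how to track the outputs" part of the question, one option is to name the list that mclapply() returns and then pull components out of each kmeans object. This is only a sketch: dat is simulated stand-in data (swap in the wheat matrix X), nstart = 20 is my reading of the "20 iterations" in the question, and 2 cores is an assumption.

```r
library(parallel)

# Stand-in data: 100 rows, 5 numeric columns (replace with the wheat matrix X).
set.seed(1)
dat <- matrix(rnorm(500), nrow = 100)

centers <- 2:6
# mclapply() forks, which is unavailable on Windows, so use 1 core there.
cores <- if (.Platform$OS.type == "windows") 1L else 2L

# One k-means fit per number of centers, with arguments passed by name.
fits <- mclapply(centers,
                 function(k) kmeans(dat, centers = k, nstart = 20),
                 mc.cores = cores)
names(fits) <- paste0("k", centers)

# Each element is a full kmeans object, so any component can be extracted.
sapply(fits, function(f) f$tot.withinss)  # total within-cluster sum of squares
table(fits$k3$cluster)                    # cluster sizes for the 3-center fit
```

Because the whole object is kept for every fit, nothing is lost: cluster assignments, centers and sums of squares are all still there under fits$k2, fits$k3 and so on.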
Hope it helps.
EDIT
As requested, here is that on two variables, nstart and centers:
(pars = expand.grid(i=1:3, cent=2:4))
i cent
1 1 2
2 2 2
3 3 2
4 1 3
5 2 3
6 3 3
7 1 4
8 2 4
9 3 4
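L = list()
# zikes horrible: append() turns each row of pars into a list with elements i and cent
pars2 = apply(pars, 1, append, L)
# x = X is matched by name, so each element of pars2 arrives as pars
mc = mclapply(pars2, function(x, pars) kmeans(x, centers = pars$cent, nstart = pars$i), x = X)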
> summary(mc)
Length Class Mode
[1,] 9 kmeans list
[2,] 9 kmeans list
[3,] 9 kmeans list
[4,] 9 kmeans list
[5,] 9 kmeans list
[6,] 9 kmeans list
[7,] 9 kmeans list
[8,] 9 kmeans list
[9,] 9 kmeans list
How'd you like them apples?
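A possibly less horrible variant, offered as a sketch rather than the definitive way: loop over the row indices of the parameter grid instead of converting rows to lists, then store a summary of each fit next to the parameters that produced it. As before, dat is simulated stand-in data for X and the 2 cores are an assumption.

```r
library(parallel)

# Stand-in data (replace with the wheat matrix X).
set.seed(1)
dat <- matrix(rnorm(500), nrow = 100)

pars <- expand.grid(i = 1:3, cent = 2:4)
# mclapply() forks, which is unavailable on Windows, so use 1 core there.
cores <- if (.Platform$OS.type == "windows") 1L else 2L

# Iterate over row indices of the grid, so no list conversion is needed.
fits <- mclapply(seq_len(nrow(pars)), function(r) {
  kmeans(dat, centers = pars$cent[r], nstart = pars$i[r])
}, mc.cores = cores)

# Keep one summary value per fit alongside its parameters.
pars$tot.withinss <- sapply(fits, function(f) f$tot.withinss)
pars
```

The list fits stays in the same order as the rows of pars, so fits[[r]] is always the result for row r of the grid.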