
How to generate bivariate data of different shapes (e.g., square, circle, rectangle) with outliers?

I am looking for a tool that generates datasets of different shapes (square, circle, rectangle, etc.) with outliers, for use in cluster analysis.

Can anyone recommend a good dataset generator for cluster analysis? Is there any way to generate such datasets in a language like R?

Pradeep asked Jan 18 '11 09:01


3 Answers

I would create a shape and extract its bounding coordinates. You can then populate the shape with random points using the splancs package.

Here's a small snippet from one of my programs:

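library(splancs)  # provides csr()

# First we create a circle, into which uniform random points will be generated (kudos to Barry Rowlingson, r-sig-geo).
# x, y: centre of the circle; r: radius; n: number of polygon vertices.
circle <- function(x, y, r, n) {
    t <- seq(from = 0, to = 2 * pi, length.out = n + 1)[-1]
    t <- cbind(x = x + r * sin(t), y = y + r * cos(t))
    t <- rbind(t, t[1, ])  # repeat the first vertex to close the polygon
    return(t)
}

# 1000 uniform random points inside a 30-sided approximation of a circle of radius 100
pts <- csr(circle(0, 0, 100, 30), 1000)
plot(pts, asp = 1)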

[Figure: the random points plotted inside the circle]

Feel free to add outliers. One way of going about this is sampling different shapes and joining them in different ways.
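
As a rough illustration of "sampling different shapes and joining them", here is a minimal base-R sketch that does not need splancs. The helper names `runif_disc` and `runif_rect`, the shape positions, and the number of outliers are all made up for this example, and "outlier" here simply means a point drawn uniformly from an enlarged bounding box, which is only one of several reasonable definitions.

```r
set.seed(42)  # for reproducibility

# Uniform points in a disc. Taking sqrt() of the uniform draw for the radius
# keeps the density even instead of crowding points near the centre.
runif_disc <- function(n, cx = 0, cy = 0, r = 1) {
  rho   <- r * sqrt(runif(n))
  theta <- runif(n, 0, 2 * pi)
  cbind(x = cx + rho * cos(theta), y = cy + rho * sin(theta))
}

# Uniform points in an axis-aligned rectangle (a square when both sides are equal).
runif_rect <- function(n, xmin, xmax, ymin, ymax) {
  cbind(x = runif(n, xmin, xmax), y = runif(n, ymin, ymax))
}

disc   <- runif_disc(300, cx = 0, cy = 0, r = 1)
square <- runif_rect(300, 3, 5, -1, 1)
rect   <- runif_rect(300, -1, 5, 3, 4)

# Outliers: a few points scattered over a box larger than the shapes.
# Some of them may fall inside a shape by chance.
outliers <- runif_rect(20, -3, 7, -3, 6)

# Join everything and keep the true source of each point as a label.
dat <- data.frame(
  rbind(disc, square, rect, outliers),
  label = factor(rep(c("disc", "square", "rectangle", "outlier"),
                     times = c(300, 300, 300, 20)))
)
```

A quick visual check is `plot(dat$x, dat$y, col = dat$label, asp = 1)`. Keeping the `label` column makes it easy to compare a clustering result against the shapes the points were drawn from.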

Roman Luštrik answered Oct 15 '22 06:10


You should probably look into the mlbench package, especially the synthetic dataset generators among its mlbench.* functions.

[Figure: example datasets generated with mlbench.* functions]

Other datasets or utility functions are probably best found on the Cluster Task View on CRAN. As @Roman said, adding outliers is not really difficult, especially when you work in only two dimensions.
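
For instance, a minimal sketch of using a few of the mlbench generators and adding outliers in two dimensions might look like the following. It assumes the mlbench package is installed; `add_outliers` is a hypothetical helper written for this example (it is not part of mlbench), and the parameter values are arbitrary.

```r
library(mlbench)  # install.packages("mlbench") if it is not installed

set.seed(1)

# Each generator returns a list with the coordinates in $x and labels in $classes.
shapes  <- mlbench.shapes(n = 500)
spirals <- mlbench.spirals(n = 300, cycles = 1.5, sd = 0.05)
smiley  <- mlbench.smiley(n = 500, sd1 = 0.1, sd2 = 0.05)

# Hypothetical helper: append n_out points drawn uniformly from the bounding
# box of xy, enlarged on every side by the fraction `expand` of its range.
add_outliers <- function(xy, n_out = 20, expand = 0.5) {
  rng <- apply(xy, 2, range)            # row 1: minima, row 2: maxima
  pad <- expand * (rng[2, ] - rng[1, ])
  out <- cbind(runif(n_out, rng[1, 1] - pad[1], rng[2, 1] + pad[1]),
               runif(n_out, rng[1, 2] - pad[2], rng[2, 2] + pad[2]))
  rbind(xy, out)
}

shapes_noisy <- add_outliers(shapes$x)
```

`plot(shapes_noisy, asp = 1)` shows the result. The appended rows are the last `n_out` rows of the matrix, so they can be labelled separately if needed.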

chl answered Oct 15 '22 05:10


ELKI includes a flexible data generator that can produce various distributions in arbitrary dimensionality. It can also generate gamma-distributed variables, for example.

It is documented on the ELKI wiki: http://elki.dbs.ifi.lmu.de/wiki/DataSetGenerator

Has QUIT--Anony-Mousse answered Oct 15 '22 06:10