
2.3.2 generate

The generate program creates a random data set as follows. It first creates \( N \) random noise points. It then creates \( k \) clusters by choosing \( k \) points to serve as the centres of circles of radius \( r \). Finally, for each of the \( n \) cluster points, it randomly chooses a cluster and then a point within that cluster's circle.

No input is required on standard input, and the unclustered dataset is written to standard output.

The annotation of each point is the cluster number it belongs to, or -1 for outliers. This aids greatly in evaluating how well a given clustering algorithm performs on the dataset.
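The procedure and annotation scheme described above can be sketched in Python. This is only an illustrative reconstruction, not the program's actual source: the function name, the unit-square domain, the uniform sampling within each circle, and the seed parameter are all assumptions made for the example.

```python
import math
import random


def generate(n=250, N=50, k=4, r=0.01, seed=None):
    """Return (centres, points); each point is (x, y, annotation).

    Assumptions (not confirmed by the manual): points live in the unit
    square and are drawn uniformly, both for noise and within circles.
    """
    rng = random.Random(seed)
    points = []

    # N noise points, annotated -1 to mark them as outliers.
    for _ in range(N):
        points.append((rng.random(), rng.random(), -1))

    # k cluster centres.
    centres = [(rng.random(), rng.random()) for _ in range(k)]

    # n cluster points: pick a cluster, then a point inside its circle.
    for _ in range(n):
        c = rng.randrange(k)
        cx, cy = centres[c]
        # sqrt keeps the distribution uniform over the disc's area.
        dist = r * math.sqrt(rng.random())
        angle = 2.0 * math.pi * rng.random()
        points.append((cx + dist * math.cos(angle),
                       cy + dist * math.sin(angle),
                       c))

    return centres, points
```

Because every point carries its true cluster number (or -1), the labels produced by a clustering algorithm can be compared directly against these annotations.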


Usage:

generate [OPTIONS]

-help
Outputs a brief usage message.
-r <real>
Specifies the radius of the clusters (default 0.01).
-k <int>
Specifies the number of clusters (default 4).
-n <int>
Specifies the number of clustered points (default 250).
-N <int>
Specifies the number of noise points (default 50).
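
As a hedged illustration of how the options combine, the following Python sketch builds a generate command line and redirects its standard output to a file. The helper names are hypothetical, and the sketch assumes the generate executable is on the search path; only the flags listed above are used.

```python
import subprocess


def build_generate_command(r=None, k=None, n=None, N=None, program="generate"):
    """Build an argument list; options left as None fall back to the defaults."""
    argv = [program]
    for flag, value in (("-r", r), ("-k", k), ("-n", n), ("-N", N)):
        if value is not None:
            argv += [flag, str(value)]
    return argv


def run_generate(path, **options):
    """Run generate and save the data set written to standard output.

    Assumes an executable named 'generate' can be found on the PATH.
    """
    with open(path, "w") as out:
        subprocess.run(build_generate_command(**options), stdout=out, check=True)
```

For example, build_generate_command(r=0.05, k=3) yields the arguments for three clusters of radius 0.05, leaving the point counts at their defaults.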



Kevin Pulo
2000-08-23