The data formats are simple ASCII formats, chosen because they are easily parsed by both programs and humans. Two formats are used: an unclustered dataset and a clustered dataset.
The unclustered dataset simply specifies points followed by their annotations, one per line. Each line has the format:
The clustered dataset has the same layout as the unclustered dataset, except that each point is prefixed with the number of the cluster it belongs to. That is,
The one exception to this is where the annotation is ``representative''. This indicates that the point is the representative point of its cluster.
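
As a rough illustration of how such files might be read, the following Python sketch parses lines of either format. The exact line layout is not reproduced here, so the sketch rests on assumptions: fields are separated by whitespace, a point is a run of numeric coordinates, the annotation is whatever text follows them, and in the clustered format the cluster number is an integer in the first field. The names Point, parse_line and parse_dataset are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Point:
    coords: List[float]
    annotation: str
    cluster: Optional[int] = None      # set only for clustered datasets
    is_representative: bool = False    # True if annotation is "representative"


def _is_float(token: str) -> bool:
    """Return True if the token can be read as a number."""
    try:
        float(token)
        return True
    except ValueError:
        return False


def parse_line(line: str, clustered: bool = False) -> Point:
    """Parse one non-blank line (ASSUMED layout: [cluster] coords... annotation)."""
    tokens = line.split()
    cluster = None
    if clustered:
        # Assumption: the cluster number is the first field and is an integer.
        cluster = int(tokens[0])
        tokens = tokens[1:]
    # Assumption: leading numeric tokens are coordinates; the rest is annotation.
    # Limitation: an annotation that itself starts with a number (or with a
    # word such as "inf" or "nan") would be misread as a coordinate.
    n = 0
    while n < len(tokens) and _is_float(tokens[n]):
        n += 1
    coords = [float(t) for t in tokens[:n]]
    annotation = " ".join(tokens[n:])
    representative = clustered and annotation == "representative"
    return Point(coords, annotation, cluster, representative)


def parse_dataset(lines: Iterable[str], clustered: bool = False) -> List[Point]:
    """Parse every non-blank line of a dataset."""
    return [parse_line(ln, clustered) for ln in lines if ln.strip()]
```

With these assumptions, parse_dataset(open(path), clustered=True) would return one Point per line, and the representative of each cluster could be found by filtering on is_representative. If the real files use a different separator or field order, only parse_line needs to change.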