3. Clustering algorithms receive global dataset

Context and Problem Statement

Sometimes the data coming into a mapper pipeline is not point cloud data. The most common example is a distance matrix. Because zen mapper aims to be a flexible core for building mapper pipelines, it needs to allow for these types of inputs.

As currently written, the clustering protocol does not facilitate this. The mapper method itself indexes into the high-dimensional data, which requires it to have a fixed understanding of what that data represents. For example, selecting a subset of a point cloud means taking rows, whereas selecting the same subset of a distance matrix means taking both rows and columns.

Decision Drivers

Minimal overhead for people implementing new clustering algorithms
Flexibility: as many use cases as possible should be enabled
Type safety: Python type checkers should be able to tell what is going on

Considered Options

Change the cluster protocol to receive global dataset
Implement multiple clustering protocols for different cases

Decision Outcome

Change the cluster protocol to receive global dataset

The cluster protocol changes to take the entire high-dimensional dataset along with the indices of the subset to cluster. The mapper method then no longer has to index into the data itself, and what the high-dimensional data represents is left up to the author of the clustering algorithm.
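
As a rough illustration of this decision, the sketch below shows what such a protocol could look like and how two clusterers might interpret the same arguments differently. All names here (`Clusterer`, `PointCloudClusterer`, `DistanceMatrixClusterer`, `_components`, the `eps` parameter) are hypothetical and are not taken from zen mapper's actual API. The clustering rule itself (connected components under a distance threshold) is only a stand-in chosen to keep the example short.

```python
from typing import Iterable, Protocol, TypeVar

import numpy as np

# Hypothetical type variable for "whatever the global dataset is".
M = TypeVar("M", contravariant=True)


class Clusterer(Protocol[M]):
    """Hypothetical protocol: receives the whole dataset plus the subset to cluster."""

    def __call__(self, data: M, indices: np.ndarray) -> Iterable[np.ndarray]:
        """Return clusters as arrays of indices into the global dataset."""
        ...


def _components(dist: np.ndarray, eps: float) -> list[np.ndarray]:
    """Connected components of the graph linking points at distance <= eps."""
    n = len(dist)
    labels = np.full(n, -1)
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            # Visit every still-unlabelled neighbour of point i.
            for j in np.flatnonzero((dist[i] <= eps) & (labels == -1)):
                labels[j] = current
                stack.append(int(j))
        current += 1
    return [np.flatnonzero(labels == c) for c in range(current)]


class PointCloudClusterer:
    """Treats `data` as an (n_points, n_dims) array of coordinates."""

    def __init__(self, eps: float) -> None:
        self.eps = eps

    def __call__(self, data: np.ndarray, indices: np.ndarray) -> list[np.ndarray]:
        points = data[indices]  # select rows only
        dist = np.linalg.norm(points[:, None] - points[None, :], axis=-1)
        return [indices[c] for c in _components(dist, self.eps)]


class DistanceMatrixClusterer:
    """Treats `data` as an (n_points, n_points) matrix of pairwise distances."""

    def __init__(self, eps: float) -> None:
        self.eps = eps

    def __call__(self, data: np.ndarray, indices: np.ndarray) -> list[np.ndarray]:
        dist = data[np.ix_(indices, indices)]  # select rows and columns
        return [indices[c] for c in _components(dist, self.eps)]


# The caller passes the same (data, indices) pair in both cases;
# only the clusterer knows how to slice the data.
points = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0], [50.0, 50.0]])
distances = np.linalg.norm(points[:, None] - points[None, :], axis=-1)
subset = np.array([0, 1, 2, 3])

from_points = PointCloudClusterer(eps=1.5)(points, subset)
from_distances = DistanceMatrixClusterer(eps=1.5)(distances, subset)
```

In this sketch the calling code never slices `data`, so supporting a new input type only requires writing a new clusterer. The cost is visible in the type parameter `M`: the caller and the clusterer must agree on what `data` is, and a mismatch would surface inside the clusterer rather than at the call site.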

Consequences

Good, because it keeps the core algorithm simple
Good, because it allows for many use cases
Bad, because the type hinting is more complicated
Bad, because error messages are potentially cryptic