.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "examples/custom_clusterer.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_examples_custom_clusterer.py>`
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_examples_custom_clusterer.py:


Creating a custom clusterer
---------------------------

This example walks through the API for writing a custom clusterer. We will
implement a density-based clusterer (it is meant to illustrate the API, not to
be a particularly useful clusterer). Given an epsilon > 0, we count how many
neighbors each point has within an epsilon ball. We then split the dataset
into ``num_clusters`` sets by dividing the range of neighbor counts, from 1 up
to the largest observed count, into bins of (nearly) equal width. Note that two
points being in the same cluster tells us nothing about their Euclidean
distance: they only have similar neighbor counts.

.. GENERATED FROM PYTHON SOURCE LINES 14-21

Creating a custom clusterer
===========================

We will start by defining an epsilon neighbor function. Note that zen-mapper
expects the clusterer to produce an iterator of arrays. Each array corresponds
to a cluster, and the elements of the array are the indices of the data points
in that cluster. Points that belong to no cluster (outliers) can simply be
left out.

.. GENERATED FROM PYTHON SOURCE LINES 21-98

.. code-block:: Python


    from collections.abc import Iterator

    import matplotlib.pyplot as plt
    import networkx as nx
    import numpy as np

    import zen_mapper as zm


    def neighbor_counts(epsilon: float, data: np.ndarray) -> np.ndarray:
        """Count, for each point, the other points within distance ``epsilon``."""
        assert epsilon > 0, "Epsilon must be greater than 0."
        if data.size == 0:
            # nothing to count for an empty dataset
            return np.array([], dtype=int)

        n_points = data.shape[0]
        neighbor_count = np.zeros(n_points, dtype=int)

        for i in range(n_points):
            distances = np.linalg.norm(data - data[i], axis=1)
            # subtract 1 so that a point is not counted as its own neighbor
            neighbor_count[i] = np.sum(distances <= epsilon) - 1

        return neighbor_count


    def get_clusters(
        num_clusters: int, neighbor_counts: np.ndarray
    ) -> Iterator[np.ndarray]:
        max_neighbors = np.max(neighbor_counts) if np.any(neighbor_counts) else 0
        bin_width = max_neighbors // num_clusters
        remainder = max_neighbors % num_clusters

        start = 0
        for i in range(num_clusters):
            # spread the remainder over the first `remainder` bins
            end = start + bin_width + (1 if i < remainder else 0)
            # select points with start < count <= end; isolated points
            # (count 0) never satisfy this and so belong to no cluster
            indices = np.flatnonzero(
                (neighbor_counts > start) & (neighbor_counts <= end)
            )
            yield indices  # yields the current cluster
            start = end  # the next bin starts where this one ended


    def epsilon_density_clusterer(
        epsilon: float, num_clusters: int, data: np.ndarray
    ) -> Iterator[np.ndarray]:
        """
        Performs density-based clustering using an epsilon neighborhood.

        Parameters:
        - epsilon: A positive float defining the radius for neighbor counting.
        - num_clusters: An integer specifying the number of desired clusters.
        - data: A 2D NumPy array of shape (n_samples, n_features) representing
          the dataset.

        Returns:
        - An iterator yielding clusters, where each cluster is represented as
          a NumPy array of indices.
        """
        # calculate the density counts for the given data
        density_counts = neighbor_counts(epsilon=epsilon, data=data)

        # generate and return clusters based on the density counts
        return get_clusters(num_clusters=num_clusters, neighbor_counts=density_counts)


    # make our clusterer passable to zen-mapper; epsilon and num_clusters are
    # the module-level parameters defined below, looked up at call time
    def clusterer(data):
        return epsilon_density_clusterer(epsilon, num_clusters, data), None


    # clustering parameters
    epsilon = 0.1
    num_clusters = 4

..
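
To make the counting and binning rules concrete, here is a minimal,
self-contained sketch on a six-point toy dataset. The names
``neighbor_counts_vectorized``, ``bin_by_count``, and ``toy`` are hypothetical
and are not part of the example or of zen-mapper. The counting step uses NumPy
broadcasting instead of the explicit loop above, which is an assumption about
what a faster variant could look like rather than the example's own method.

.. code-block:: Python

    import numpy as np


    def neighbor_counts_vectorized(epsilon: float, data: np.ndarray) -> np.ndarray:
        """Hypothetical broadcasting variant of ``neighbor_counts``."""
        if data.size == 0:
            return np.array([], dtype=int)
        # pairwise differences, shape (n_points, n_points, n_features)
        diffs = data[:, None, :] - data[None, :, :]
        distances = np.linalg.norm(diffs, axis=-1)
        # every point is at distance 0 from itself, so subtract 1
        return np.sum(distances <= epsilon, axis=1) - 1


    def bin_by_count(num_clusters: int, counts: np.ndarray) -> list[np.ndarray]:
        """Apply the same binning rule as ``get_clusters`` and collect a list."""
        max_count = int(counts.max()) if counts.size else 0
        width, remainder = divmod(max_count, num_clusters)
        clusters, start = [], 0
        for i in range(num_clusters):
            end = start + width + (1 if i < remainder else 0)
            clusters.append(np.flatnonzero((counts > start) & (counts <= end)))
            start = end
        return clusters


    # toy data: a tight triple, a pair, and one isolated point
    toy = np.array(
        [[0.0, 0.0], [0.05, 0.0], [0.08, 0.0], [1.0, 0.0], [1.05, 0.0], [5.0, 0.0]]
    )
    toy_counts = neighbor_counts_vectorized(epsilon=0.1, data=toy)
    toy_clusters = bin_by_count(num_clusters=2, counts=toy_counts)

    print(toy_counts)
    for cluster_id, indices in enumerate(toy_clusters):
        print(cluster_id, indices)

The isolated point (index 5) has no neighbors and therefore appears in neither
cluster. The broadcasting variant builds an array whose size grows
quadratically with the number of points, so it trades memory for speed; the
loop in the example keeps memory use linear.
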
.. GENERATED FROM PYTHON SOURCE LINES 99-100

We can then visualize the result to check that the clusterer behaves as
expected.

.. GENERATED FROM PYTHON SOURCE LINES 100-142

.. code-block:: Python


    # dataset parameters
    n_points = 4000
    noise_level = 0.1  # noise level for the radius

    # generate evenly spaced angles between 0 and 2*pi
    angles = np.linspace(0, 2 * np.pi, n_points)

    # generate radii close to 1 with some noise
    radii = 1 + noise_level * np.random.randn(n_points)

    # convert polar coordinates to Cartesian coordinates
    x = radii * np.cos(angles)
    y = radii * np.sin(angles)

    # stack x and y into a 2D array for clustering
    data = np.column_stack((x, y))

    clusters = list(
        epsilon_density_clusterer(epsilon=epsilon, num_clusters=num_clusters, data=data)
    )

    # create an array of cluster labels; -1 marks points assigned to no cluster
    cluster_labels = np.full(data.shape[0], -1)

    for cluster_id, indices in enumerate(clusters):
        cluster_labels[indices] = cluster_id

    # plotting
    plt.figure(figsize=(10, 6))
    scatter = plt.scatter(
        data[:, 0], data[:, 1], c=cluster_labels, cmap="viridis", s=30, alpha=0.75
    )
    plt.colorbar(scatter, label="Cluster ID")
    plt.title("Density Clustering")
    plt.xlabel("x")
    plt.ylabel("y")
    plt.axis("equal")
    plt.grid(True)
    plt.show()



.. image-sg:: /examples/images/sphx_glr_custom_clusterer_001.png
   :alt: Density Clustering
   :srcset: /examples/images/sphx_glr_custom_clusterer_001.png
   :class: sphx-glr-single-img



.. GENERATED FROM PYTHON SOURCE LINES 143-147

Using with mapper
=================

We now have a clusterer that is compatible with zen-mapper, so we can pass it
to ``zm.mapper``.

.. GENERATED FROM PYTHON SOURCE LINES 147-167

.. code-block:: Python


    cover_scheme = zm.Width_Balanced_Cover(n_elements=5, percent_overlap=0.25)

    # project the data onto its x-coordinate
    projection = data[:, 0]

    result = zm.mapper(
        data=data,
        projection=projection,
        cover_scheme=cover_scheme,
        clusterer=clusterer,
        dim=1,
    )

    graph = zm.to_networkx(result.nerve)

    # plot the mapper graph
    nx.draw_kamada_kawai(graph)
    plt.show()

..
.. image-sg:: /examples/images/sphx_glr_custom_clusterer_002.png
   :alt: custom clusterer
   :srcset: /examples/images/sphx_glr_custom_clusterer_002.png
   :class: sphx-glr-single-img



.. GENERATED FROM PYTHON SOURCE LINES 168-170

Coloring the nodes
==================

Finally, we color each node of the mapper graph by the average neighbor count
of the points it contains.

.. GENERATED FROM PYTHON SOURCE LINES 170-201

.. code-block:: Python


    density_counts = neighbor_counts(epsilon=epsilon, data=data)

    node_densities = {}

    for node_id, indices in enumerate(result.nodes):
        # calculate the average density for the current node
        node_densities[node_id] = np.mean(density_counts[indices])

    # one color value per graph node, in the graph's node order
    node_colors = [node_densities[node] for node in graph.nodes]

    # set up figure and axis
    fig, ax = plt.subplots(figsize=(10, 6))

    # plot the mapper graph with nodes colored by density
    pos = nx.kamada_kawai_layout(graph)
    nodes = nx.draw_networkx_nodes(
        graph,
        pos,
        node_color=node_colors,
        node_size=100,
        ax=ax,
    )
    nx.draw_networkx_edges(graph, pos, ax=ax)

    # add the color bar for average density
    cbar = fig.colorbar(nodes, ax=ax, label="Average Density")
    ax.set_title("Mapper Graph Colored by Average Density")
    plt.show()



.. image-sg:: /examples/images/sphx_glr_custom_clusterer_003.png
   :alt: Mapper Graph Colored by Average Density
   :srcset: /examples/images/sphx_glr_custom_clusterer_003.png
   :class: sphx-glr-single-img



.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 1.598 seconds)


.. _sphx_glr_download_examples_custom_clusterer.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: custom_clusterer.ipynb <custom_clusterer.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: custom_clusterer.py <custom_clusterer.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: custom_clusterer.zip <custom_clusterer.zip>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_