.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "examples/custom_clusterer.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_examples_custom_clusterer.py>`
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_examples_custom_clusterer.py:


Creating a custom clusterer
---------------------------

This example walks through the API for creating a custom clusterer. We will
implement a simple density-based clusterer (it is meant to illustrate the API,
not to be a particularly useful clusterer). Given an epsilon > 0, we count, for
each point, the number of other points within its epsilon ball. We then
partition the points into k (num_clusters) sets by splitting the range of
neighbor counts, from 0 up to the largest observed count, into k intervals of
nearly equal width. Points with no neighbors are not assigned to any cluster.
Note that two points being in the same cluster tells us nothing about the
Euclidean distance between them.
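
To make the binning concrete, here is a small sketch with made-up neighbor
counts. The array ``counts`` and the choice ``k = 4`` are illustrative
assumptions, not values taken from this example. The range from 0 to the
largest count is split into ``k`` intervals, and each point is assigned by its
count alone.

.. code-block:: Python

    import numpy as np

    # hypothetical neighbor counts for seven points
    counts = np.array([0, 1, 2, 3, 5, 8, 10])
    k = 4

    # split (0, max] into k intervals; the first `remainder` ones are one wider
    width, remainder = divmod(int(counts.max()), k)
    edges = [0]
    for i in range(k):
        edges.append(edges[-1] + width + (1 if i < remainder else 0))

    # interval i is (edges[i], edges[i + 1]]; a count of 0 falls in none of them
    bins = [
        np.flatnonzero((counts > lo) & (counts <= hi))
        for lo, hi in zip(edges, edges[1:])
    ]

With these numbers the edges are 0, 3, 6, 8 and 10, so the points with counts
1, 2 and 3 share the first cluster while the point with count 0 is left out.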

.. GENERATED FROM PYTHON SOURCE LINES 14-21

Creating a custom clusterer
===========================

We will start by defining a function that counts epsilon neighbors. Note that
zen-mapper expects a clusterer to be a callable that takes the data and returns
an iterator of arrays together with a second return value, which we set to
``None`` here. Each array corresponds to a cluster, and its elements are the
indices of the data points in that cluster. A point does not have to appear in
any cluster, so outliers can be dropped if necessary.
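
As a minimal sketch of the "iterator of index arrays" idea, consider the toy
generator below. ``sign_clusterer`` and the array ``toy`` are hypothetical
names introduced only for illustration; they are not part of zen-mapper.

.. code-block:: Python

    from collections.abc import Iterator

    import numpy as np


    def sign_clusterer(data: np.ndarray) -> Iterator[np.ndarray]:
        """Hypothetical toy clusterer: split points by the sign of column 0."""
        first = data[:, 0]
        yield np.flatnonzero(first < 0)
        yield np.flatnonzero(first > 0)
        # points whose first coordinate is exactly 0 are left out of every cluster


    toy = np.array([[-1.0, 0.0], [2.0, 1.0], [0.0, 3.0], [-0.5, 2.0]])
    clusters = list(sign_clusterer(toy))

Here the first cluster holds the indices 0 and 3, the second holds index 1, and
index 2 belongs to no cluster.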

.. GENERATED FROM PYTHON SOURCE LINES 21-98

.. code-block:: Python


    from collections.abc import Iterator

    import matplotlib.pyplot as plt
    import networkx as nx
    import numpy as np

    import zen_mapper as zm


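    def neighbor_counts(epsilon: float, data: np.ndarray) -> np.ndarray:
        """Count, for each point, the other points within distance epsilon."""
        assert epsilon > 0, "Epsilon must be greater than 0."
        if data.size == 0:
            # no points: return an empty integer array, matching the usual dtype
            return np.array([], dtype=int)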
        n_points = data.shape[0]

        # initialize neighbor count
        neighbor_count = np.zeros(n_points, dtype=int)

        for i in range(n_points):
            distances = np.linalg.norm(data - data[i], axis=1)
            # subtract 1 so that a point is not counted as its own neighbor
            neighbor_count[i] = np.sum(distances <= epsilon) - 1

        return neighbor_count


    def get_clusters(
        num_clusters: int, neighbor_counts: np.ndarray
    ) -> Iterator[np.ndarray]:
        max_neighbors = np.max(neighbor_counts) if np.any(neighbor_counts) else 0

        # width of each interval of neighbor counts, plus what is left over
        counts_per_cluster = max_neighbors // num_clusters
        remainder = max_neighbors % num_clusters

        start = 0
        for i in range(num_clusters):
            # the first `remainder` intervals are one count wider
            end = start + counts_per_cluster + (1 if i < remainder else 0)
            # points whose count lies in (start, end]; because the first interval
            # starts at 0, isolated points are never included
            indices = np.flatnonzero((neighbor_counts > start) & (neighbor_counts <= end))

            yield indices  # yields the current cluster
            start = end  # moves the start index for the next cluster


    def epsilon_density_clusterer(
        epsilon: float, num_clusters: int, data: np.ndarray
    ) -> Iterator[np.ndarray]:
        """
        Performs density-based clustering using an epsilon neighborhood.

        Parameters:
        - epsilon: A positive float defining the radius for neighbor counting.
        - num_clusters: An integer specifying the number of desired clusters.
        - data: A 2D NumPy array of shape (n_samples, n_features) representing the dataset.

        Returns:
        - An iterator yielding clusters,
        where each cluster is represented as a
        NumPy array of indices.
        """
        # calculate the density counts for the given data
        density_counts = neighbor_counts(epsilon=epsilon, data=data)

        # generate and return clusters based on the density counts
        return get_clusters(num_clusters=num_clusters, neighbor_counts=density_counts)


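    # make our clusterer passable to zen-mapper; epsilon and num_clusters are
    # module-level variables defined below and looked up when clusterer is called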
    def clusterer(data):
        return epsilon_density_clusterer(epsilon, num_clusters, data), None


    # clustering parameters
    epsilon = 0.1
    num_clusters = 4


.. GENERATED FROM PYTHON SOURCE LINES 99-100

We can then visualize the result to check that this clusterer behaves as expected.

.. GENERATED FROM PYTHON SOURCE LINES 100-142

.. code-block:: Python


    # dataset parameters
    n_points = 4000
    noise_level = 0.1  # Noise level for the radius

    # generate angles uniformly between 0 and 2*pi
    angles = np.linspace(0, 2 * np.pi, n_points)

    # generate radii close to 1 with some noise
    radii = 1 + noise_level * np.random.randn(n_points)

    # convert polar coordinates to Cartesian coordinates
    x = radii * np.cos(angles)
    y = radii * np.sin(angles)


    # stack x and y into a 2D array for clustering
    data = np.column_stack((x, y))

    clusters = list(
        epsilon_density_clusterer(epsilon=epsilon, num_clusters=num_clusters, data=data)
    )

    # create an array for cluster labels
    cluster_labels = np.full(data.shape[0], -1)  # -1 marks points that are in no cluster
    for cluster_id, indices in enumerate(clusters):
        cluster_labels[indices] = cluster_id

    # plotting
    plt.figure(figsize=(10, 6))
    scatter = plt.scatter(
        data[:, 0], data[:, 1], c=cluster_labels, cmap="viridis", s=30, alpha=0.75
    )
    plt.colorbar(scatter, label="Cluster ID")
    plt.title("Density Clustering")
    plt.xlabel("x")
    plt.ylabel("y")
    plt.axis("equal")
    plt.grid(True)
    plt.show()


.. image-sg:: /examples/images/sphx_glr_custom_clusterer_001.png
   :alt: Density Clustering
   :srcset: /examples/images/sphx_glr_custom_clusterer_001.png
   :class: sphx-glr-single-img


.. GENERATED FROM PYTHON SOURCE LINES 143-147

Using with mapper
=================

Now we have a clusterer compatible with zen-mapper. We use the x-coordinate of
each point as the projection and cover it with 5 overlapping elements.
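
Conceptually, a mapper implementation calls the clusterer once per cover
element, on only the points in that element, so the indices a clusterer yields
are positions within the subset it was handed. The following is a rough sketch
of that general pattern in plain NumPy. It is not zen-mapper's actual
implementation, and ``nodes_from_cover``, ``cover_elements`` and
``whole_set_clusterer`` are hypothetical names.

.. code-block:: Python

    import numpy as np


    def whole_set_clusterer(data):
        """Hypothetical clusterer: one cluster containing every point given."""
        return iter([np.arange(len(data))]), None


    def nodes_from_cover(data, cover_elements, clusterer):
        """Run the clusterer on each cover element and translate indices back."""
        nodes = []
        for element in cover_elements:  # element: indices into the full dataset
            clusters, _metadata = clusterer(data[element])
            for local in clusters:  # local: indices into data[element]
                if len(local) > 0:
                    nodes.append(element[local])
        return nodes


    toy_data = np.arange(10, dtype=float).reshape(5, 2)
    cover_elements = [np.array([0, 1, 2]), np.array([2, 3, 4])]
    nodes = nodes_from_cover(toy_data, cover_elements, whole_set_clusterer)

Point 2 lies in both cover elements, so it ends up in both nodes. Shared points
of this kind are what produce edges in the resulting graph.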

.. GENERATED FROM PYTHON SOURCE LINES 147-167

.. code-block:: Python


    cover_scheme = zm.Width_Balanced_Cover(n_elements=5, percent_overlap=0.25)
    cover = cover_scheme(data)
    projection = data[:, 0]

    result = zm.mapper(
        data=data,
        projection=projection,
        cover_scheme=cover_scheme,
        clusterer=clusterer,
        dim=1,
    )

    graph = zm.to_networkx(result.nerve)

    # plot the mapper graph
    nx.draw_kamada_kawai(graph)
    plt.show()


.. image-sg:: /examples/images/sphx_glr_custom_clusterer_002.png
   :alt: custom clusterer
   :srcset: /examples/images/sphx_glr_custom_clusterer_002.png
   :class: sphx-glr-single-img


.. GENERATED FROM PYTHON SOURCE LINES 168-170

Coloring the nodes
==================
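
Each node of the mapper graph corresponds to a set of point indices, so a node
can be colored by reducing a per-point quantity over those indices. A minimal
sketch with made-up arrays (``values`` and ``nodes`` are illustrative
assumptions, not output of the code in this example):

.. code-block:: Python

    import numpy as np

    # hypothetical per-point values and node memberships
    values = np.array([1.0, 3.0, 5.0, 7.0])
    nodes = [np.array([0, 1]), np.array([1, 2, 3])]

    # one number per node: the mean of the values of its member points
    node_means = [float(values[idx].mean()) for idx in nodes]

A point that belongs to several nodes, like index 1 here, contributes to the
mean of each of them.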

.. GENERATED FROM PYTHON SOURCE LINES 170-201

.. code-block:: Python


    density_counts = neighbor_counts(epsilon=epsilon, data=data)

    node_densities = {}

    for node_id, indices in enumerate(result.nodes):
        # calculate the average density for the current cluster
        node_densities[node_id] = np.mean(density_counts[indices])

    # create a color map based on node densities
    node_colors = [node_densities[node] for node in graph.nodes]

    # set up figure and axis
    fig, ax = plt.subplots(figsize=(10, 6))

    # plot the mapper graph with nodes colored by density
    pos = nx.kamada_kawai_layout(graph)
    sm = nx.draw_networkx_nodes(
        graph,
        pos,
        node_color=node_colors,
        node_size=100,
        ax=ax,
    )
    nx.draw_networkx_edges(graph, pos)

    # add the color bar for average density
    cbar = fig.colorbar(sm, ax=ax, label="Average Density")

    plt.title("Mapper Graph Colored by Average Density")
    plt.show()


.. image-sg:: /examples/images/sphx_glr_custom_clusterer_003.png
   :alt: Mapper Graph Colored by Average Density
   :srcset: /examples/images/sphx_glr_custom_clusterer_003.png
   :class: sphx-glr-single-img


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 1.598 seconds)


.. _sphx_glr_download_examples_custom_clusterer.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: custom_clusterer.ipynb <custom_clusterer.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: custom_clusterer.py <custom_clusterer.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: custom_clusterer.zip <custom_clusterer.zip>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_