Plotting

This is a description of the PlottableProtocol. Any plotting library that accepts an object following the PlottableProtocol can plot any object that follows it, and libraries whose histograms follow the protocol are compatible with such plotters. The Protocol is runtime checkable, though as usual, a runtime check only verifies that the required methods are present, not their static types.
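To make the runtime-check caveat concrete, here is a small sketch that uses a hypothetical one-method protocol, HasValues (not part of UHI), in place of the full PlottableHistogram. isinstance only confirms that a member with the right name exists, so a method whose return type MyPy would reject still passes.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class HasValues(Protocol):
    """Hypothetical one-method stand-in for the full PlottableHistogram."""

    def values(self) -> list:
        ...


class GoodHist:
    def values(self) -> list:
        return [1.0, 2.0, 3.0]


class WrongTypeHist:
    # Same method name, wrong return type: MyPy would reject passing this
    # where HasValues is expected, but the runtime check only sees the name.
    def values(self) -> str:
        return "not an array"


class NoValuesHist:
    pass


print(isinstance(GoodHist(), HasValues))       # True
print(isinstance(WrongTypeHist(), HasValues))  # True: return type not checked
print(isinstance(NoValuesHist(), HasValues))   # False: method is missing
```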

Using the protocol:

Plotters should only depend on the methods and attributes listed below. In short, they are:

  • h.kind: The bh.Kind of the histogram (COUNT or MEAN)

  • h.values(): The value (as given by the kind)

  • h.variances(): The variance in the value (None if an unweighted histogram was filled with weights)

  • h.counts(): How many fills the bin received or the effective number of fills if the histogram is weighted

  • h.axes: A Sequence of axes

Axes have:

  • ax[i]: A tuple of (lower, upper) bin edges, or the discrete bin value (integer or string)

  • len(ax): The number of bins

  • Iteration is supported

  • ax.traits.circular: True if circular

  • ax.traits.discrete: True if the bin represents a single value (e.g. Integer or Category axes) instead of an interval (e.g. Regular or Variable axes)
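As a sketch of how a plotter can get by with only the axis members listed above, the hypothetical helper axis_positions below returns bin edges for an interval axis and bin labels for a discrete one. EdgesAxis and CategoryAxis are made-up stand-ins for real library axes, and the helper assumes interval bins are contiguous, so each bin's upper edge is the next bin's lower edge.

```python
from types import SimpleNamespace
from typing import List, Tuple, Union

import numpy as np


class EdgesAxis:
    """Hypothetical interval axis: ax[i] is a (lower, upper) tuple."""

    def __init__(self, edges: List[float]) -> None:
        self._edges = list(edges)
        self.traits = SimpleNamespace(circular=False, discrete=False)

    def __len__(self) -> int:
        return len(self._edges) - 1

    def __getitem__(self, i: int) -> Tuple[float, float]:
        if not 0 <= i < len(self):
            raise IndexError(i)
        return (self._edges[i], self._edges[i + 1])


class CategoryAxis:
    """Hypothetical discrete axis: ax[i] is the bin label itself."""

    def __init__(self, labels: List[str]) -> None:
        self._labels = list(labels)
        self.traits = SimpleNamespace(circular=False, discrete=True)

    def __len__(self) -> int:
        return len(self._labels)

    def __getitem__(self, i: int) -> str:
        return self._labels[i]


def axis_positions(ax) -> Union[np.ndarray, List[Union[int, str]]]:
    """Bin edges for an interval axis, bin labels for a discrete axis.

    Only len(ax), ax[i] and ax.traits.discrete are used.
    """
    n = len(ax)
    if ax.traits.discrete:
        return [ax[i] for i in range(n)]
    if n == 0:
        return np.array([], dtype=float)
    # Assumes contiguous bins: every lower edge, plus the last upper edge.
    lowers = [ax[i][0] for i in range(n)]
    return np.array(lowers + [ax[n - 1][1]], dtype=float)


print(axis_positions(EdgesAxis([0.0, 1.0, 2.5])))
print(axis_positions(CategoryAxis(["a", "b"])))  # ['a', 'b']
```

A real plotter would usually also consult ax.traits.circular, for example to draw a polar plot; this sketch ignores it.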

Plotters should check whether .counts() is None; no boost-histogram objects currently return None, but a future storage or a different library could.

Also check .variances(); if it is not None, the storage holds variance information and error bars should be included. Boost-histogram histograms return an array unless they know that this is an invalid assumption (a weighted fill was made on an unweighted histogram).
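The checks above can be combined into a small helper. This is a sketch, not UHI or boost-histogram code: error_bars and StubHist are hypothetical names. The MEAN branch follows the protocol's own note that the error on the mean is sqrt(variances / counts) and that bins with counts of 1 or less are undefined (left as NaN here).

```python
from typing import Optional

import numpy as np


class StubHist:
    """Hypothetical histogram exposing only what error_bars needs."""

    def __init__(self, kind, variances, counts) -> None:
        self.kind = kind
        self._variances = None if variances is None else np.asarray(variances, dtype=float)
        self._counts = None if counts is None else np.asarray(counts, dtype=float)

    def variances(self) -> Optional[np.ndarray]:
        return self._variances

    def counts(self) -> Optional[np.ndarray]:
        return self._counts


def error_bars(h) -> Optional[np.ndarray]:
    """Per-bin error bars, or None if the histogram cannot provide them."""
    variances = h.variances()
    if variances is None:
        # No trustworthy variance (e.g. weighted fill on an unweighted storage).
        return None
    variances = np.asarray(variances, dtype=float)
    if h.kind == "MEAN":
        counts = h.counts()
        if counts is None:
            return None
        counts = np.asarray(counts, dtype=float)
        # Error on the mean; undefined where counts <= 1, so leave NaN there.
        err = np.full(variances.shape, np.nan)
        ok = counts > 1
        err[ok] = np.sqrt(variances[ok] / counts[ok])
        return err
    # COUNT-like kinds: the error bar is the square root of the variance.
    return np.sqrt(variances)


print(error_bars(StubHist("COUNT", [4.0, 9.0], [4.0, 9.0])))
print(error_bars(StubHist("COUNT", None, [1.0])))  # None
```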

To statically restrict yourself to valid API usage, use PlottableHistogram as the parameter type of your function (this is not needed at runtime).
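A sketch of such a signature, assuming uhi is installed in the MyPy environment. It is imported only under TYPE_CHECKING and the annotation is a string, so the code also runs without uhi. summarize is a hypothetical example function; MyPy would flag uses such as h.sum() or h.axes[0].edges inside it, because they are not part of the Protocol.

```python
from typing import TYPE_CHECKING, Tuple

import numpy as np

if TYPE_CHECKING:
    # Seen only by the type checker; not imported at runtime.
    from uhi.typing.plottable import PlottableHistogram


def summarize(h: "PlottableHistogram") -> Tuple[Tuple[int, ...], float]:
    """Return the number of bins on each axis and the total of the values.

    Only h.axes, len(ax) and h.values() are used, all part of the Protocol.
    """
    shape = tuple(len(ax) for ax in h.axes)
    return shape, float(np.sum(h.values()))
```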

Implementing the protocol:

Add UHI to your MyPy environment; an example .pre-commit-config.yaml entry:

- repo: https://github.com/pre-commit/mirrors-mypy
  rev: v0.812
  hooks:
  - id: mypy
    files: src
    additional_dependencies: [uhi, numpy~=1.20.1]

Then, check your library against the Protocol like this:

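from typing import TYPE_CHECKING, cast

if TYPE_CHECKING:
    from uhi.typing.plottable import PlottableHistogram

    # MyHistogram is your own histogram class; MyPy reports an error on this
    # line if it does not satisfy the Protocol.
    _: PlottableHistogram = cast(MyHistogram, None)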
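For reference, here is a minimal sketch of a producer that the check above is meant to accept: a hypothetical one-dimensional, unweighted counting histogram. MyHistogram, MyAxis and MyTraits are illustrative names, not UHI classes, and the Kind enum follows the str-based construct that the protocol source recommends. Running the MyPy check above is what actually confirms conformance.

```python
import enum
from typing import Iterator, List, Optional, Sequence, Tuple

import numpy as np


class Kind(str, enum.Enum):
    COUNT = "COUNT"
    MEAN = "MEAN"


class MyTraits:
    @property
    def circular(self) -> bool:
        return False

    @property
    def discrete(self) -> bool:
        return False  # bins are intervals, not single values


class MyAxis:
    """Continuous axis defined by an array of bin edges."""

    def __init__(self, edges: Sequence[float]) -> None:
        self._edges = np.asarray(edges, dtype=float)

    @property
    def traits(self) -> MyTraits:
        return MyTraits()

    def __getitem__(self, index: int) -> Tuple[float, float]:
        if not 0 <= index < len(self):
            raise IndexError(index)
        return float(self._edges[index]), float(self._edges[index + 1])

    def __len__(self) -> int:
        return len(self._edges) - 1  # no flow bins in this sketch

    def __eq__(self, other: object) -> bool:
        return isinstance(other, MyAxis) and np.array_equal(self._edges, other._edges)

    def __iter__(self) -> Iterator[Tuple[float, float]]:
        return (self[i] for i in range(len(self)))


class MyHistogram:
    """Unweighted 1D counting histogram."""

    def __init__(self, edges: Sequence[float], counts: Sequence[float]) -> None:
        self._axes = [MyAxis(edges)]
        self._counts = np.asarray(counts, dtype=float)

    @property
    def axes(self) -> List[MyAxis]:
        return self._axes

    @property
    def kind(self) -> Kind:
        return Kind.COUNT

    def values(self) -> np.ndarray:
        return self._counts

    def variances(self) -> Optional[np.ndarray]:
        # Never filled with weights, so the variance equals the counts.
        return self._counts

    def counts(self) -> Optional[np.ndarray]:
        return self._counts


h = MyHistogram([0.0, 1.0, 2.0], [3, 4])
print(list(h.axes[0]))  # [(0.0, 1.0), (1.0, 2.0)]
print(h.kind == "COUNT")  # True, thanks to the str mixin on Kind
```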

Help for plotters

The module uhi.numpy_plottable has a utility to simplify the common use case of accepting a PlottableProtocol object or other common formats, primarily a NumPy histogram/histogram2d/histogramdd tuple. The ensure_plottable_histogram function takes a histogram, a NumPy tuple, or an object that implements .to_numpy() or .numpy(), and converts it to a NumPyPlottableHistogram, a minimal implementation of the Protocol. By calling this function on your input, you can write your plotting function knowing that you always have a PlottableProtocol object, which greatly simplifies your code.
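To illustrate the idea only, a one-dimensional version of such a conversion could look like the sketch below. This is not uhi's implementation: as_plottable, _Axis and _NumPyHist are hypothetical names, protocol objects are detected here by simple duck typing, and unweighted counts are assumed. Real code should call ensure_plottable_histogram, which also handles the histogram2d and histogramdd cases.

```python
from types import SimpleNamespace
from typing import Any, Tuple

import numpy as np


class _Axis:
    """Interval axis over an array of edges (hypothetical, minimal)."""

    traits = SimpleNamespace(circular=False, discrete=False)

    def __init__(self, edges: Any) -> None:
        self._edges = np.asarray(edges, dtype=float)

    def __len__(self) -> int:
        return len(self._edges) - 1

    def __getitem__(self, i: int) -> Tuple[float, float]:
        if not 0 <= i < len(self):
            raise IndexError(i)
        return float(self._edges[i]), float(self._edges[i + 1])


class _NumPyHist:
    """1D COUNT histogram wrapping a (counts, edges) pair (hypothetical)."""

    kind = "COUNT"

    def __init__(self, counts: Any, edges: Any) -> None:
        self._counts = np.asarray(counts, dtype=float)
        self.axes = [_Axis(edges)]

    def values(self) -> np.ndarray:
        return self._counts

    def variances(self) -> np.ndarray:
        return self._counts  # assumes unweighted counts

    def counts(self) -> np.ndarray:
        return self._counts


def as_plottable(obj: Any) -> Any:
    """1D-only illustration of what ensure_plottable_histogram does."""
    if hasattr(obj, "axes") and hasattr(obj, "values"):
        return obj  # already looks like a protocol object
    if hasattr(obj, "to_numpy"):
        obj = obj.to_numpy()
    elif hasattr(obj, "numpy"):
        obj = obj.numpy()
    if len(obj) != 2:
        raise ValueError("this sketch only handles 1D (counts, edges) tuples")
    counts, edges = obj
    return _NumPyHist(counts, edges)


h = as_plottable(np.histogram([0.5, 1.5, 1.7], bins=[0, 1, 2]))
print(h.values())  # [1. 2.]
```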

The full protocol version 1.2 follows:

(Also available as uhi.typing.plottable.PlottableProtocol, for use in tests, etc.)

"""
Using the protocol:

Producers: use isinstance(myhist, PlottableHistogram) in your tests; part of
the protocol is checkable at runtime, though ideally you should use MyPy; if
your histogram class supports PlottableHistogram, this will pass.

Consumers: Make your functions accept the PlottableHistogram static type, and
MyPy will force you to only use items in the Protocol.
"""

import sys
from typing import Any, Iterator, Optional, Sequence, Tuple, TypeVar, Union

# NumPy 1.20+ will work much, much better than previous versions when type checking
import numpy as np

if sys.version_info < (3, 8):
    from typing_extensions import Protocol, runtime_checkable

else:
    from typing import Protocol, runtime_checkable


protocol_version = (1, 2)

# Known kinds of histograms. A Producer can add Kinds not defined here; a
# Consumer should check for known types if it matters. A simple plotter could
# just use .values() and .variances() (if non-None) and ignore .kind.
#
# Could have been Kind = Literal["COUNT", "MEAN"] - left as a generic string so
# it can be extended.
Kind = str

# Implementations are highly encouraged to use the following construct:
# class Kind(str, enum.Enum):
#     COUNT = "COUNT"
#     MEAN = "MEAN"
# Then return and use Kind.COUNT or Kind.MEAN.


@runtime_checkable
class PlottableTraits(Protocol):
    @property
    def circular(self) -> bool:
        """
        True if the axis "wraps around"
        """

    @property
    def discrete(self) -> bool:
        """
        True if each bin is discrete - Integer, Boolean, or Category, for example
        """


T = TypeVar("T", covariant=True)


@runtime_checkable
class PlottableAxisGeneric(Protocol[T]):
    # name: str - Optional, not part of Protocol
    # label: str - Optional, not part of Protocol
    #
    # Plotters are encouraged to plot label if it exists and is not None, and
    # name otherwise if it exists and is not None, but these properties are not
    # available on all histograms and not part of the Protocol.

    @property
    def traits(self) -> PlottableTraits:
        ...

    def __getitem__(self, index: int) -> T:
        """
        Get the pair of edges (not discrete) or bin label (discrete).
        """

    def __len__(self) -> int:
        """
        Return the number of bins (not counting flow bins, which are ignored
        for this Protocol currently).
        """

    def __eq__(self, other: Any) -> bool:
        """
        Required to be sequence-like.
        """

    def __iter__(self) -> Iterator[T]:
        """
        Useful element of a Sequence to include.
        """


PlottableAxisContinuous = PlottableAxisGeneric[Tuple[float, float]]
PlottableAxisInt = PlottableAxisGeneric[int]
PlottableAxisStr = PlottableAxisGeneric[str]

PlottableAxis = Union[PlottableAxisContinuous, PlottableAxisInt, PlottableAxisStr]


@runtime_checkable
class PlottableHistogram(Protocol):
    @property
    def axes(self) -> Sequence[PlottableAxis]:
        ...

    @property
    def kind(self) -> Kind:
        ...

    # All methods can have a flow=False argument - not part of this Protocol.
    # If this is included, it should return an array with flow bins added,
    # normal ordering.

    def values(self) -> np.ndarray:
        """
        Returns the accumulated values. The counts for simple histograms, the
        sum of weights for weighted histograms, the mean for profiles, etc.

        If counts is equal to 0, the value in that cell is undefined if
        kind == "MEAN".
        """

    def variances(self) -> Optional[np.ndarray]:
        """
        Returns the estimated variance of the accumulated values. The sum of squared
        weights for weighted histograms, the variance of samples for profiles, etc.
        For an unweighted histogram where kind == "COUNT", this should return the same
        as values if the histogram was not filled with weights, and None otherwise.

        If counts is equal to 1 or less, the variance in that cell is undefined if
        kind == "MEAN".

        If kind == "MEAN", the counts can be used to compute the error on the mean
        as sqrt(variances / counts); this works whether or not the entries are
        weighted, provided the implementation tracked the weight variance.
        """

    def counts(self) -> Optional[np.ndarray]:
        """
        Returns the number of entries in each bin for an unweighted
        histogram or profile and an effective number of entries (defined below)
        for a weighted histogram or profile. An exotic generalized histogram could
        have no sensible .counts, so this is Optional and should be checked by
        Consumers.

        If kind == "MEAN", counts (effective or not) can and should be used to
        determine whether the mean value and its variance should be displayed
        (see documentation of values and variances, respectively). The counts
        should also be used to compute the error on the mean (see documentation
        of variances).

        For a weighted histogram, counts is defined as sum_of_weights ** 2 /
        sum_of_weights_squared. It is equal to or less than the number of times
        the bin was filled; equality holds when all filled weights are equal.
        The larger the spread in weights, the smaller it is, but it is always 0
        if the bin was filled 0 times, 1 if filled once, and more than 1 otherwise.

        A suggested implementation is:

            return np.divide(
                sum_of_weights**2,
                sum_of_weights_squared,
                out=np.zeros_like(sum_of_weights, dtype=np.float64),
                where=sum_of_weights_squared != 0)
        """