Plotting
This is a description of the PlottableProtocol. Any plotting library that
accepts objects following the PlottableProtocol can plot any object that
follows this protocol, and any histogram library whose objects follow this
protocol is compatible with those plotters. The Protocol is runtime checkable,
though as usual, that will only check for the presence of the needed methods
at runtime, not for the static types.
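As a minimal sketch of what the runtime check does and does not catch, the example below uses a hypothetical, cut-down protocol named MiniPlottable. It is a stand-in written for illustration, not the UHI protocol itself, and the two toy classes are likewise made up.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class MiniPlottable(Protocol):
    """Hypothetical, cut-down stand-in for PlottableHistogram."""

    def values(self) -> Any: ...

    def variances(self) -> Any: ...

    def counts(self) -> Any: ...


class WrongTypes:
    """Has every required method, but returns strings instead of arrays."""

    def values(self) -> str:
        return "not an array"

    def variances(self) -> str:
        return "not an array"

    def counts(self) -> str:
        return "not an array"


class MissingMethod:
    """Lacks counts(), so the runtime check rejects it."""

    def values(self) -> list:
        return [1.0]

    def variances(self) -> list:
        return [1.0]


# isinstance only looks for the presence of the members, not their types.
passes_despite_wrong_types = isinstance(WrongTypes(), MiniPlottable)
fails_when_method_missing = isinstance(MissingMethod(), MiniPlottable)
```

The first check passes even though the return types are wrong, which is why a static checker such as MyPy is still the more thorough test.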
Using the protocol:
Plotters should only depend on the methods and attributes listed below. In short, they are:
h.kind: The bh.Kind of the histogram (COUNT or MEAN)
h.values(): The value (as given by the kind)
h.variances(): The variance in the value (None if an unweighted histogram was filled with weights)
h.counts(): How many fills the bin received, or the effective number of fills if the histogram is weighted
h.axes: A Sequence of axes
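The effective number of fills reported by h.counts() for a weighted histogram is defined in the protocol source as sum_of_weights ** 2 / sum_of_weights_squared. The sketch below works that formula out for a single bin using plain Python floats. The function name effective_counts is hypothetical, and a real implementation would operate on whole NumPy arrays rather than on one list of weights.

```python
from typing import Sequence


def effective_counts(weights: Sequence[float]) -> float:
    """Effective number of fills for one bin, given the weights filled into it."""
    sum_of_weights = sum(weights)
    sum_of_weights_squared = sum(w * w for w in weights)
    if sum_of_weights_squared == 0:
        # Never filled (or only zero weights): report 0 instead of dividing by zero.
        return 0.0
    return sum_of_weights**2 / sum_of_weights_squared


never_filled = effective_counts([])                # no fills at all
filled_once = effective_counts([3.0])              # a single weighted fill
equal_weights = effective_counts([2.0, 2.0, 2.0])  # three fills, identical weights
spread_weights = effective_counts([1.0, 3.0])      # two fills, unequal weights
```

With equal weights the result matches the number of fills (3 here), while the two unequal weights give a value below 2, in line with the properties stated in the counts documentation.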
Axes have:
ax[i]: A tuple of (lower, upper) bin edges, or the discrete bin value (integer or string)
len(ax): The number of bins
Iteration is supported
ax.traits.circular: True if circular
ax.traits.discrete: True if the bin represents a single value (e.g. Integer or Category axes) instead of an interval (e.g. Regular or Variable axes)
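To illustrate how a plotter might consume the axis interface listed above, here is a small sketch that builds tick labels from an axis. ToyAxis and tick_labels are hypothetical names used only for this example, and the "lo to hi" label format is an arbitrary choice rather than anything the protocol prescribes.

```python
from types import SimpleNamespace


class ToyAxis:
    """Hypothetical axis exposing only what the protocol lists."""

    def __init__(self, bins, *, discrete, circular=False):
        self._bins = list(bins)
        self.traits = SimpleNamespace(circular=circular, discrete=discrete)

    def __getitem__(self, index):
        return self._bins[index]

    def __len__(self):
        return len(self._bins)

    def __iter__(self):
        return iter(self._bins)


def tick_labels(ax):
    """Return one label per bin, branching on ax.traits.discrete."""
    if ax.traits.discrete:
        # Each bin is a single value (integer or string).
        return [str(value) for value in ax]
    # Each bin is a (lower, upper) pair of edges.
    return [f"{lower:g} to {upper:g}" for lower, upper in ax]


continuous_labels = tick_labels(ToyAxis([(0.0, 1.0), (1.0, 2.5)], discrete=False))
category_labels = tick_labels(ToyAxis(["a", "b"], discrete=True))
```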
Plotters should check whether .counts() is None; no boost-histogram objects
currently return None, but a future storage or a different library could.
Also check .variances(); if it is not None, the storage holds variance
information and error bars should be included. Boost-histogram histograms will
return something unless they know that this is an invalid assumption (a
weighted fill was made on an unweighted histogram).
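The two None checks described above could look roughly like the following sketch. ToyHist, error_bars, and filled_bins are hypothetical names, the histogram is assumed to be 1D, and plain lists stand in for the NumPy arrays a real histogram would return.

```python
import math


class ToyHist:
    """Hypothetical 1D stand-in exposing only variances() and counts()."""

    def __init__(self, variances, counts):
        self._variances = variances
        self._counts = counts

    def variances(self):
        return self._variances

    def counts(self):
        return self._counts


def error_bars(h):
    """Return per-bin error bars, or None when no variance information exists."""
    variances = h.variances()
    if variances is None:
        return None  # draw the histogram without error bars
    return [math.sqrt(v) for v in variances]


def filled_bins(h):
    """Return a per-bin 'was filled' mask, or None when counts are unavailable."""
    counts = h.counts()
    if counts is None:
        return None  # the producer has no sensible counts
    return [c > 0 for c in counts]


with_info = ToyHist(variances=[4.0, 9.0, 0.0], counts=[4.0, 9.0, 0.0])
without_info = ToyHist(variances=None, counts=None)
```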
To statically restrict yourself to valid API usage, use PlottableHistogram as
the parameter type of your function (not needed at runtime).
Implementing the protocol:
Add UHI to your MyPy environment; an example .pre-commit-config.yaml file:
- repo: https://github.com/pre-commit/mirrors-mypy
  rev: v0.812
  hooks:
    - id: mypy
      files: src
      additional_dependencies: [uhi, numpy~=1.20.1]
Then, check your library against the Protocol like this:
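from typing import TYPE_CHECKING, cast

if TYPE_CHECKING:
    from uhi.typing.plottable import PlottableHistogram

    # MyHistogram is your own histogram class; MyPy flags this line if it
    # does not satisfy the Protocol.
    _: PlottableHistogram = cast(MyHistogram, None)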
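The protocol source encourages implementations to define Kind as a string-based enum. A short sketch of why that is convenient follows: members of such an enum compare equal to plain strings, so a consumer that only knows the string form keeps working. The describe function and the "EXOTIC" kind are hypothetical and exist only for this example.

```python
import enum


class Kind(str, enum.Enum):
    COUNT = "COUNT"
    MEAN = "MEAN"


def describe(kind: str) -> str:
    """Consumer-side handling that compares against plain strings only."""
    if kind == "COUNT":
        return "counts per bin"
    if kind == "MEAN":
        return "mean per bin"
    # Producers may add kinds the consumer does not know about.
    return f"unknown kind {kind!r}; plotting values as-is"


from_enum = describe(Kind.MEAN)      # enum member passed by a producer
from_string = describe("COUNT")      # plain string works the same way
from_unknown = describe("EXOTIC")    # hypothetical kind added by a producer
```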
Help for plotters
The module uhi.numpy_plottable has a utility to simplify the common use case
of accepting a PlottableProtocol or other common formats, primarily a NumPy
histogram/histogram2d/histogramdd tuple. The ensure_plottable_histogram
function will take a histogram or NumPy tuple, or an object that implements
.to_numpy() or .numpy(), and convert it to a NumPyPlottableHistogram, which is
a minimal implementation of the Protocol. By calling this function on your
input, you can then write your plotting function knowing that you always have
a PlottableProtocol object, greatly simplifying your code.
The full protocol version 1.2 follows:
(Also available as uhi.typing.plottable.PlottableProtocol, for use in tests, etc.)
"""
Using the protocol:

Producers: use isinstance(myhist, PlottableHistogram) in your tests; part of
the protocol is checkable at runtime, though ideally you should use MyPy; if
your histogram class supports PlottableHistogram, this will pass.

Consumers: Make your functions accept the PlottableHistogram static type, and
MyPy will force you to only use items in the Protocol.
"""

from __future__ import annotations

from collections.abc import Iterator, Sequence
from typing import Any, Protocol, Tuple, TypeVar, Union, runtime_checkable

# NumPy 1.20+ will work much, much better than previous versions when type checking
import numpy as np

protocol_version = (1, 2)

# Known kinds of histograms. A Producer can add Kinds not defined here; a
# Consumer should check for known types if it matters. A simple plotter could
# just use .values() and .variances() if non-None and ignore .kind.
#
# Could have been Kind = Literal["COUNT", "MEAN"] - left as a generic string so
# it can be extendable.
Kind = str

# Implementations are highly encouraged to use the following construct:
# class Kind(str, enum.Enum):
#     COUNT = "COUNT"
#     MEAN = "MEAN"
# Then return and use Kind.COUNT or Kind.MEAN.


@runtime_checkable
class PlottableTraits(Protocol):
    @property
    def circular(self) -> bool:
        """
        True if the axis "wraps around"
        """

    @property
    def discrete(self) -> bool:
        """
        True if each bin is discrete - Integer, Boolean, or Category, for example
        """


T_co = TypeVar("T_co", covariant=True)


@runtime_checkable
class PlottableAxisGeneric(Protocol[T_co]):
    # name: str - Optional, not part of Protocol
    # label: str - Optional, not part of Protocol
    #
    # Plotters are encouraged to plot label if it exists and is not None, and
    # name otherwise if it exists and is not None, but these properties are not
    # available on all histograms and not part of the Protocol.

    @property
    def traits(self) -> PlottableTraits: ...

    def __getitem__(self, index: int) -> T_co:
        """
        Get the pair of edges (not discrete) or bin label (discrete).
        """

    def __len__(self) -> int:
        """
        Return the number of bins (not counting flow bins, which are ignored
        for this Protocol currently).
        """

    def __eq__(self, other: Any) -> bool:
        """
        Required to be sequence-like.
        """

    def __iter__(self) -> Iterator[T_co]:
        """
        Useful element of a Sequence to include.
        """


PlottableAxisContinuous = PlottableAxisGeneric[Tuple[float, float]]
PlottableAxisInt = PlottableAxisGeneric[int]
PlottableAxisStr = PlottableAxisGeneric[str]

PlottableAxis = Union[PlottableAxisContinuous, PlottableAxisInt, PlottableAxisStr]


@runtime_checkable
class PlottableHistogram(Protocol):
    @property
    def axes(self) -> Sequence[PlottableAxis]: ...

    @property
    def kind(self) -> Kind: ...

    # All methods can have a flow=False argument - not part of this Protocol.
    # If this is included, it should return an array with flow bins added,
    # normal ordering.

    def values(self) -> np.typing.NDArray[Any]:
        """
        Returns the accumulated values. The counts for simple histograms, the
        sum of weights for weighted histograms, the mean for profiles, etc.

        If counts is equal to 0, the value in that cell is undefined if
        kind == "MEAN".
        """

    def variances(self) -> np.typing.NDArray[Any] | None:
        """
        Returns the estimated variance of the accumulated values. The sum of squared
        weights for weighted histograms, the variance of samples for profiles, etc.
        For an unweighted histogram where kind == "COUNT", this should return the same
        as values if the histogram was not filled with weights, and None otherwise.

        If counts is equal to 1 or less, the variance in that cell is undefined if
        kind == "MEAN".

        If kind == "MEAN", the counts can be used to compute the error on the mean
        as sqrt(variances / counts). This works whether or not the entries are
        weighted, provided the weight variance was tracked by the implementation.
        """

    def counts(self) -> np.typing.NDArray[Any] | None:
        """
        Returns the number of entries in each bin for an unweighted
        histogram or profile and an effective number of entries (defined below)
        for a weighted histogram or profile. An exotic generalized histogram could
        have no sensible .counts, so this is Optional and should be checked by
        Consumers.

        If kind == "MEAN", counts (effective or not) can and should be used to
        determine whether the mean value and its variance should be displayed
        (see documentation of values and variances, respectively). The counts
        should also be used to compute the error on the mean (see documentation
        of variances).

        For a weighted histogram, counts is defined as sum_of_weights ** 2 /
        sum_of_weights_squared. It is equal to or less than the number of times
        the bin was filled; the equality holds when all filled weights are equal.
        The larger the spread in weights, the smaller it is, but it is always 0
        if filled 0 times, and 1 if filled once, and more than 1 otherwise.

        A suggested implementation is:

            return np.divide(
                sum_of_weights**2,
                sum_of_weights_squared,
                out=np.zeros_like(sum_of_weights, dtype=np.float64),
                where=sum_of_weights_squared != 0)
        """