UHI: Unified Histogram Interface¶
UHI is a library that helps connect other histogramming libraries. It is primarily intended to be a guide and static type check helper; you do not need a runtime dependency on UHI. It currently does so with the following components:
UHI Indexing, which describes a powerful indexing system for histograms, designed to extend standard Array indexing for Histogram operations.
UHI Indexing+ (referred to as UHI+ for short), which describes a set of extensions to the standard indexing that make it easier to use on the command line.
The PlottableProtocol, which describes the minimal and complete set of requirements for a source library to produce and a plotting library to consume to plot a histogram, including error bars.
Indexing¶
This is the design document for Unified Histogram Indexing (UHI). Much of the
original plan is now implemented in boost-histogram. Other histogramming
libraries can implement support for this as well, and the “tag” functors, like
sum and loc, can be used between libraries.
Syntax¶
The following examples assume you have imported loc, rebin, underflow, and
overflow from boost-histogram or any other library that implements UHI.
Access:¶
v = h[b] # Returns bin contents, indexed by bin number
v = h[loc(b)] # Returns the bin containing the value
v = h[loc(b) + 1] # Returns the bin above the one containing the value
v = h[underflow] # Underflow and overflow can be accessed with special tags
Slicing:¶
h == h[:] # Slice over everything
h2 = h[a:b] # Slice of histogram (includes flow bins)
h2 = h[:b] # Leaving out endpoints is okay
h2 = h[loc(v):] # Slices can be in data coordinates, too
h2 = h[::rebin(2)] # Modification operations (rebin)
h2 = h[a:b:rebin(2)] # Modifications can combine with slices
h2 = h[::sum] # Projection operations # (name may change)
h2 = h[a:b:sum] # Adding endpoints to projection operations
h2 = h[0:len:sum] # removes under or overflow from the calculation
h2 = h[v, a:b] # A single value v is like v:v+1:sum
h2 = h[a:b, ...] # Ellipsis work just like normal numpy
Setting¶
# Single values
h[b] = v # Sets bin contents, indexed by bin number
h[loc(b)] = v # Sets the bin containing the value
h[underflow] = v # Underflow and overflow can be set with special tags
h[...] = array(...) # Setting with an array or histogram sets the contents if the sizes match
# Overflow can optionally be included if endpoints are left out
# The number of dimensions for non-scalars should match (broadcasting works normally otherwise)
All of this generalizes to multiple dimensions. loc(v) could return
categorical bins, but slicing on categories would (currently) not be
allowed. These all return histograms, so flow bins are always preserved;
the one exception is projection. Since projection removes an axis, the only
use for the slice edges is to be explicit about which part you are
interested in for the projection, so an explicit (non-empty) slice here
will cause the relevant flow bin to be excluded.
loc, project, and rebin all live inside the histogramming package (like
boost-histogram), but are completely general and can be created by a user
using an explicit API (below). underflow and overflow also follow a general
API.
One drawback of the syntax listed above is that it is hard to select an action
to run on an axis or a few axes out of many. For this use case, you can pass a
dictionary to the index, and that has the syntax {axis:action}. The actions
are slices, and follow the rules listed above. This looks like:
h[{0: slice(None, None, bh.rebin(2))}] # rebin axis 0 by two
h[{1: slice(0, bh.loc(3.5))}] # slice axis 1 from 0 to the data coordinate 3.5
h[{7: slice(0, 2, bh.rebin(4))}] # slice and rebin axis 7
If you don’t like manually building slices, you can use the Slicer() utility
to recover the original slicing syntax inside the dict:
s = bh.tag.Slicer()
h[{0: s[::rebin(2)]}] # rebin axis 0 by two
h[{1: s[0:loc(3.5)]}] # slice axis 1 from 0 to the data coordinate 3.5
h[{7: s[0:2:rebin(4)]}] # slice and rebin axis 7
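As a rough illustration of the dictionary form, the sketch below shows one way an implementation might expand an {axis: action} dictionary into one entry per axis. Both names are written for this example only: Slicer is a stand-in that simply returns whatever is passed to its square brackets, and expand_dict_index is a hypothetical helper, not part of any library.

```python
class Slicer:
    """Stand-in slicer: s[a:b:c] just hands back slice(a, b, c)."""

    def __getitem__(self, item):
        return item


def expand_dict_index(index, ndim):
    """Turn {axis: action} into a tuple with one entry per axis.

    Axes that are not mentioned get slice(None), meaning "leave alone".
    Hypothetical helper for illustration only.
    """
    full = [slice(None)] * ndim
    for axis, action in index.items():
        if not -ndim <= axis < ndim:
            raise IndexError(f"axis {axis} out of range for {ndim} dimensions")
        full[axis] = action
    return tuple(full)


s = Slicer()
# Slice axis 1 to bins 0..1 and axis 3 from bin 5 onwards, on a 4D histogram.
expanded = expand_dict_index({1: s[0:2], 3: s[5:]}, ndim=4)
print(expanded)
```

The resulting tuple can then be handled by the same code path that processes the ordinary h[a:b, ...] form.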
Invalid syntax:¶
h[1.0] # Floats are not allowed, just like numpy
h[::2] # Skipping is not (currently) supported
h[..., None] # None == np.newaxis is not supported
Reordering axes¶
It is not possible to reorder axes with this syntax; libraries are expected to
provide a .project(*axis: int) method which provides a way to reorder, as well
as fast access to a small subset of a large histogram in a complementary way to
the above indexing.
Rejected proposals or proposals for future consideration, maybe hist-only:¶
h2 = h[1.0j:2.5j + 1] # Adding a j suffix to a number could be used in place of ``loc(x)``
h2 = h[1.0] # Floats in place of ``loc(x)``: too easy to make a mistake
Examples¶
For a histogram, the slice should be thought of like this:
histogram[start:stop:action]
The start and stop can be either a bin number (following Python rules),
or a callable; the callable will get the axis being acted on and should
return an extended bin number (-1 and len(ax) are flow bins). A
provided callable is bh.loc, which converts from axis data
coordinates into bin number.
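To make the start/stop rule concrete, here is a minimal sketch of how an integer or a callable could be resolved to an extended bin number. RegularAxis and resolve are toy names invented for this example, and the loc, underflow, and overflow definitions are simplified stand-ins rather than the ones shipped by any library.

```python
class RegularAxis:
    """Toy axis with equal-width bins on [low, high); illustration only."""

    def __init__(self, bins, low, high):
        self.bins, self.low, self.high = bins, low, high

    def __len__(self):
        return self.bins

    def index(self, value):
        # Extended bin number: -1 is underflow, len(self) is overflow.
        if value < self.low:
            return -1
        if value >= self.high:
            return self.bins
        return int((value - self.low) / (self.high - self.low) * self.bins)


class loc:
    """Simplified tag: data coordinate plus an integer bin offset."""

    def __init__(self, value, offset=0):
        self.value, self.offset = value, offset

    def __add__(self, n):
        return loc(self.value, self.offset + n)

    def __sub__(self, n):
        return loc(self.value, self.offset - n)

    def __call__(self, axis):
        return axis.index(self.value) + self.offset


def underflow(axis):
    return -1


def overflow(axis):
    return len(axis)


def resolve(item, axis):
    """Return the extended bin number for a plain int or a callable tag."""
    return item(axis) if callable(item) else item


ax = RegularAxis(10, 0.0, 1.0)
print(resolve(3, ax), resolve(loc(0.35), ax), resolve(loc(0.35) + 1, ax))
print(resolve(underflow, ax), resolve(overflow, ax))
```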
The final argument, action, is special. A general API is being
worked on, but for now, bh.sum will “project out” or “integrate
over” an axis, and bh.rebin(n) will rebin by an integral factor.
Both work correctly with limits; bh.sum will remove flow bins if
given a range. h[0:len:bh.sum] will sum without the flow bins.
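The flow-bin rule for summing can be modelled on a plain list. This sketch assumes a 1D storage laid out as [underflow, bin 0, ..., bin n-1, overflow]; project_sum is a made-up function for illustration, not a library call.

```python
def project_sum(counts_with_flow, start=None, stop=None):
    """Sum a 1D list laid out as [underflow, bin 0, ..., bin n-1, overflow].

    An endpoint left as None keeps the flow bin on that side; an explicit
    endpoint drops it. Toy model for illustration only.
    """
    n = len(counts_with_flow) - 2
    first = 0 if start is None else start + 1  # +1 skips the underflow slot
    last = n + 2 if stop is None else stop + 1
    return sum(counts_with_flow[first:last])


counts = [5, 1, 2, 3, 4, 7]  # underflow=5, four regular bins, overflow=7

print(project_sum(counts))           # like h[::sum], flow bins included
print(project_sum(counts, 0, 4))     # like h[0:len:sum], flow bins excluded
print(project_sum(counts, 1, None))  # like h[1::sum], overflow kept
```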
Here are a few examples that highlight the functionality of UHI:
Example 1:¶
You want to slice axis 0 from 0 to 20, axis 1 from -0.5 to 1.5 in data coordinates, axis 2 needs to have double size bins (rebin by 2), and axis 3 should be summed over. You have a 4D histogram.
Solution:
ans = h[:20, bh.loc(-.5):bh.loc(1.5), ::bh.rebin(2), ::bh.sum]
Example 2:¶
You want to set all bins above 4.0 in data coordinates to 0 on a 1D histogram.
Solution:
h[bh.loc(4.0):] = 0
You can set with an array, as well. The array can either be the same length as the range you give, or the same length as the range + under/overflows if the range is open ended (no limit given). For example:
h = bh.Histogram(bh.axis.Regular(10, 0, 1))
h[:] = np.ones(10) # underflow/overflow still 0
h[:] = np.ones(12) # underflow/overflow now set too
Note that for clarity, while basic NumPy broadcasting is supported, axis-adding broadcasting is not supported; you must set a 2D histogram with a 2D array or a scalar, not a 1D array.
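The length rule for setting with an open-ended range can be sketched on a plain list. This assumes a 1D storage laid out as [underflow, bins..., overflow]; set_all is a hypothetical helper written for this example.

```python
def set_all(storage_with_flow, values):
    """Model h[:] = values on a 1D storage laid out as [under, bins..., over]."""
    n = len(storage_with_flow) - 2
    if len(values) == n:
        storage_with_flow[1:-1] = values  # flow bins untouched
    elif len(values) == n + 2:
        storage_with_flow[:] = values     # flow bins set as well
    else:
        raise ValueError(f"expected {n} or {n + 2} values, got {len(values)}")


storage = [0.0] * 12  # 10 regular bins plus underflow and overflow
set_all(storage, [1.0] * 10)
print(storage[0], storage[1], storage[-1])  # flow bins still 0.0
set_all(storage, [1.0] * 12)
print(storage[0], storage[-1])              # flow bins now 1.0
```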
Example 3:¶
You want to sum from -infinity to 2.4 in data coordinates in axis 1, leaving all other axes alone. You have an ND histogram, with N >= 2.
Solution:
ans = h[:, :bh.loc(2.4):bh.sum, ...]
Notice that the last example could be hard to write if the axis number, 1 in
this case, was large or programmatically defined. In these cases, you
can pass a dictionary of {axis:slice} into the indexing operation. A
shortcut to quickly generate slices is provided, as well:
ans = h[{1: slice(None,bh.loc(2.4),bh.sum)}]
# Identical:
s = bh.tag.Slicer()
ans = h[{1: s[:bh.loc(2.4):bh.sum]}]
Example 4:¶
You want the underflow bin of a 1D histogram.
Solution:
val = h1[bh.underflow]
Details¶
Implementation notes¶
loc, rebin, and sum are not unique tags, or special types, but rather APIs for classes. New versions of these could be added, and implementations could be shared among Histogram libraries. For clarity, the following code is written in Python 3.6+. Prototype here. Extra doc here.
Note that the API comes in two forms; the __call__/__new__ operator
form is more powerful, slower, optional, and is currently not supported by
boost-histogram. A fully conforming UHI implementation must allow the tag form
without the operators.
Basic implementation example (WIP):
class loc:
    "When used in the start or stop of a Histogram's slice, value is taken to be the position in data coordinates."

    # offset defaults to 0 so that loc(b) works without an explicit shift
    def __init__(self, value, offset=0):
        self.value = value
        self.offset = offset

    # supporting __add__ and __sub__ also recommended

    def __call__(self, axis):
        return axis.index(self.value) + self.offset


# Other flags, such as callable functions, could be added and detected later.

# UHI will perform a maximum performance sum when python's sum is encountered


def underflow(axis):
    return -1


def overflow(axis):
    return len(axis)
class rebin:
    """
    When used in the step of a Histogram's slice, rebin(n) combines bins,
    scaling their widths by a factor of n. If the number of bins is not
    divisible by n, the remainder is added to the overflow bin.
    """

    def __init__(self, factor):
        # Items with .factor are specially treated in boost-histogram,
        # performing a high performance rebinning
        self.factor = factor

    # Optional and not used by boost-histogram
    def __call__(self, binning, axis, counts):
        factor = self.factor
        if isinstance(binning, Regular):
            indexes = (numpy.arange(0, binning.num, factor),)

            num, remainder = divmod(binning.num, factor)
            high, hasover = binning.high, binning.hasover

            if binning.hasunder:
                indexes[0][:] += 1
                indexes = ([0],) + indexes

            if remainder == 0:
                if binning.hasover:
                    indexes = indexes + ([binning.num + int(binning.hasunder)],)
            else:
                high = binning.left(indexes[-1][-1])
                hasover = True

            binning = Regular(num, binning.low, high, hasunder=binning.hasunder, hasover=hasover)
            counts = numpy.add.reduceat(counts, numpy.concatenate(indexes), axis=axis)
            return binning, counts

        else:
            raise NotImplementedError(type(binning))
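The remainder rule in the rebin docstring can be shown on a small list. This is a simplified sketch, not the implementation above: it assumes a 1D storage with both flow bins present, laid out as [underflow, bins..., overflow], and rebin_counts is a name invented for the example.

```python
def rebin_counts(counts_with_flow, factor):
    """Merge groups of `factor` regular bins; leftover bins go to overflow."""
    under, *bins, over = counts_with_flow
    n_new, remainder = divmod(len(bins), factor)
    merged = [sum(bins[i * factor:(i + 1) * factor]) for i in range(n_new)]
    if remainder:
        # The trailing bins that do not fill a whole group join the overflow.
        over += sum(bins[n_new * factor:])
    return [under, *merged, over]


# Five regular bins rebinned by 2: two merged bins, the fifth joins overflow.
print(rebin_counts([0, 1, 2, 3, 4, 5, 9], 2))
```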
Indexing+¶
This is an extended version of UHI, called UHI+. This is not implemented in boost-histogram, but is implemented in Hist.
Syntax extensions¶
UHI+ avoids using the standard tags found in UHI by using more advanced Python syntax.
Location based slicing/access: numeric axes¶
You can replace location based indexing loc(1.23) → 1.23j (a “j” suffix on a number literal). You can shift by an integer, just like with loc: 2.3j + 1 will be one bin past the one containing the location “2.3”.
v = h[2j] # Returns the bin containing "2.0"
v = h[2j + 1] # Returns the bin above the one containing "2.0"
h2 = h[2j:] # Slices starting with the bin containing "2.0"
Location based slicing/access: string axis¶
If you have a string based axis, you can use a string directly: loc("label") → "label".
v = h["a"] # Returns the "a" bin (string category axis)
Rebinning¶
You can replace rebin(2) → 2j in the third slot of a slice.
h2 = h[::2j] # Modification operations (rebin)
h2 = h[a:b:2j] # Modifications can combine with slices
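The “j” shorthand works because Python parses 2.3j + 1 as the complex number complex(1, 2.3). The sketch below shows one way such values could be decoded; the function names and the tuple format are made up for illustration and are not how any particular library represents them.

```python
def interpret_start_stop(item):
    """Map a UHI+ start/stop entry to ("loc", value, offset) or ("index", n)."""
    if isinstance(item, str):
        # A string stands for loc("label") on a string category axis.
        return ("loc", item, 0)
    if isinstance(item, complex):
        # 2.3j + 1 == complex(1, 2.3): the imaginary part is the location,
        # the real part is the integer bin offset.
        return ("loc", item.imag, int(item.real))
    return ("index", item)


def interpret_step(item):
    """Map a UHI+ step entry to ("rebin", n); other actions pass through."""
    if isinstance(item, complex):
        return ("rebin", int(item.imag))
    return ("action", item)


print(interpret_start_stop(2j + 1))
print(interpret_start_stop("a"))
print(interpret_step(2j))
```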
Named based indexing¶
An optional extension to indexing is expected for histogram implementations that support names. If named axes are supported, any expression that refers to an axis by an integer can also refer to it by a name string. .project(*axis: int | str) is probably the most common place to see this, but you can also use strings in the UHI dict access, such as:
s = bh.tag.Slicer()
h[{"a": s[::2j]}] # rebin axis "a" by two
h[{"x": s[0:3.5j]}] # slice axis "x" from 0 to the data coordinate 3.5
h[{"other": s[0:2:4j]}] # slice and rebin axis "other"
Plotting¶
This is a description of the PlottableProtocol. Any plotting library that
accepts an object that follows the PlottableProtocol can plot objects that
follow this protocol, and libraries that follow this protocol are compatible
with plotters. The Protocol is runtime checkable, though as usual, that will
only check for the presence of the needed methods at runtime, not for the
static types.
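The limits of a runtime check can be seen with a tiny protocol. HasValues and the three classes below are made up for this illustration; they are not part of UHI.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class HasValues(Protocol):
    """Tiny stand-in protocol used only for this illustration."""

    def values(self) -> list: ...


class Good:
    def values(self) -> list:
        return [1, 2, 3]


class WrongReturnType:
    def values(self) -> str:  # wrong static type, but the method exists
        return "not a list"


class Missing:
    pass


print(isinstance(Good(), HasValues))             # True
print(isinstance(WrongReturnType(), HasValues))  # True: only presence is checked
print(isinstance(Missing(), HasValues))          # False
```

A static checker such as MyPy is what would flag WrongReturnType; isinstance cannot.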
Using the protocol:¶
Plotters should only depend on the methods and attributes listed below. In short, they are:
h.kind: The bh.Kind of the histogram (COUNT or MEAN)
h.values(): The value (as given by the kind)
h.variances(): The variance in the value (None if an unweighted histogram was filled with weights)
h.counts(): How many fills the bin received, or the effective number of fills if the histogram is weighted
h.axes: A Sequence of axes
Axes have:
ax[i]: A tuple of (lower, upper) bin, or the discrete bin value (integer or string)
len(ax): The number of bins
Iteration is supported
ax.traits.circular: True if circular
ax.traits.discrete: True if the bin represents a single value (e.g. Integer or Category axes) instead of an interval (e.g. Regular or Variable axes)
Plotters should see if .counts() is None; no boost-histogram objects currently
return None, but a future storage or different library could.
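For weighted histograms, the protocol defines counts() as an effective number of entries, sum_of_weights ** 2 / sum_of_weights_squared. Here is a minimal NumPy sketch of that definition, mirroring the suggested implementation in the protocol docstring; the function name effective_counts and the sample weights are made up for this example.

```python
import numpy as np


def effective_counts(sum_of_weights, sum_of_weights_squared):
    """Effective entries per bin; bins that were never filled give 0."""
    sum_of_weights = np.asarray(sum_of_weights, dtype=np.float64)
    sum_of_weights_squared = np.asarray(sum_of_weights_squared, dtype=np.float64)
    return np.divide(
        sum_of_weights**2,
        sum_of_weights_squared,
        out=np.zeros_like(sum_of_weights, dtype=np.float64),
        where=sum_of_weights_squared != 0,
    )


# One list of fill weights per bin: never filled, filled once with weight 2,
# filled three times with weight 1, filled with weights 1 and 3.
fills = [[], [2.0], [1.0, 1.0, 1.0], [1.0, 3.0]]
sw = [sum(w) for w in fills]
sw2 = [sum(x * x for x in w) for w in fills]
print(effective_counts(sw, sw2))
```

The last bin was filled twice but its effective count is 1.6, which reflects the spread in its weights.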
Also check .variances(); if not None, this storage holds variance information and
error bars should be included. Boost-histogram histograms will return something
unless they know that this is an invalid assumption (a weighted fill was made
on an unweighted histogram).
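A plotter might turn these checks into error bars roughly as follows. This is a sketch under stated assumptions: error_bars and FakeCountHistogram are hypothetical names, taking the square root of the variance as the error bar is a common convention rather than something the protocol mandates, and the MEAN branch uses the sqrt(variances / counts) rule from the protocol docstring.

```python
import numpy as np


def error_bars(h):
    """Return per-bin y errors for a PlottableHistogram-like object, or None."""
    variances = h.variances()
    if variances is None:
        return None  # no variance information, so draw no error bars
    if h.kind == "MEAN":
        counts = h.counts()
        if counts is None:
            return None
        # Bins with zero counts come out as nan or inf; silence the warnings.
        with np.errstate(divide="ignore", invalid="ignore"):
            return np.sqrt(variances / counts)
    return np.sqrt(variances)


class FakeCountHistogram:
    """Hypothetical stand-in exposing only what error_bars needs."""

    kind = "COUNT"

    def values(self):
        return np.array([4.0, 9.0, 0.0])

    def variances(self):
        return np.array([4.0, 9.0, 0.0])

    def counts(self):
        return np.array([4.0, 9.0, 0.0])


print(error_bars(FakeCountHistogram()))
```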
To statically restrict yourself to valid API usage, use PlottableHistogram
as the parameter type to your function (not needed at runtime).
Implementing the protocol:¶
Add UHI to your MyPy environment; an example .pre-commit-config.yaml file:
- repo: https://github.com/pre-commit/mirrors-mypy
  rev: v0.812
  hooks:
  - id: mypy
    files: src
    additional_dependencies: [uhi, numpy~=1.20.1]
Then, check your library against the Protocol like this:
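from typing import TYPE_CHECKING, cast

# MyHistogram stands for your own histogram class
if TYPE_CHECKING:
    from uhi.typing.plottable import PlottableHistogram

    _: PlottableHistogram = cast(MyHistogram, None)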
Help for plotters¶
The module uhi.numpy_plottable has a utility to simplify the common use
case of accepting a PlottableProtocol or other common formats, primarily a
NumPy histogram/histogram2d/histogramdd tuple. The ensure_plottable_histogram
function will take a histogram or NumPy tuple, or an object that implements
.to_numpy() or .numpy() and convert it to a NumPyPlottableHistogram, which is
a minimal implementation of the Protocol. By calling this function on your
input, you can then write your plotting function knowing that you always have
a PlottableProtocol object, greatly simplifying your code.
The full protocol version 1.2 follows:¶
(Also available as uhi.typing.plottable.PlottableProtocol, for use in tests, etc.)
"""
Using the protocol:

Producers: use isinstance(myhist, PlottableHistogram) in your tests; part of
the protocol is checkable at runtime, though ideally you should use MyPy; if
your histogram class supports PlottableHistogram, this will pass.

Consumers: Make your functions accept the PlottableHistogram static type, and
MyPy will force you to only use items in the Protocol.
"""

from __future__ import annotations

import sys
from collections.abc import Iterator, Sequence
from typing import Any, Tuple, TypeVar, Union

# NumPy 1.20+ will work much, much better than previous versions when type checking
import numpy as np

if sys.version_info < (3, 8):
    from typing_extensions import Protocol, runtime_checkable
else:
    from typing import Protocol, runtime_checkable

protocol_version = (1, 2)

# Known kinds of histograms. A Producer can add Kinds not defined here; a
# Consumer should check for known types if it matters. A simple plotter could
# just use .value and .variance if non-None and ignore .kind.
#
# Could have been Kind = Literal["COUNT", "MEAN"] - left as a generic string so
# it can be extendable.
Kind = str

# Implementations are highly encouraged to use the following construct:
# class Kind(str, enum.Enum):
#     COUNT = "COUNT"
#     MEAN = "MEAN"
# Then return and use Kind.COUNT or Kind.MEAN.
@runtime_checkable
class PlottableTraits(Protocol):
    @property
    def circular(self) -> bool:
        """
        True if the axis "wraps around"
        """

    @property
    def discrete(self) -> bool:
        """
        True if each bin is discrete - Integer, Boolean, or Category, for example
        """
T_co = TypeVar("T_co", covariant=True)


@runtime_checkable
class PlottableAxisGeneric(Protocol[T_co]):
    # name: str - Optional, not part of Protocol
    # label: str - Optional, not part of Protocol
    #
    # Plotters are encouraged to plot label if it exists and is not None, and
    # name otherwise if it exists and is not None, but these properties are not
    # available on all histograms and not part of the Protocol.

    @property
    def traits(self) -> PlottableTraits: ...

    def __getitem__(self, index: int) -> T_co:
        """
        Get the pair of edges (not discrete) or bin label (discrete).
        """

    def __len__(self) -> int:
        """
        Return the number of bins (not counting flow bins, which are ignored
        for this Protocol currently).
        """

    def __eq__(self, other: Any) -> bool:
        """
        Required to be sequence-like.
        """

    def __iter__(self) -> Iterator[T_co]:
        """
        Useful element of a Sequence to include.
        """


PlottableAxisContinuous = PlottableAxisGeneric[Tuple[float, float]]
PlottableAxisInt = PlottableAxisGeneric[int]
PlottableAxisStr = PlottableAxisGeneric[str]

PlottableAxis = Union[PlottableAxisContinuous, PlottableAxisInt, PlottableAxisStr]
@runtime_checkable
class PlottableHistogram(Protocol):
    @property
    def axes(self) -> Sequence[PlottableAxis]: ...

    @property
    def kind(self) -> Kind: ...

    # All methods can have a flow=False argument - not part of this Protocol.
    # If this is included, it should return an array with flow bins added,
    # normal ordering.

    def values(self) -> np.typing.NDArray[Any]:
        """
        Returns the accumulated values. The counts for simple histograms, the
        sum of weights for weighted histograms, the mean for profiles, etc.

        If counts is equal to 0, the value in that cell is undefined if
        kind == "MEAN".
        """

    def variances(self) -> np.typing.NDArray[Any] | None:
        """
        Returns the estimated variance of the accumulated values. The sum of squared
        weights for weighted histograms, the variance of samples for profiles, etc.
        For an unweighted histogram where kind == "COUNT", this should return the same
        as values if the histogram was not filled with weights, and None otherwise.

        If counts is equal to 1 or less, the variance in that cell is undefined if
        kind == "MEAN".

        If kind == "MEAN", the counts can be used to compute the error on the mean
        as sqrt(variances / counts), this works whether or not the entries are
        weighted if the weight variance was tracked by the implementation.
        """

    def counts(self) -> np.typing.NDArray[Any] | None:
        """
        Returns the number of entries in each bin for an unweighted
        histogram or profile and an effective number of entries (defined below)
        for a weighted histogram or profile. An exotic generalized histogram could
        have no sensible .counts, so this is Optional and should be checked by
        Consumers.

        If kind == "MEAN", counts (effective or not) can and should be used to
        determine whether the mean value and its variance should be displayed
        (see documentation of values and variances, respectively). The counts
        should also be used to compute the error on the mean (see documentation
        of variances).

        For a weighted histogram, counts is defined as sum_of_weights ** 2 /
        sum_of_weights_squared. It is equal to or less than the number of times
        the bin was filled, the equality holds when all filled weights are equal.
        The larger the spread in weights, the smaller it is, but it is always 0
        if filled 0 times, and 1 if filled once, and more than 1 otherwise.

        A suggested implementation is:

            return np.divide(
                sum_of_weights**2,
                sum_of_weights_squared,
                out=np.zeros_like(sum_of_weights, dtype=np.float64),
                where=sum_of_weights_squared != 0)
        """
Serialization¶
Warning
Serialization is in draft currently. Once at least one implementation is ready, we will remove this warning and release UHI 0.5.
Introduction¶
Histogram serialization has to cover a wide range of formats. As such, we describe a form for serialization that covers the metadata structure as JSON-like, with a provided JSON-schema. The data (bins and/or variable edges) is stored out-of-band in a binary format based on what type of data file you are in. For very small (primarily 1D) histograms, data is allowed inline as well.
The following formats are being targeted:
┌────────┐ ┌────────┐ ┌───────┐
│ ROOT │ │ HDF5 │ │ ZIP │
└────────┘ └────────┘ └───────┘
Other formats can be used as well, assuming they support out-of-band data and text attributes or files for the metadata.
Caveats¶
This structure was based heavily on boost-histogram, but it is intended to be general, and can be expanded in the future as needed. As such, the following limitations are required:
Serialization followed by deserialisation may cause axis changes. Axis types may change to an equivalent but less performant axis, growth status will be lost, etc.
Metadata must be expressible as JSON. It should also be reasonably sized; some formats like HDF5 may limit the size of attributes to 64K.
Floating point errors could be incurred on conversion, as the storage format uses a stable but different representation.
Axis name is only part of the metadata, and is not standardized. This is due to lack of support from boost-histogram.
Design¶
The following axes types are supported:
"regular"
: A regularly spaced set of even bins. Boost-histogram’s “integer” axes map to this axis as well. Has upper, lower, bins, underflow, overflow, and circular properties. circular defaults to False if not present.
"variable": A continuous axis defined by bins+1 edges. Has edges, which is either an in-line list of numbers or a string pointing to an out-of-band data source. Also has underflow, overflow, and circular properties. circular defaults to False if not present.
"category_int": A list of integer bins, non-continuous. Has categories, which is an in-line list of integers. Also has flow.
"category_str": A list of string bins. Has categories, which is an in-line list of strings. Also has flow.
"boolean": A true/false axis.
Axes with gaps are currently not supported.
All axes support metadata, a string-keyed dictionary of arbitrary, JSON-like data.
The following storages are supported:
"int"
: A collection of integers. Boost-histogram’s Int64 and AtomicInt64 map to this, and sometimes Unlimited.
"double": A collection of 64-bit floating point values. Boost-histogram’s Double storage maps to this, and sometimes Unlimited.
"weighted": A collection of two arrays of 64-bit floating point values, "values" and "variances". Boost-histogram’s Weight storage maps to this.
"mean": A collection of three arrays of 64-bit floating point values, "counts", "values", and "variances". Boost-histogram’s Mean storage maps to this.
"weighted_mean": A collection of four arrays of 64-bit floating point values, "sum_of_weights", "sum_of_weights_squared", "values", and "variances". Boost-histogram’s WeightedMean storage maps to this.
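As an illustration of the metadata layout described above, the following sketch builds a document for one small 1D histogram with inline data and round-trips it through JSON. The histogram name, the metadata note, and the bin contents are made up. This section does not say whether inline data includes flow bins, so the example turns both flow bins off to sidestep that question.

```python
import json

document = {
    "example_hist": {  # histogram name: any non-empty key
        "metadata": {"note": "made-up example"},
        "axes": [
            {
                "type": "regular",
                "lower": 0.0,
                "upper": 1.0,
                "bins": 4,
                "underflow": False,
                "overflow": False,
                "circular": False,
            }
        ],
        # Inline data, which is allowed for very small histograms.
        "storage": {"type": "int", "data": [1, 2, 3, 4]},
    }
}

text = json.dumps(document, indent=2)
restored = json.loads(text)
print(text)
```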
CLI/API¶
You can currently test a JSON file against the schema by running:
$ python -m uhi.schema some/file.json
Or with code:
import uhi.schema
uhi.schema.validate("some/file.json")
Eventually this should also be usable for JSON’s inside zip, HDF5 attributes, and maybe more.
Warning
Currently, this spec describes how to prepare the metadata for one of the
targeted backends. It does not yet cover backend specific details, like how to
define and use the binary resource locator strings or how to store the data.
JSON is not a target spec, but just part of the ZIP spec, meaning the files
that currently “pass” the tool above would be valid inside a .zip file
eventually, but are not valid by themselves.
Rendered schema¶
Histogram¶
https://raw.githubusercontent.com/scikit-hep/uhi/main/src/uhi/resources/histogram.schema.json
type: object
additionalProperties: False
patternProperties:
  .+
    type: object
    additionalProperties: False
    properties:
      metadata (object): Arbitrary metadata dictionary.
      axes (array): A list of the axes of the histogram. Each item is oneOf: regular_axis, variable_axis, category_str_axis, category_int_axis, boolean_axis.
      storage: The storage of the bins of the histogram. oneOf: int_storage, double_storage, weighted_storage, mean_storage, weighted_mean_storage.
$defs:
  regular_axis (object, additionalProperties: False): An evenly spaced set of continuous bins.
    type (string): const regular
    lower (number): Lower edge of the axis.
    upper (number): Upper edge of the axis.
    bins (integer, minimum 0): Number of bins in the axis.
    underflow (boolean): True if there is a bin for underflow.
    overflow (boolean): True if there is a bin for overflow.
    circular (boolean): True if the axis wraps around.
    metadata (object): Arbitrary metadata dictionary.
  variable_axis (object, additionalProperties: False): A variably spaced set of continuous bins.
    type (string): const variable
    edges: oneOf an array of number, or a string (A path (URI?) to the edges data.)
    underflow (boolean)
    overflow (boolean)
    circular (boolean)
    metadata (object): Arbitrary metadata dictionary.
  category_str_axis (object, additionalProperties: False): A set of string categorical bins.
    type (string): const category_str
    categories (array of string, uniqueItems: True)
    flow (boolean): True if flow bin (at the overflow position) present.
    metadata (object): Arbitrary metadata dictionary.
  category_int_axis (object, additionalProperties: False): A set of integer categorical bins in any order.
    type (string): const category_int
    categories (array of integer, uniqueItems: True)
    flow (boolean): True if flow bin (at the overflow position) present.
    metadata (object): Arbitrary metadata dictionary.
  boolean_axis (object, additionalProperties: False): A simple true/false axis with no flow.
    type (string): const boolean
    metadata (object): Arbitrary metadata dictionary.
  int_storage (object, additionalProperties: False): A storage holding integer counts.
    type (string): const int
    data: oneOf a string (A path (URI?) to the integer bin data.), or an array of integer
  double_storage (object, additionalProperties: False): A storage holding floating point counts.
    type (string): const double
    data: oneOf a string (A path (URI?) to the floating point bin data.), or an array of number
  weighted_storage (object, additionalProperties: False): A storage holding floating point counts and variances.
    type (string): const weighted
    data: oneOf a string (A path (URI?) to the floating point bin data; outer dimension is [value, variance]), or an object (additionalProperties: False) with:
      values (array of number)
      variances (array of number)
  mean_storage (object, additionalProperties: False): A storage holding ‘profile’-style floating point counts, values, and variances.
    type (string): const mean
    data: oneOf a string (A path (URI?) to the floating point bin data; outer dimension is [counts, value, variance]), or an object (additionalProperties: False) with:
      counts (array of number)
      values (array of number)
      variances (array of number)
  weighted_mean_storage (object, additionalProperties: False): A storage holding ‘profile’-style floating point ∑weights, ∑weights², values, and variances.
    type (string): const weighted_mean
    data: oneOf a string (A path (URI?) to the floating point bin data; outer dimension is [∑weights, ∑weights², value, variance]), or an object (additionalProperties: False) with:
      sum_of_weights (array of number)
      sum_of_weights_squared (array of number)
      values (array of number)
      variances (array of number)
Full schema¶
The full schema is below:
{
"$schema": "http://json-schema.org/draft-07/schema#",
"$id": "https://raw.githubusercontent.com/scikit-hep/uhi/main/src/uhi/resources/histogram.schema.json",
"title": "Histogram",
"type": "object",
"additionalProperties": false,
"patternProperties": {
".+": {
"type": "object",
"required": ["axes", "storage"],
"additionalProperties": false,
"properties": {
"metadata": {
"type": "object",
"description": "Arbitrary metadata dictionary."
},
"axes": {
"type": "array",
"description": "A list of the axes of the histogram.",
"items": {
"oneOf": [
{ "$ref": "#/$defs/regular_axis" },
{ "$ref": "#/$defs/variable_axis" },
{ "$ref": "#/$defs/category_str_axis" },
{ "$ref": "#/$defs/category_int_axis" },
{ "$ref": "#/$defs/boolean_axis" }
]
}
},
"storage": {
"description": "The storage of the bins of the histogram.",
"oneOf": [
{ "$ref": "#/$defs/int_storage" },
{ "$ref": "#/$defs/double_storage" },
{ "$ref": "#/$defs/weighted_storage" },
{ "$ref": "#/$defs/mean_storage" },
{ "$ref": "#/$defs/weighted_mean_storage" }
]
}
}
}
},
"$defs": {
"regular_axis": {
"type": "object",
"description": "An evenly spaced set of continuous bins.",
"required": ["type", "lower", "upper", "bins", "underflow", "overflow"],
"additionalProperties": false,
"properties": {
"type": { "type": "string", "const": "regular" },
"lower": { "type": "number", "description": "Lower edge of the axis." },
"upper": { "type": "number", "description": "Upper edge of the axis." },
"bins": {
"type": "integer",
"minimum": 0,
"description": "Number of bins in the axis."
},
"underflow": {
"type": "boolean",
"description": "True if there is a bin for underflow."
},
"overflow": {
"type": "boolean",
"description": "True if there is a bin for overflow."
},
"circular": {
"type": "boolean",
"description": "True if the axis wraps around."
},
"metadata": {
"type": "object",
"description": "Arbitrary metadata dictionary."
}
}
},
"variable_axis": {
"type": "object",
"description": "A variably spaced set of continuous bins.",
"required": ["type", "edges", "underflow", "overflow"],
"additionalProperties": false,
"properties": {
"type": { "type": "string", "const": "variable" },
"edges": {
"oneOf": [
{
"type": "array",
"items": { "type": "number" },
"minItems": 2,
"uniqueItems": true
},
{
"type": "string",
"description": "A path (URI?) to the edges data."
}
]
},
"underflow": { "type": "boolean" },
"overflow": { "type": "boolean" },
"circular": { "type": "boolean" },
"metadata": {
"type": "object",
"description": "Arbitrary metadata dictionary."
}
}
},
"category_str_axis": {
"type": "object",
"description": "A set of string categorical bins.",
"required": ["type", "categories", "flow"],
"additionalProperties": false,
"properties": {
"type": { "type": "string", "const": "category_str" },
"categories": {
"type": "array",
"items": { "type": "string" },
"uniqueItems": true
},
"flow": {
"type": "boolean",
"description": "True if flow bin (at the overflow position) present."
},
"metadata": {
"type": "object",
"description": "Arbitrary metadata dictionary."
}
}
},
"category_int_axis": {
"type": "object",
"description": "A set of integer categorical bins in any order.",
"required": ["type", "categories", "flow"],
"additionalProperties": false,
"properties": {
"type": { "type": "string", "const": "category_int" },
"categories": {
"type": "array",
"items": { "type": "integer" },
"uniqueItems": true
},
"flow": {
"type": "boolean",
"description": "True if flow bin (at the overflow position) present."
},
"metadata": {
"type": "object",
"description": "Arbitrary metadata dictionary."
}
}
},
"boolean_axis": {
"type": "object",
"description": "A simple true/false axis with no flow.",
"required": ["type"],
"additionalProperties": false,
"properties": {
"type": { "type": "string", "const": "boolean" },
"metadata": {
"type": "object",
"description": "Arbitrary metadata dictionary."
}
}
},
"int_storage": {
"type": "object",
"description": "A storage holding integer counts.",
"required": ["type", "data"],
"additionalProperties": false,
"properties": {
"type": { "type": "string", "const": "int" },
"data": {
"oneOf": [
{
"type": "string",
"description": "A path (URI?) to the integer bin data."
},
{ "type": "array", "items": { "type": "integer" } }
]
}
}
},
"double_storage": {
"type": "object",
"description": "A storage holding floating point counts.",
"required": ["type", "data"],
"additionalProperties": false,
"properties": {
"type": { "type": "string", "const": "double" },
"data": {
"oneOf": [
{
"type": "string",
"description": "A path (URI?) to the floating point bin data."
},
{ "type": "array", "items": { "type": "number" } }
]
}
}
},
"weighted_storage": {
"type": "object",
"description": "A storage holding floating point counts and variances.",
"required": ["type", "data"],
"additionalProperties": false,
"properties": {
"type": { "type": "string", "const": "weighted" },
"data": {
"oneOf": [
{
"type": "string",
"description": "A path (URI?) to the floating point bin data; outer dimension is [value, variance]"
},
{
"type": "object",
"required": ["values", "variances"],
"additionalProperties": false,
"properties": {
"values": { "type": "array", "items": { "type": "number" } },
"variances": { "type": "array", "items": { "type": "number" } }
}
}
]
}
}
},
"mean_storage": {
"type": "object",
"description": "A storage holding 'profile'-style floating point counts, values, and variances.",
"required": ["type", "data"],
"additionalProperties": false,
"properties": {
"type": { "type": "string", "const": "mean" },
"data": {
"oneOf": [
{
"type": "string",
"description": "A path (URI?) to the floating point bin data; outer dimension is [counts, value, variance]"
},
{
"type": "object",
"required": ["counts", "values", "variances"],
"additionalProperties": false,
"properties": {
"counts": { "type": "array", "items": { "type": "number" } },
"values": { "type": "array", "items": { "type": "number" } },
"variances": { "type": "array", "items": { "type": "number" } }
}
}
]
}
}
},
"weighted_mean_storage": {
"type": "object",
"description": "A storage holding 'profile'-style floating point ∑weights, ∑weights², values, and variances.",
"required": ["type", "data"],
"additionalProperties": false,
"properties": {
"type": { "type": "string", "const": "weighted_mean" },
"data": {
"oneOf": [
{
"type": "string",
"description": "A path (URI?) to the floating point bin data; outer dimension is [∑weights, ∑weights², value, variance]"
},
{
"type": "object",
"required": [
"sum_of_weights",
"sum_of_weights_squared",
"values",
"variances"
],
"additionalProperties": false,
"properties": {
"sum_of_weights": {
"type": "array",
"items": { "type": "number" }
},
"sum_of_weights_squared": {
"type": "array",
"items": { "type": "number" }
},
"values": { "type": "array", "items": { "type": "number" } },
"variances": { "type": "array", "items": { "type": "number" } }
}
}
]
}
}
}
}
}
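As a rough illustration of the definitions above, the following sketch builds a string category axis and an integer storage as plain Python dictionaries and checks them against the `required` and `additionalProperties` rules. The helper `check_object`, the key tuples, and the sample data are hypothetical names introduced here for illustration; they are not part of UHI.

```python
import json

# Key sets copied from the "required" and "properties" entries of the
# category_str_axis and int_storage definitions above.
CATEGORY_STR_REQUIRED = ("type", "categories", "flow")
CATEGORY_STR_ALLOWED = CATEGORY_STR_REQUIRED + ("metadata",)
INT_STORAGE_KEYS = ("type", "data")


def check_object(obj, required, allowed):
    """Hypothetical helper: list missing required keys and unexpected keys.

    This mirrors only "required" and "additionalProperties": false; it does
    not check value types, "const", or "uniqueItems".
    """
    problems = [f"missing: {key}" for key in required if key not in obj]
    problems += [f"unexpected: {key}" for key in obj if key not in allowed]
    return problems


# Illustrative data, not taken from any real histogram.
good_axis = {
    "type": "category_str",
    "categories": ["electron", "muon", "tau"],
    "flow": True,
    "metadata": {"label": "lepton flavour"},
}
bad_axis = {
    "type": "category_str",
    "categories": ["electron", "muon"],
    "overflow": True,  # category axes use "flow", not "overflow"
}
int_storage = {"type": "int", "data": [4, 0, 7, 1]}

# The objects are plain JSON, so they survive a round trip unchanged.
assert json.loads(json.dumps(good_axis)) == good_axis

print(check_object(good_axis, CATEGORY_STR_REQUIRED, CATEGORY_STR_ALLOWED))
print(check_object(bad_axis, CATEGORY_STR_REQUIRED, CATEGORY_STR_ALLOWED))
print(check_object(int_storage, INT_STORAGE_KEYS, INT_STORAGE_KEYS))
```

This key check covers only a small part of the schema. A full JSON Schema validator would also enforce value types and the `const` tags, and is probably the better choice outside of a quick sanity check.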
Changelog¶
v0.4.0: Version 0.4.0¶
Released on 2023-10-17 - GitHub - PyPI
This release primarily drops Python 3.6 support. It also adds official 3.12 support. The changelog is now part of the docs.
- chore: drop Python 3.6 by @henryiii in #84
- chore: move to using Ruff by @henryiii in #86
- ci: fix readthedocs by @henryiii in #99
- chore: target-version no longer needed by Black or Ruff by @henryiii in #103
- chore: sp-repo-review by @henryiii in #107
- docs: prepare for schema addition by @henryiii in #113
- chore: add a check and bump NumPy by @henryiii in #114
- docs: add changelog by @henryiii in #115
Full Changelog: v0.3.3...v0.4.0
v0.3.3: Version 0.3.3¶
Released on 2023-01-04 - GitHub - PyPI
- ci: update to Python 3.11 final by @henryiii in #76
- chore: adapt to new versions by @henryiii in #82
- fix: use ABC for ROOTAxis by @henryiii and @pre-commit-ci in #79
- chore: use vcs versioning by @henryiii in #83
Full Changelog: v0.3.2...v0.3.3
v0.3.2: Version 0.3.2¶
Released on 2022-09-20 - GitHub - PyPI
Minor release, mostly updating to indicate Python 3.11 support. Moved the backend to Hatchling from Flit.
- Fix punctuation by @klieret in #60
- chore: switch to hatchling by @henryiii in #63
- chore: include 3.11 by @henryiii in #72
Full Changelog: v0.3.1...v0.3.2
v0.3.1: Version 0.3.1¶
Released on 2022-01-06 - GitHub - PyPI
Officially supports Python 3.10. Build system moved to Flit, with a PDM option to replace the old Poetry system if users want to develop in a locked environment. Updated to mypy 0.930 and NumPy 1.22. Type ignores now list the error code(s).
v0.3.0: Version 0.3.0¶
Released on 2021-06-15 - GitHub - PyPI
The conversion utility now supports PyROOT histograms, thanks to @pieterdavid. Standard maintenance updates, including moving to mypy 0.902.
v0.2.1: Version 0.2.1¶
Released on 2021-03-18 - GitHub - PyPI
Small patch release to add missing test files (#15) to the SDist, for downstream packagers like conda-forge. Nox support added to simplify development tasks, like bumping the version.
v0.2.0: Version 0.2.0¶
Released on 2021-03-17 - GitHub - PyPI
Version 1.2 of the PlottableHistogram Protocol; allows iteration over an axis and requires that the return types be np.ndarrays (#11). Adds a new runtime utility to simplify plotting libraries that want to use UHI at runtime (#13).
v0.1.2: Version 0.1.2¶
Released on 2021-03-09 - GitHub - PyPI
Fixed an issue with PlottableProtocol requiring writable properties (#9). Corrected the internal version number to match the external one. Loosened the upper bounds on requirements slightly.
v0.1.1: Version 0.1.1¶
Released on 2021-01-29 - GitHub - PyPI
First release with correct PyPI landing page (Poetry requires a readme key).