Integrating with duck arrays#
Warning
This is an experimental feature. Please report any bugs or other difficulties on xarray’s issue tracker.
Xarray can wrap custom numpy-like arrays (“duck arrays”). See the user guide documentation for details. This page is intended for developers who are interested in wrapping a new custom array type with xarray.
Duck array requirements#
Xarray does not explicitly check that required methods are defined by the underlying duck array object before attempting to wrap the given array. However, a wrapped array type should at a minimum define these attributes:
- shape property
- dtype property
- ndim property
- __array__ method
- __array_ufunc__ method
- __array_function__ method
These need to be defined consistently with numpy.ndarray. For example, the array's shape property needs to obey numpy’s broadcasting rules (see also the Python Array API standard’s explanation of these same rules).
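As a rough illustration of the attributes listed above, the following sketch defines a hypothetical MyDuckArray that simply delegates to a wrapped numpy.ndarray. The class name, the _data attribute, and the decision to handle only numpy.concatenate in __array_function__ are assumptions made for this example and are not part of xarray. numpy.lib.mixins.NDArrayOperatorsMixin is used so that arithmetic operators are derived from __array_ufunc__.

```python
import numpy as np
from numpy.lib.mixins import NDArrayOperatorsMixin


class MyDuckArray(NDArrayOperatorsMixin):
    """Hypothetical duck array that delegates to a wrapped numpy.ndarray."""

    def __init__(self, data):
        self._data = np.asarray(data)

    @property
    def shape(self):
        return self._data.shape

    @property
    def dtype(self):
        return self._data.dtype

    @property
    def ndim(self):
        return self._data.ndim

    def __array__(self, dtype=None, copy=None):
        # Coerce to a plain numpy array, e.g. for np.asarray(obj).
        out = self._data if dtype is None else self._data.astype(dtype)
        return out.copy() if copy else out

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # Only plain calls such as np.add(a, b) are handled in this sketch.
        if method != "__call__" or "out" in kwargs:
            return NotImplemented
        unwrapped = [x._data if isinstance(x, MyDuckArray) else x for x in inputs]
        return MyDuckArray(ufunc(*unwrapped, **kwargs))

    def __array_function__(self, func, types, args, kwargs):
        # Only np.concatenate is handled here; a real array type would cover more.
        if func is np.concatenate:
            arrays = [x._data if isinstance(x, MyDuckArray) else x for x in args[0]]
            return MyDuckArray(np.concatenate(arrays, *args[1:], **kwargs))
        return NotImplemented


a = MyDuckArray([[1.0], [2.0], [3.0]])  # shape (3, 1)
b = MyDuckArray([[10.0, 20.0]])         # shape (1, 2)
c = a + b                               # broadcasts like numpy: shape (3, 2)
d = np.concatenate([a, a])              # dispatched via __array_function__
```

Because the wrapped data is a numpy.ndarray, shape, dtype, and ndim follow numpy's conventions automatically. An array type with its own storage would need to reproduce that behaviour itself.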
Python Array API standard support#
As an integration library, xarray benefits greatly from the standardization of duck-array libraries’ APIs, and so is a big supporter of the Python Array API Standard.
We aim to support any array library that follows the Array API standard out of the box. However, xarray does occasionally call some numpy functions which are not (yet) part of the standard (e.g. xarray.DataArray.pad() calls numpy.pad()). See xarray issue #7848 for a list of such functions. We can still support dispatching on these functions through the array protocols above, but if you exclusively implement the methods in the Python Array API standard, some features in xarray will not work.
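To make the dispatching mentioned above more concrete, here is a minimal sketch of an array type that opts in to numpy.pad() through __array_function__. The PaddableArray class, its _data attribute, and the _pad helper are hypothetical names chosen for this example. The sketch only shows the numpy side of the dispatch and makes no claim about which xarray features it would enable.

```python
import numpy as np


class PaddableArray:
    """Hypothetical wrapper that opts in to numpy.pad via __array_function__."""

    def __init__(self, data):
        self._data = np.asarray(data)

    @property
    def shape(self):
        return self._data.shape

    def __array_function__(self, func, types, args, kwargs):
        if func is np.pad:
            return self._pad(*args, **kwargs)
        # Anything else is unsupported in this sketch.
        return NotImplemented

    @staticmethod
    def _pad(array, pad_width, mode="constant", **kwargs):
        # Delegate to numpy on the wrapped data, then re-wrap the result.
        return PaddableArray(np.pad(array._data, pad_width, mode=mode, **kwargs))


padded = np.pad(PaddableArray([1, 2, 3]), 1)  # dispatched to PaddableArray._pad
```

Returning NotImplemented for every other function makes numpy raise a TypeError for calls this type does not support, rather than silently coercing the wrapper to a numpy.ndarray.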
Custom inline reprs#
In certain situations (e.g. when printing the collapsed preview of variables of a Dataset), xarray will display the repr of a duck array in a single line, truncating it to a certain number of characters. If that would drop too much information, the duck array may define a _repr_inline_ method that takes max_width (number of characters) as an argument:
class MyDuckArray:
    ...

    def _repr_inline_(self, max_width):
        """format to a single line with at most max_width characters"""
        ...

    ...
To avoid duplicated information, this method must omit information about the shape and dtype. For example, the string representation of a dask array or a sparse matrix would be:
In [1]: import numpy as np
In [2]: import dask.array as da
In [3]: import xarray as xr
In [4]: import sparse
In [5]: a = da.linspace(0, 1, 20, chunks=2)
In [6]: a
Out[6]: dask.array<linspace, shape=(20,), dtype=float64, chunksize=(2,), chunktype=numpy.ndarray>
In [7]: b = np.eye(10)
In [8]: b[[5, 7, 3, 0], [6, 8, 2, 9]] = 2
In [9]: b = sparse.COO.from_numpy(b)
In [10]: b
Out[10]: <COO: shape=(10, 10), dtype=float64, nnz=14, fill_value=0.0>
In [11]: xr.Dataset(dict(a=("x", a), b=(("y", "z"), b)))
Out[11]:
<xarray.Dataset> Size: 496B
Dimensions:  (x: 20, y: 10, z: 10)
Dimensions without coordinates: x, y, z
Data variables:
    a        (x) float64 160B dask.array<chunksize=(2,), meta=np.ndarray>
    b        (y, z) float64 336B <COO: nnz=14, fill_value=0.0>
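Returning to the _repr_inline_ hook described above, one possible implementation is sketched here. The label attribute and the "..." truncation strategy are assumptions made for illustration; xarray only requires that the method accept max_width and return a single-line string.

```python
import numpy as np


class MyDuckArray:
    """Hypothetical duck array carrying an extra, made-up `label` attribute."""

    def __init__(self, data, label):
        self._data = np.asarray(data)
        self.label = label

    def __repr__(self):
        # The full repr includes shape and dtype.
        return (
            f"<MyDuckArray: shape={self._data.shape}, "
            f"dtype={self._data.dtype}, label={self.label!r}>"
        )

    def _repr_inline_(self, max_width):
        """format to a single line with at most max_width characters"""
        # Shape and dtype are omitted because xarray already prints them.
        text = f"<MyDuckArray: label={self.label!r}>"
        if len(text) <= max_width:
            return text
        if max_width <= 3:
            return text[:max_width]
        # Leave room for the ellipsis so the result never exceeds max_width.
        return text[: max_width - 3] + "..."


arr = MyDuckArray(np.zeros((2, 3)), label="demo")
wide = arr._repr_inline_(80)    # fits, returned unchanged
narrow = arr._repr_inline_(10)  # truncated to exactly 10 characters
```

Keeping the inline form limited to information that xarray does not already display (here, the label) mirrors how the dask and sparse reprs in the session above drop their shape and dtype fields.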