Integrating with duck arrays#

Warning

This is an experimental feature. Please report any bugs or other difficulties on xarray’s issue tracker.

Xarray can wrap custom numpy-like arrays (“duck arrays”); see the user guide documentation for details. This page is intended for developers who want to wrap a new custom array type with xarray.

Duck array requirements#

Xarray does not explicitly check that required methods are defined by the underlying duck array object before attempting to wrap the given array. However, a wrapped array type should at a minimum define these attributes:

  • shape property,

  • dtype property,

  • ndim property,

  • __array__ method,

  • __array_ufunc__ method,

  • __array_function__ method.

These need to be defined consistently with numpy.ndarray. For example, the array's shape property needs to obey numpy’s broadcasting rules (see also the Python Array API standard’s explanation of these same rules).
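
As a rough illustration of the list above, the following sketch defines a toy duck array that delegates everything to a wrapped numpy array. The names `MyDuckArray`, `_unwrap` and `_wrap` are hypothetical and not part of xarray or numpy, and the protocol handling is deliberately simplified (no `out=` support, no multi-output ufuncs). A real array type will usually need more than this minimum, for example indexing.

```python
import numpy as np


def _unwrap(obj):
    """Replace MyDuckArray instances (also inside lists/tuples) by their numpy data."""
    if isinstance(obj, MyDuckArray):
        return obj._data
    if isinstance(obj, (list, tuple)):
        return type(obj)(_unwrap(item) for item in obj)
    return obj


def _wrap(result):
    """Re-wrap numpy arrays so results stay duck arrays; pass other values through."""
    return MyDuckArray(result) if isinstance(result, np.ndarray) else result


class MyDuckArray:
    """Hypothetical duck array that delegates to a wrapped numpy array."""

    def __init__(self, data):
        self._data = np.asarray(data)

    @property
    def shape(self):
        return self._data.shape

    @property
    def dtype(self):
        return self._data.dtype

    @property
    def ndim(self):
        return self._data.ndim

    def __array__(self, dtype=None, copy=None):
        # Conversion to a plain numpy array, e.g. via np.asarray(duck).
        # Simplified: copy=False is not enforced in this sketch.
        arr = np.asarray(self._data, dtype=dtype)
        return arr.copy() if copy else arr

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # Only plain calls such as np.multiply(x, 2) are handled here.
        if method != "__call__" or "out" in kwargs:
            return NotImplemented
        return _wrap(ufunc(*_unwrap(inputs), **kwargs))

    def __array_function__(self, func, types, args, kwargs):
        # Dispatch for non-ufunc numpy functions such as np.concatenate.
        unwrapped_kwargs = {key: _unwrap(value) for key, value in kwargs.items()}
        return _wrap(func(*_unwrap(args), **unwrapped_kwargs))


x = MyDuckArray([[1.0, 2.0], [3.0, 4.0]])
doubled = np.multiply(x, 2)       # routed through __array_ufunc__
stacked = np.concatenate([x, x])  # routed through __array_function__

required = ("shape", "dtype", "ndim", "__array__", "__array_ufunc__", "__array_function__")
has_all_required = all(hasattr(x, name) for name in required)
```

Because both protocol methods re-wrap their results, `doubled` and `stacked` remain `MyDuckArray` instances rather than silently turning into plain numpy arrays.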

Python Array API standard support#

As an integration library, xarray benefits greatly from the standardization of duck-array libraries’ APIs, and so is a strong supporter of the Python Array API Standard.

We aim to support any array library that follows the Array API standard out-of-the-box. However, xarray does occasionally call some numpy functions which are not (yet) part of the standard (e.g. xarray.DataArray.pad() calls numpy.pad()). See xarray issue #7848 for a list of such functions. We can still support dispatching on these functions through the array protocols above. It just means that if you exclusively implement the methods in the Python Array API standard, some features in xarray will not work.
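
The Array API standard has arrays advertise their namespace through an `__array_namespace__` method. The sketch below shows one way calling code could tell an array that follows the standard from one that does not. It is only an illustration and is not taken from xarray's source. `get_namespace`, `ToyStandardArray`, `ToyProtocolArray` and `toy_namespace` are hypothetical names.

```python
import types


def get_namespace(array, fallback=None):
    """Return the Array API namespace advertised by `array`, else `fallback`."""
    getter = getattr(array, "__array_namespace__", None)
    if getter is None:
        # The object does not follow the standard; the caller has to rely
        # on something else, such as the numpy protocols.
        return fallback
    return getter()


# Stand-in for a real array library's namespace module (hypothetical).
toy_namespace = types.SimpleNamespace(name="toy_namespace")


class ToyStandardArray:
    """Toy array that implements only the standard's namespace hook."""

    def __array_namespace__(self, api_version=None):
        return toy_namespace


class ToyProtocolArray:
    """Toy array without the hook; it would need the numpy protocols instead."""


standard_ns = get_namespace(ToyStandardArray())
protocol_ns = get_namespace(ToyProtocolArray(), fallback="numpy protocols")
```

An array type that provides only the standard's functions would take the first path, so any feature that calls a function outside the standard would be unavailable for it.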

Custom inline reprs#

In certain situations (e.g. when printing the collapsed preview of variables of a Dataset), xarray will display the repr of a duck array in a single line, truncating it to a certain number of characters. If that would drop too much information, the duck array may define a _repr_inline_ method that takes max_width (number of characters) as an argument:

class MyDuckArray:
    ...

    def _repr_inline_(self, max_width):
        """format to a single line with at most max_width characters"""
        ...

    ...

To avoid duplicated information, this method must omit information about the shape and dtype. For example, the string representation of a dask array or a sparse matrix would be:

In [1]: import dask.array as da

In [2]: import numpy as np

In [3]: import xarray as xr

In [4]: import sparse

In [5]: a = da.linspace(0, 1, 20, chunks=2)

In [6]: a
Out[6]: dask.array<linspace, shape=(20,), dtype=float64, chunksize=(2,), chunktype=numpy.ndarray>

In [7]: b = np.eye(10)

In [8]: b[[5, 7, 3, 0], [6, 8, 2, 9]] = 2

In [9]: b = sparse.COO.from_numpy(b)

In [10]: b
Out[10]: <COO: shape=(10, 10), dtype=float64, nnz=14, fill_value=0.0>

In [11]: xr.Dataset(dict(a=("x", a), b=(("y", "z"), b)))
Out[11]: 
<xarray.Dataset> Size: 496B
Dimensions:  (x: 20, y: 10, z: 10)
Dimensions without coordinates: x, y, z
Data variables:
    a        (x) float64 160B dask.array<chunksize=(2,), meta=np.ndarray>
    b        (y, z) float64 336B <COO: nnz=14, fill_value=0.0>
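
For a custom array type, a `_repr_inline_` that follows these rules might look like the sketch below. The class and its `nnz` and `fill_value` metadata are hypothetical and only mirror the sparse example above. The truncation strategy (cut and append "...") is one possible choice, not something xarray prescribes.

```python
class MyDuckArray:
    """Hypothetical array type; `nnz` and `fill_value` are made-up metadata."""

    def __init__(self, shape, dtype, nnz, fill_value):
        self.shape = shape
        self.dtype = dtype
        self.nnz = nnz
        self.fill_value = fill_value

    def __repr__(self):
        # The full repr includes shape and dtype.
        return (
            f"<MyDuckArray: shape={self.shape}, dtype={self.dtype}, "
            f"nnz={self.nnz}, fill_value={self.fill_value}>"
        )

    def _repr_inline_(self, max_width):
        """Format to a single line with at most max_width characters."""
        # Shape and dtype are left out because xarray already prints them.
        text = f"<MyDuckArray: nnz={self.nnz}, fill_value={self.fill_value}>"
        if len(text) <= max_width:
            return text
        if max_width <= 3:
            # Too narrow for an ellipsis; just cut the text.
            return text[: max(max_width, 0)]
        return text[: max_width - 3] + "..."


arr = MyDuckArray(shape=(10, 10), dtype="float64", nnz=14, fill_value=0.0)
wide = arr._repr_inline_(80)    # "<MyDuckArray: nnz=14, fill_value=0.0>"
narrow = arr._repr_inline_(20)  # "<MyDuckArray: nnz..."
```

Keeping the most distinctive metadata at the front of the string means it is the last thing to be lost when the available width is small.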