Working with numpy-like arrays#

NumPy-like arrays (often known as duck arrays) are drop-in replacements for the numpy.ndarray class but with different features, such as propagating physical units or a different layout in memory. Xarray can often wrap these array types, allowing you to use labelled dimensions and indexes whilst benefiting from the additional features of these array libraries.

Some numpy-like array types that xarray already has some support for:

Cupy - GPU support (see cupy-xarray),
Sparse - for performant arrays with many zero elements,
Pint - for tracking the physical units of your data (see pint-xarray),
Dask - parallel computing on larger-than-memory arrays (see using dask with xarray),
Cubed - another parallel computing framework that emphasises reliability (see cubed-xarray).

Warning

This feature should be considered somewhat experimental. Please report any bugs you find on xarray’s issue tracker.

Note

For information on wrapping dask arrays see Parallel computing with Dask. Whilst xarray wraps dask arrays in a similar way to that described on this page, chunked array types like dask.array.Array implement additional methods that require slightly different user code (e.g. calling .chunk or .compute). See the docs on wrapping chunked arrays.

Why “duck”?#

Why is it also called a “duck” array? This comes from a common statement of object-oriented programming - “If it walks like a duck, and quacks like a duck, treat it like a duck”. In other words, a library like xarray that is capable of using multiple different types of arrays does not have to explicitly check that each one it encounters is permitted (e.g. if dask, if numpy, if sparse etc.). Instead xarray can take the more permissive approach of simply treating the wrapped array as valid, attempting to call the relevant methods (e.g. .mean()) and only raising an error if a problem occurs (e.g. the method is not found on the wrapped class). This is much more flexible, and allows objects and classes from different libraries to work together more easily.
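As a minimal sketch of this idea, the snippet below calls .mean() on whatever object it is given and only complains if that call fails. The TinyArray class and the duck_mean function are hypothetical names invented for this illustration; they are not part of xarray.

```python
class TinyArray:
    """Hypothetical array-like class, used only for this illustration."""

    def __init__(self, values):
        self.values = list(values)

    def mean(self):
        return sum(self.values) / len(self.values)


def duck_mean(obj):
    """Call obj.mean() without checking the type of obj first."""
    try:
        return obj.mean()
    except AttributeError as err:
        # The problem only surfaces once the method turns out to be missing.
        raise TypeError(f"{type(obj).__name__} does not provide .mean()") from err


result = duck_mean(TinyArray([1, 2, 3, 4]))
print(result)
```

Any object with a compatible .mean() method would be accepted here, regardless of which library defines it.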

What is a numpy-like array?#

A “numpy-like array” (also known as a “duck array”) is a class that contains array-like data, and implements key numpy-like functionality such as indexing, broadcasting, and computation methods.

For example, the sparse library provides a sparse array type which is useful for representing nD array objects like sparse matrices in a memory-efficient manner. We can create a sparse array object (of the sparse.COO type) from a numpy array like this:

In [1]: import numpy as np
   ...: from sparse import COO

In [2]: x = np.eye(4, dtype=np.uint8)  # create diagonal identity matrix

In [3]: s = COO.from_numpy(x)

In [4]: s
Out[4]: <COO: shape=(4, 4), dtype=uint8, nnz=4, fill_value=0>

This sparse object does not attempt to explicitly store every element in the array, only the non-zero elements. This approach is much more efficient for large arrays with only a few non-zero elements (such as tri-diagonal matrices). Sparse array objects can be converted back to a “dense” numpy array by calling sparse.COO.todense().

Just like numpy.ndarray objects, sparse.COO arrays support indexing,

In [5]: s[1, 1]  # diagonal elements should be ones
Out[5]: 1

In [6]: s[2, 3]  # off-diagonal elements should be zero
Out[6]: 0

broadcasting,

In [7]: x2 = np.zeros(
   ...:     (4, 1), dtype=np.uint8
   ...: )  # create second array of a different shape (converted to sparse below)
   ...: 

In [8]: s2 = COO.from_numpy(x2)

In [9]: (s * s2)  # multiplication requires broadcasting
Out[9]: <COO: shape=(4, 4), dtype=uint8, nnz=0, fill_value=0>

and various computation methods

In [10]: s.sum(axis=1)
Out[10]: <COO: shape=(4,), dtype=uint64, nnz=4, fill_value=0>

This numpy-like array can also be passed directly to numpy functions such as np.sum, which then dispatch to the sparse implementation. The same holds for so-called numpy ufuncs (“universal functions”):

In [11]: np.sum(s, axis=1)
Out[11]: <COO: shape=(4,), dtype=uint64, nnz=4, fill_value=0>

Notice that in each case the API for calling the operation on the sparse array is identical to that of calling it on the equivalent numpy array - this is the sense in which the sparse array is “numpy-like”.
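One mechanism by which a class can take part in numpy calls like these is numpy's `__array_ufunc__` protocol. The sketch below is a deliberately minimal illustration of that protocol and is not how sparse itself is implemented; WrappedArray is a hypothetical class that handles only plain element-wise ufunc calls with a single output.

```python
import numpy as np


class WrappedArray:
    """Hypothetical minimal duck array that re-wraps the results of ufuncs."""

    def __init__(self, data):
        self._data = np.asarray(data)

    # Basic array attributes, forwarded to the stored numpy array.
    @property
    def shape(self):
        return self._data.shape

    @property
    def dtype(self):
        return self._data.dtype

    @property
    def ndim(self):
        return self._data.ndim

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        if method != "__call__":
            # Reductions, accumulations etc. are out of scope for this sketch.
            return NotImplemented
        # Unwrap WrappedArray inputs, apply the ufunc, then wrap the result again.
        raw = [x._data if isinstance(x, WrappedArray) else x for x in inputs]
        return WrappedArray(ufunc(*raw, **kwargs))


a = WrappedArray([1.0, 4.0, 9.0])
b = np.sqrt(a)      # numpy hands the call to WrappedArray.__array_ufunc__
c = np.add(a, 1.0)  # mixing a WrappedArray with a plain scalar also works
print(type(b).__name__, b.shape)
```

A real duck array library needs considerably more than this (for example indexing, reductions, and support for non-ufunc numpy functions).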

Note

For discussion on exactly which methods a class needs to implement to be considered “numpy-like”, see Integrating with duck arrays.

Wrapping numpy-like arrays in xarray#

DataArray, Dataset, and Variable objects can wrap these numpy-like arrays.

Constructing xarray objects which wrap numpy-like arrays#

The primary way to create an xarray object which wraps a numpy-like array is to pass that numpy-like array instance directly to the constructor of the xarray class. The page on xarray data structures shows how DataArray and Dataset both accept data in various forms through their data argument, but in fact this data can also be any wrappable numpy-like array.

For example, we can wrap the sparse array we created earlier inside a new DataArray object:

In [12]: import xarray as xr
    ...: s_da = xr.DataArray(s, dims=["i", "j"])

In [13]: s_da
Out[13]: 
<xarray.DataArray (i: 4, j: 4)> Size: 68B
<COO: shape=(4, 4), dtype=uint8, nnz=4, fill_value=0>
Dimensions without coordinates: i, j

We can see what’s inside - the printable representation of our xarray object (the repr) automatically uses the printable representation of the underlying wrapped array.

Of course our sparse array object is still there underneath - it’s stored under the .data attribute of the DataArray:

In [14]: s_da.data
Out[14]: <COO: shape=(4, 4), dtype=uint8, nnz=4, fill_value=0>

Array methods#

We saw above that numpy-like arrays provide numpy methods. Xarray automatically uses these when you call the corresponding xarray method:

In [15]: s_da.sum(dim="j")
Out[15]: 
<xarray.DataArray (i: 4)> Size: 64B
<COO: shape=(4,), dtype=uint64, nnz=4, fill_value=0>
Dimensions without coordinates: i

Converting wrapped types#

If you want to change the type inside your xarray object you can use DataArray.as_numpy():

In [16]: s_da.as_numpy()
Out[16]: 
<xarray.DataArray (i: 4, j: 4)> Size: 16B
array([[1, 0, 0, 0],
       [0, 1, 0, 0],
       [0, 0, 1, 0],
       [0, 0, 0, 1]], dtype=uint8)
Dimensions without coordinates: i, j

This returns a new DataArray object, but now wrapping a normal numpy array.

If instead you want to convert to numpy and return that numpy array you can use either DataArray.to_numpy() or the DataArray.values property, where the former is strongly preferred. The difference is in the way they coerce to numpy: values always uses numpy.asarray(), which will fail for some array types (e.g. cupy), whereas to_numpy() uses the correct method depending on the array type.

In [17]: s_da.to_numpy()
Out[17]: 
array([[1, 0, 0, 0],
       [0, 1, 0, 0],
       [0, 0, 1, 0],
       [0, 0, 0, 1]], dtype=uint8)

In [18]: s_da.values
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[18], line 1
----> 1 s_da.values

File ~/checkouts/readthedocs.org/user_builds/xray/checkouts/stable/xarray/core/dataarray.py:784, in DataArray.values(self)
    771 @property
    772 def values(self) -> np.ndarray:
    773     """
    774     The array's data converted to numpy.ndarray.
    775 
   (...)
    782     to this array may be reflected in the DataArray as well.
    783     """
--> 784     return self.variable.values

File ~/checkouts/readthedocs.org/user_builds/xray/checkouts/stable/xarray/core/variable.py:525, in Variable.values(self)
    522 @property
    523 def values(self):
    524     """The variable's data as a numpy.ndarray"""
--> 525     return _as_array_or_item(self._data)

File ~/checkouts/readthedocs.org/user_builds/xray/checkouts/stable/xarray/core/variable.py:323, in _as_array_or_item(data)
    309 def _as_array_or_item(data):
    310     """Return the given values as a numpy array, or as an individual item if
    311     it's a 0d datetime64 or timedelta64 array.
    312 
   (...)
    321     TODO: remove this (replace with np.asarray) once these issues are fixed
    322     """
--> 323     data = np.asarray(data)
    324     if data.ndim == 0:
    325         if data.dtype.kind == "M":

File ~/checkouts/readthedocs.org/user_builds/xray/conda/stable/lib/python3.10/site-packages/sparse/_sparse_array.py:265, in SparseArray.__array__(self, *args, **kwargs)
    262 from ._settings import AUTO_DENSIFY
    264 if not AUTO_DENSIFY:
--> 265     raise RuntimeError(
    266         "Cannot convert a sparse array to dense automatically. To manually densify, use the todense method."
    267     )
    269 return np.asarray(self.todense(), *args, **kwargs)

RuntimeError: Cannot convert a sparse array to dense automatically. To manually densify, use the todense method.

This illustrates the difference between the data and values properties, which is sometimes a point of confusion for new xarray users. Explicitly: DataArray.data returns the underlying numpy-like array, regardless of type, whereas DataArray.values converts the underlying array to a numpy array before returning it. (This is another reason to use to_numpy() over values - the intention is clearer.)

Conversion to numpy as a fallback#

If a wrapped array does not implement the corresponding array method then xarray will often attempt to convert the underlying array to a numpy array so that the operation can be performed. You may want to watch out for this behavior, and report any instances in which it causes problems.

Most of xarray’s API does support using duck array objects, but there are a few areas where the code will still convert to numpy arrays:

Dimension coordinates, and thus all indexing operations:
- Dataset.sel() and DataArray.sel()
- Dataset.loc and DataArray.loc
- Dataset.drop_sel() and DataArray.drop_sel()
- Dataset.reindex(), Dataset.reindex_like(), DataArray.reindex() and DataArray.reindex_like(): duck arrays in data variables and non-dimension coordinates won’t be cast
Functions and methods that depend on external libraries or features of numpy not covered by __array_function__ / __array_ufunc__:
- Dataset.ffill() and DataArray.ffill() (uses bottleneck)
- Dataset.bfill() and DataArray.bfill() (uses bottleneck)
- Dataset.interp(), Dataset.interp_like(), DataArray.interp() and DataArray.interp_like() (uses scipy): duck arrays in data variables and non-dimension coordinates will be cast to numpy, in addition to duck arrays not being supported in dimension coordinates
- Dataset.rolling() and DataArray.rolling() (requires numpy>=1.20)
- Dataset.rolling_exp() and DataArray.rolling_exp() (uses numbagg)
- Dataset.interpolate_na() and DataArray.interpolate_na() (uses numpy.vectorize)
- apply_ufunc() with vectorize=True (uses numpy.vectorize)
Incompatibilities between different duck array libraries:
- Dataset.chunk() and DataArray.chunk(): this fails if the data was not already chunked and the duck array (e.g. a pint quantity) should wrap the new dask array; changing the chunk sizes works however.

Extensions using duck arrays#

Whilst the features above allow many numpy-like array libraries to be used pretty seamlessly with xarray, it often also makes sense to use an interfacing package to make certain tasks easier.

For example, the pint-xarray package offers a custom .pint accessor (see Extending xarray using accessors) which provides convenient access to information stored within the wrapped array (e.g. .units and .magnitude), and makes creating wrapped pint arrays (and especially xarray-wrapping-pint-wrapping-dask arrays) simpler for the user.

We maintain a list of libraries extending xarray to make working with particular wrapped duck arrays easier. If you know of more that aren’t on this list please raise an issue to add them!
