xarray.core.groupby.DatasetGroupBy.shuffle_to_chunks

DatasetGroupBy.shuffle_to_chunks(chunks=None)

Sort or “shuffle” the underlying object.

“Shuffle” means the object is sorted so that all members of a group occur sequentially, in the same chunk. Multiple groups may occur in the same chunk. This method is particularly useful for chunked arrays (e.g. dask, cubed), especially when you need to map a function that requires all members of a group to be present in a single chunk. For chunked array types, the order of appearance is not guaranteed and depends on the input chunking.

Parameters

chunks (int, tuple of int, "auto" or mapping of hashable to int or tuple of int, optional) – How to adjust chunks along dimensions not present in the array being grouped by.

Returns

DataArray or Dataset

Examples

>>> import dask.array
>>> import xarray as xr
>>> da = xr.DataArray(
...     dims="x",
...     data=dask.array.arange(10, chunks=3),
...     coords={"x": [1, 2, 3, 1, 2, 3, 1, 2, 3, 0]},
...     name="a",
... )
>>> shuffled = da.groupby("x").shuffle_to_chunks()
>>> shuffled
<xarray.DataArray 'a' (x: 10)> Size: 80B
dask.array<shuffle, shape=(10,), dtype=int64, chunksize=(3,), chunktype=numpy.ndarray>
Coordinates:
  * x        (x) int64 80B 0 1 1 1 2 2 2 3 3 3
>>> shuffled.groupby("x").quantile(q=0.5).compute()
<xarray.DataArray 'a' (x: 4)> Size: 32B
array([9., 3., 4., 5.])
Coordinates:
    quantile  float64 8B 0.5
  * x         (x) int64 32B 0 1 2 3
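
The following is a minimal NumPy-only sketch of the idea described above: reorder the data so that members of each group are contiguous, then apply a per-group reduction. It is an illustration under stated assumptions, not xarray's implementation; the variable names are hypothetical, and the stable sort is a choice made here for reproducibility (for chunked arrays the order of appearance is not guaranteed).

```python
import numpy as np

# Same labels and values as the example above, held in plain NumPy arrays.
# Illustration only: this is NOT how xarray implements shuffle_to_chunks.
labels = np.array([1, 2, 3, 1, 2, 3, 1, 2, 3, 0])
values = np.arange(10)

# A stable argsort on the labels makes each group's members contiguous
# while keeping their original relative order within the group.
order = np.argsort(labels, kind="stable")
shuffled_labels = labels[order]
shuffled_values = values[order]

print(shuffled_labels)  # [0 1 1 1 2 2 2 3 3 3]
print(shuffled_values)  # [9 0 3 6 1 4 7 2 5 8]

# Once members are contiguous, each group is a single slice, so a
# per-group function sees every member of its group at once.
boundaries = np.flatnonzero(np.diff(shuffled_labels)) + 1
groups = np.split(shuffled_values, boundaries)
medians = [float(np.median(g)) for g in groups]

print(medians)  # [9.0, 3.0, 4.0, 5.0]
```

The medians agree with the `quantile(q=0.5)` result in the example above.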