xarray.core.groupby.DatasetGroupBy.shuffle_to_chunks
- DatasetGroupBy.shuffle_to_chunks(chunks=None)
Sort or “shuffle” the underlying object.
“Shuffle” means the object is sorted so that all members of a group occur sequentially, in the same chunk. Multiple groups may share a chunk. This method is particularly useful for chunked arrays (e.g. dask, cubed) when you need to map a function that requires all members of a group to be present in a single chunk. For chunked array types, the order of appearance is not guaranteed and depends on the input chunking.
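To make the idea concrete, here is a simplified illustration in plain NumPy, using the same `x` labels as the Examples section. It is not how xarray or dask implement the shuffle: the helper names `shuffle_indices` and `pack_groups` and the greedy packing rule are hypothetical, chosen only to show that group members become contiguous and that several small groups can share one chunk.

```python
import numpy as np


def shuffle_indices(labels):
    """Permutation that places equal labels next to each other.

    A stable sort keeps the members of each group in their original order.
    """
    return np.argsort(labels, kind="stable")


def pack_groups(sorted_labels, target_chunk):
    """Hypothetical greedy chunking that never splits a group across chunks.

    Assumes ``sorted_labels`` is already sorted. Consecutive small groups
    share a chunk while they fit within ``target_chunk``; a group larger
    than ``target_chunk`` gets a chunk of its own.
    """
    _, counts = np.unique(sorted_labels, return_counts=True)
    chunks, current = [], 0
    for n in counts:
        if current and current + n > target_chunk:
            chunks.append(current)
            current = 0
        current += n
    if current:
        chunks.append(current)
    return tuple(int(c) for c in chunks)


labels = np.array([1, 2, 3, 1, 2, 3, 1, 2, 3, 0])
data = np.arange(10)

perm = shuffle_indices(labels)
print(labels[perm])  # [0 1 1 1 2 2 2 3 3 3]
print(data[perm])  # [9 0 3 6 1 4 7 2 5 8]
# With a target of 4 elements per chunk, groups 0 and 1 share the first chunk.
print(pack_groups(labels[perm], target_chunk=4))  # (4, 3, 3)
```

The stable sort matters: it keeps members within a group in their original relative order, which the real method does not promise for chunked arrays.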
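- Parameters
chunks (int, tuple of int, "auto" or mapping of hashable to int or tuple of int, optional) – How to adjust chunks along dimensions not present in the array being grouped by.
- Returns
A new object of the same type as the grouped object (a Dataset for DatasetGroupBy), not a GroupBy object.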
Examples
>>> import dask.array
>>> import xarray as xr
>>> da = xr.DataArray(
...     dims="x",
...     data=dask.array.arange(10, chunks=3),
...     coords={"x": [1, 2, 3, 1, 2, 3, 1, 2, 3, 0]},
...     name="a",
... )
>>> shuffled = da.groupby("x").shuffle_to_chunks()
>>> shuffled
<xarray.DataArray 'a' (x: 10)> Size: 80B
dask.array<shuffle, shape=(10,), dtype=int64, chunksize=(3,), chunktype=numpy.ndarray>
Coordinates:
  * x        (x) int64 80B 0 1 1 1 2 2 2 3 3 3
>>> shuffled.groupby("x").quantile(q=0.5).compute()
<xarray.DataArray 'a' (x: 4)> Size: 32B
array([9., 3., 4., 5.])
Coordinates:
    quantile  float64 8B 0.5
  * x         (x) int64 32B 0 1 2 3