xarray.Dataset.map_blocks
- Dataset.map_blocks(func, args=(), kwargs=None, template=None)
Apply a function to each block of this Dataset.
Warning
This method is experimental and its signature may change.
- Parameters
func (callable) – User-provided function that accepts a Dataset as its first parameter. The function will receive a subset or ‘block’ of this Dataset (see below), corresponding to one chunk along each chunked dimension. func will be executed as func(subset_dataset, *subset_args, **kwargs). This function must return either a single DataArray or a single Dataset. This function cannot add a new chunked dimension.
args (sequence) – Passed to func after unpacking and subsetting any xarray objects by blocks. xarray objects in args must be aligned with this Dataset (obj), otherwise an error is raised.
kwargs (Mapping or None) – Passed verbatim to func after unpacking. xarray objects, if any, will not be subset to blocks. Passing dask collections in kwargs is not allowed.
template (DataArray, Dataset or None, optional) – xarray object representing the final result after compute is called. If not provided, the function will first be run on mocked-up data that looks like this object but has sizes of 0, to determine properties of the returned object such as dtype, variable names, attributes, new dimensions and new indexes (if any). template must be provided if the function changes the size of existing dimensions. When provided, attrs on variables in template are copied over to the result. Any attrs set by func will be ignored.
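The blockwise calling convention described for func can be seen in a small sketch (it assumes dask is installed; ds and block_length are hypothetical names chosen for illustration). Each call receives exactly one chunk along "x", so filling the block with its own length makes the chunk layout visible in the reassembled result. No template is passed here, so the output structure is inferred from the zero-size trial run.

```python
import numpy as np
import xarray as xr

# A 6-element variable split into two uneven chunks along "x" (requires dask).
ds = xr.Dataset({"a": ("x", np.arange(6.0))}).chunk({"x": (4, 2)})


def block_length(block):
    # `block` is a Dataset holding one chunk along "x"; fill every element
    # with that chunk's length (dtype stays float64, like the input).
    return xr.full_like(block, block.sizes["x"])


result = ds.map_blocks(block_length)
print(result.chunks["x"])   # (4, 2)
print(result["a"].values)   # [4. 4. 4. 4. 2. 2.]
```

Because block_length does not change any dimension sizes, the inferred template is enough and the output keeps the input chunking.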
- Returns
A single DataArray or Dataset with dask backend, reassembled from the outputs of the function.
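A sketch of the return value when func changes the size of an existing dimension, which makes template mandatory (it assumes dask is installed; block_mean and the "description" attribute are hypothetical). The template describes the final sizes and chunks, its variable attrs are copied to the result, and the result stays dask-backed until .compute() is called.

```python
import dask.array as da
import numpy as np
import xarray as xr

ds = xr.Dataset({"a": ("x", np.arange(6.0))}).chunk({"x": 2})


def block_mean(block):
    # Collapse one chunk along "x" to a single element but keep the "x"
    # dimension, so each length-2 input block becomes a length-1 output block.
    return block.mean("x").expand_dims("x")


# func shrinks "x", so describe the final result explicitly:
# 3 elements along "x", one chunk per input block, plus variable attrs.
template = xr.Dataset(
    {"a": ("x", np.empty(3), {"description": "per-block mean"})}
).chunk({"x": 1})

result = ds.map_blocks(block_mean, template=template)
is_lazy = isinstance(result["a"].data, da.Array)  # nothing computed yet

computed = result.compute()
print(computed["a"].values)  # [0.5 2.5 4.5]
print(computed["a"].attrs)   # {'description': 'per-block mean'}
```

The template must have the same number of chunks along "x" as the input, since each input block maps to exactly one output block.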
Notes
This function is designed for cases where func needs to manipulate a whole xarray object, subset to each block. Each block is loaded into memory. In the more common case where func can work on numpy arrays, it is recommended to use apply_ufunc. If none of the variables in this object is backed by dask arrays, calling this function is equivalent to calling func(obj, *args, **kwargs).
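A minimal sketch of the two points in these notes, assuming dask is installed for the chunked half (add_one is a hypothetical helper): with purely NumPy-backed data, map_blocks simply calls the function once on the whole object, and a function that only needs NumPy arrays can go through apply_ufunc instead.

```python
import numpy as np
import xarray as xr


def add_one(obj):
    return obj + 1


# No dask arrays anywhere: map_blocks reduces to func(obj, *args, **kwargs).
ds_np = xr.Dataset({"a": ("x", np.arange(4.0))})
same = ds_np.map_blocks(add_one).identical(add_one(ds_np))

# An element-wise NumPy function on chunked data, routed through apply_ufunc.
chunked = ds_np.chunk({"x": 2})
squared = xr.apply_ufunc(
    np.square, chunked, dask="parallelized", output_dtypes=[np.float64]
)

print(same)                 # True
print(squared["a"].values)  # [0. 1. 4. 9.]
```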
See also
dask.array.map_blocks, xarray.apply_ufunc, xarray.Dataset.map_blocks, xarray.DataArray.map_blocks
- map_blocks – Advanced Tutorial on map_blocks with dask
Examples
Calculate an anomaly from climatology using .groupby(). Using xr.map_blocks() allows for parallel operations with knowledge of xarray, its indices, and its methods like .groupby().
>>> def calculate_anomaly(da, groupby_type="time.month"):
...     gb = da.groupby(groupby_type)
...     clim = gb.mean(dim="time")
...     return gb - clim
...
>>> time = xr.cftime_range("1990-01", "1992-01", freq="ME")
>>> month = xr.DataArray(time.month, coords={"time": time}, dims=["time"])
>>> np.random.seed(123)
>>> array = xr.DataArray(
...     np.random.rand(len(time)),
...     dims=["time"],
...     coords={"time": time, "month": month},
... ).chunk()
>>> ds = xr.Dataset({"a": array})
>>> ds.map_blocks(calculate_anomaly, template=ds).compute()
<xarray.Dataset> Size: 576B
Dimensions:  (time: 24)
Coordinates:
  * time     (time) object 192B 1990-01-31 00:00:00 ... 1991-12-31 00:00:00
    month    (time) int64 192B 1 2 3 4 5 6 7 8 9 10 ... 3 4 5 6 7 8 9 10 11 12
Data variables:
    a        (time) float64 192B 0.1289 0.1132 -0.0856 ... 0.1906 -0.05901
Note that one must explicitly use args=[] and kwargs={} to pass arguments to the function being applied in xr.map_blocks():
>>> ds.map_blocks(
...     calculate_anomaly,
...     kwargs={"groupby_type": "time.year"},
...     template=ds,
... )
<xarray.Dataset> Size: 576B
Dimensions:  (time: 24)
Coordinates:
  * time     (time) object 192B 1990-01-31 00:00:00 ... 1991-12-31 00:00:00
    month    (time) int64 192B dask.array<chunksize=(24,), meta=np.ndarray>
Data variables:
    a        (time) float64 192B dask.array<chunksize=(24,), meta=np.ndarray>
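The same mechanism with a positional argument: an aligned xarray object passed through args is cut into the block matching each chunk, while values in kwargs arrive unchanged. This is an illustrative sketch that assumes dask is installed; weights and scale_and_shift are hypothetical names, and weights is chunked identically to ds so the chunk sizes are compatible.

```python
import numpy as np
import xarray as xr

ds = xr.Dataset({"a": ("x", np.arange(6.0))}).chunk({"x": 2})

# Aligned with ds (same "x" length) and chunked the same way.
weights = xr.DataArray(
    np.array([1.0, 1.0, 2.0, 2.0, 3.0, 3.0]), dims="x"
).chunk({"x": 2})


def scale_and_shift(block, w, offset=0.0):
    # `block` and `w` hold the same chunk along "x"; `offset` is passed verbatim.
    return block * w + offset


result = ds.map_blocks(scale_and_shift, args=[weights], kwargs={"offset": 10.0})
print(result["a"].values)  # [10. 11. 14. 16. 22. 25.]
```

Passing a dask-backed object through kwargs instead would raise an error, since kwargs are never subset to blocks.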