xarray.apply_ufunc

xarray.apply_ufunc(func: Callable, *args: Any, input_core_dims: Optional[Sequence[Sequence]] = None, output_core_dims: Optional[Sequence[Sequence]] = ((),), exclude_dims: Collection = frozenset(), vectorize: bool = False, join: str = 'exact', dataset_join: str = 'exact', dataset_fill_value: Any = _NO_FILL_VALUE, keep_attrs: bool = False, kwargs: Mapping = None, dask: str = 'forbidden', output_dtypes: Optional[Sequence] = None, output_sizes: Optional[Mapping[Any, int]] = None)

Apply a vectorized function for unlabeled arrays on xarray objects.
The function will be mapped over the data variable(s) of the input arguments using xarray’s standard rules for labeled computation, including alignment, broadcasting, looping over GroupBy/Dataset variables, and merging of coordinates.
Parameters:
- func : callable
Function to call like func(*args, **kwargs) on unlabeled arrays (.data) that returns an array or tuple of arrays. If multiple arguments with non-matching dimensions are supplied, this function is expected to vectorize (broadcast) over axes of positional arguments in the style of NumPy universal functions [1]; if it does not, set vectorize=True. If this function returns multiple outputs, you must set output_core_dims as well.
- *args : Dataset, DataArray, GroupBy, Variable, numpy/dask arrays or scalars
Mix of labeled and/or unlabeled arrays to which to apply the function.
- input_core_dims : Sequence[Sequence], optional
List of the same length as args giving the list of core dimensions on each input argument that should not be broadcast. By default, we assume there are no core dimensions on any input arguments.
For example, input_core_dims=[[], ['time']] indicates that all dimensions on the first argument and all dimensions other than ‘time’ on the second argument should be broadcast.
Core dimensions are automatically moved to the last axes of input variables before applying func, which facilitates using NumPy style generalized ufuncs [2].
- output_core_dims : List[tuple], optional
List of the same length as the number of outputs returned by func, giving the list of core dimensions on each output that were not broadcast on the inputs. By default, we assume that func outputs exactly one array, with axes corresponding to each broadcast dimension.
Core dimensions are assumed to appear as the last dimensions of each output in the provided order.
- exclude_dims : set, optional
Core dimensions on the inputs to exclude from alignment and broadcasting entirely. Any input coordinates along these dimensions will be dropped. Each excluded dimension must also appear in input_core_dims for at least one argument. Only dimensions listed here are allowed to change size between input and output objects.
- vectorize : bool, optional
If True, then assume func only takes arrays defined over core dimensions as input and vectorize it automatically with numpy.vectorize(). This option exists for convenience, but is almost always slower than supplying a pre-vectorized function. Using this option requires NumPy version 1.12 or newer.
- join : {‘outer’, ‘inner’, ‘left’, ‘right’, ‘exact’}, optional
Method for joining the indexes of the passed objects along each dimension, and the variables of Dataset objects with mismatched data variables:
- ‘outer’: use the union of object indexes
- ‘inner’: use the intersection of object indexes
- ‘left’: use indexes from the first object with each dimension
- ‘right’: use indexes from the last object with each dimension
- ‘exact’: raise ValueError instead of aligning when indexes to be aligned are not equal
- dataset_join : {‘outer’, ‘inner’, ‘left’, ‘right’, ‘exact’}, optional
Method for joining variables of Dataset objects with mismatched data variables.
- ‘outer’: take variables from both Dataset objects
- ‘inner’: take only overlapped variables
- ‘left’: take only variables from the first object
- ‘right’: take only variables from the last object
- ‘exact’: data variables on all Dataset objects must match exactly
- dataset_fill_value : optional
Value used in place of missing variables on Dataset inputs when the datasets do not share the exact same data_vars. Required if dataset_join not in {'inner', 'exact'}, otherwise ignored.
- keep_attrs : bool, optional
Whether to copy attributes from the first argument to the output.
- kwargs : dict, optional
Optional keyword arguments passed directly on to call func.
- dask : ‘forbidden’, ‘allowed’ or ‘parallelized’, optional
How to handle applying to objects containing lazy data in the form of dask arrays:
- ‘forbidden’ (default): raise an error if a dask array is encountered.
- ‘allowed’: pass dask arrays directly on to func.
- ‘parallelized’: automatically parallelize func if any of the inputs are a dask array. If used, the output_dtypes argument must also be provided. Multiple output arguments are not yet supported.
- output_dtypes : list of dtypes, optional
Optional list of output dtypes. Only used if dask='parallelized'.
- output_sizes : dict, optional
Optional mapping from dimension names to sizes for outputs. Only used if dask='parallelized' and new dimensions (not found on inputs) appear on outputs.
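As an illustration of the exclude_dims parameter described above, here is a minimal sketch of a function whose output is shorter than its input along one dimension. The helper name diff_along and the sample data are hypothetical and not part of xarray; the sketch assumes numpy and xarray are importable as np and xr.

```python
import numpy as np
import xarray as xr


def diff_along(obj, dim):
    # Hypothetical helper: first difference along `dim`, which shrinks by one.
    return xr.apply_ufunc(
        np.diff,                    # np.diff works on the last axis by default
        obj,
        input_core_dims=[[dim]],    # `dim` is moved to the last axis of the input
        output_core_dims=[[dim]],   # and appears as the last axis of the output
        exclude_dims={dim},         # allows `dim` to change size; its coordinate is dropped
    )


squares = xr.DataArray([1.0, 4.0, 9.0, 16.0], coords=[("x", [0, 1, 2, 3])])
steps = diff_along(squares, "x")
```

Because "x" is excluded, the result keeps the dimension name but carries no "x" coordinate, which matches the note above that input coordinates along excluded dimensions are dropped.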
Returns:
Single value or tuple of Dataset, DataArray, Variable, dask.array.Array or numpy.ndarray, the first type on that list to appear on an input.
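The join options listed under Parameters can be compared on two arrays whose indexes only partly overlap. This is a minimal sketch with made-up data (the names a and b are illustrative), assuming numpy and xarray are importable as np and xr.

```python
import numpy as np
import xarray as xr

a = xr.DataArray([1.0, 2.0, 3.0], coords=[("x", [0, 1, 2])])
b = xr.DataArray([10.0, 20.0, 30.0], coords=[("x", [1, 2, 3])])

# The default join='exact' refuses to align mismatched indexes.
try:
    xr.apply_ufunc(np.add, a, b)
    raised = False
except ValueError:
    raised = True

# 'inner' keeps only the shared labels x = 1, 2.
inner = xr.apply_ufunc(np.add, a, b, join="inner")

# 'outer' keeps the union x = 0, 1, 2, 3 and fills the gaps with NaN.
outer = xr.apply_ufunc(np.add, a, b, join="outer")
```

The default of 'exact' is the conservative choice: misaligned inputs fail loudly rather than being silently trimmed or padded.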
References
[1] http://docs.scipy.org/doc/numpy/reference/ufuncs.html
[2] http://docs.scipy.org/doc/numpy/reference/c-api.generalized-ufuncs.html
[3] http://xarray.pydata.org/en/stable/computation.html#wrapping-custom-computation

Examples
Calculate the vector magnitude of two arguments:
>>> def magnitude(a, b):
...     func = lambda x, y: np.sqrt(x ** 2 + y ** 2)
...     return xr.apply_ufunc(func, a, b)
You can now apply magnitude() to xr.DataArray and xr.Dataset objects, with automatically preserved dimensions and coordinates, e.g.,
>>> array = xr.DataArray([1, 2, 3], coords=[('x', [0.1, 0.2, 0.3])])
>>> magnitude(array, -array)
<xarray.DataArray (x: 3)>
array([1.414214, 2.828427, 4.242641])
Coordinates:
  * x        (x) float64 0.1 0.2 0.3
Plain scalars, numpy arrays and a mix of these with xarray objects are also supported:
>>> magnitude(3, 4)
5.0
>>> magnitude(3, np.array([0, 4]))
array([3., 5.])
>>> magnitude(array, 0)
<xarray.DataArray (x: 3)>
array([1., 2., 3.])
Coordinates:
  * x        (x) float64 0.1 0.2 0.3
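The same function also maps over the data variables of Dataset inputs. The following is a minimal sketch with made-up variable names (u, v) and values, assuming numpy and xarray are importable as np and xr; magnitude is repeated so the block runs on its own.

```python
import numpy as np
import xarray as xr


def magnitude(a, b):
    func = lambda x, y: np.sqrt(x ** 2 + y ** 2)
    return xr.apply_ufunc(func, a, b)


# Two Datasets with the same data variables, so the default
# dataset_join='exact' is satisfied.
first = xr.Dataset({"u": ("x", [3.0, 0.0]), "v": ("x", [6.0, 8.0])})
second = xr.Dataset({"u": ("x", [4.0, 5.0]), "v": ("x", [8.0, 6.0])})

# The function is applied variable by variable: u with u, v with v.
out = magnitude(first, second)
```

The result is a Dataset with the same variable names as the inputs, each holding the element-wise magnitude.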
Other examples of how you could use apply_ufunc to write functions that (very nearly) replicate existing xarray functionality:

Compute the mean (.mean) over one dimension:

    def mean(obj, dim):
        # note: apply_ufunc always moves core dimensions to the end
        return xr.apply_ufunc(np.mean, obj,
                              input_core_dims=[[dim]],
                              kwargs={'axis': -1})
Inner product over a specific dimension (like xr.dot):

    def _inner(x, y):
        # treat the last axis of x as a row vector and of y as a column vector
        result = np.matmul(x[..., np.newaxis, :], y[..., :, np.newaxis])
        return result[..., 0, 0]

    def inner_product(a, b, dim):
        return xr.apply_ufunc(_inner, a, b, input_core_dims=[[dim], [dim]])
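The inner product helper can be exercised on inputs with different dimensions to see broadcasting over the non-core dimension. This is a minimal sketch with made-up data (the dimension names x and k are illustrative), assuming numpy and xarray are importable as np and xr.

```python
import numpy as np
import xarray as xr


def _inner(x, y):
    # Row vector times column vector over the last axis, batched over the rest.
    result = np.matmul(x[..., np.newaxis, :], y[..., :, np.newaxis])
    return result[..., 0, 0]


def inner_product(a, b, dim):
    return xr.apply_ufunc(_inner, a, b, input_core_dims=[[dim], [dim]])


a = xr.DataArray(np.arange(6.0).reshape(2, 3), dims=("x", "k"))
b = xr.DataArray([1.0, 1.0, 1.0], dims=("k",))  # no "x": broadcast against a

out = inner_product(a, b, "k")
```

The core dimension k is consumed, and the result keeps only the broadcast dimension x.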
Stack objects along a new dimension (like xr.concat):

    def stack(objects, dim, new_coord):
        # note: this version does not stack coordinates
        func = lambda *x: np.stack(x, axis=-1)
        result = xr.apply_ufunc(func, *objects,
                                output_core_dims=[[dim]],
                                join='outer',
                                dataset_fill_value=np.nan)
        result[dim] = new_coord
        return result
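A usage sketch for the stack helper, showing how output_core_dims introduces a new trailing dimension. The data and the dimension name member are made up; the sketch assumes numpy and xarray are importable as np and xr.

```python
import numpy as np
import xarray as xr


def stack(objects, dim, new_coord):
    # np.stack adds a new last axis, declared to xarray via output_core_dims.
    func = lambda *x: np.stack(x, axis=-1)
    result = xr.apply_ufunc(func, *objects,
                            output_core_dims=[[dim]],
                            join="outer",
                            dataset_fill_value=np.nan)
    result[dim] = new_coord
    return result


first = xr.DataArray([1.0, 2.0, 3.0], coords=[("x", [10, 20, 30])])
second = xr.DataArray([4.0, 5.0, 6.0], coords=[("x", [10, 20, 30])])

stacked = stack([first, second], "member", ["a", "b"])
```

The new dimension appears last, in line with the rule that output core dimensions are the trailing axes of each output.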
If your function is not vectorized but can be applied only to core dimensions, you can use vectorize=True to turn it into a vectorized function. This wraps numpy.vectorize(), so the operation isn’t terribly fast. Here we’ll use it to calculate the distance between empirical samples from two probability distributions, using a scipy function that needs to be applied to vectors:

    import scipy.stats

    def earth_mover_distance(first_samples, second_samples, dim='ensemble'):
        return xr.apply_ufunc(scipy.stats.wasserstein_distance,
                              first_samples, second_samples,
                              input_core_dims=[[dim], [dim]],
                              vectorize=True)
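The same pattern can be tried without scipy. The sketch below swaps in a hypothetical pure-Python function, spread, that only accepts a single 1-D vector; the names spread and spread_along and the sample data are illustrative, and numpy and xarray are assumed to be importable as np and xr.

```python
import numpy as np
import xarray as xr


def spread(v):
    # Hypothetical 1-D-only function: range (max minus min) of one vector.
    return float(max(v) - min(v))


def spread_along(obj, dim):
    # vectorize=True loops `spread` over every non-core index.
    return xr.apply_ufunc(spread, obj,
                          input_core_dims=[[dim]],
                          vectorize=True)


samples = xr.DataArray([[1.0, 5.0, 2.0], [4.0, 4.0, 10.0]],
                       dims=("x", "ensemble"))
result = spread_along(samples, "ensemble")
```

Each call to spread receives one vector along ensemble, so the loop runs once per value of x; for large inputs a function that is already vectorized is likely to be much faster.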
Most of NumPy’s builtin functions already broadcast their inputs appropriately for use in apply_ufunc. You may find helper functions such as numpy.broadcast_arrays helpful in writing your function. apply_ufunc also works well with numba’s vectorize and guvectorize. Further explanation with examples is provided in the xarray documentation [3].
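As a sketch of where numpy.broadcast_arrays helps, consider a function that uses boolean-mask indexing, which does not broadcast on its own. The function name safe_ratio is hypothetical, and only numpy (imported as np) is assumed.

```python
import numpy as np


def safe_ratio(num, den):
    # Hypothetical helper: num / den, with 0 wherever den is zero.
    # Broadcast explicitly so the boolean mask lines up with both inputs.
    num, den = np.broadcast_arrays(np.asarray(num, dtype=float),
                                   np.asarray(den, dtype=float))
    out = np.zeros(num.shape)
    mask = den != 0
    out[mask] = num[mask] / den[mask]
    return out


ratios = safe_ratio(np.array([[1.0], [2.0]]), np.array([0.0, 2.0]))
```

A function written this way broadcasts over its positional arguments in the NumPy ufunc style, which is the behaviour apply_ufunc expects from func when vectorize=True is not set.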