xarray.Dataset.groupby

Dataset.groupby(group=None, *, squeeze=False, restore_coord_dims=False, eagerly_compute_group=True, **groupers)

Returns a DatasetGroupBy object for performing grouped operations.

Parameters:
  • group (str or DataArray or IndexVariable or sequence of hashable or mapping of hashable to Grouper) – Array whose unique values should be used to group this dataset. If a hashable, it must be the name of a coordinate contained in this dataset. If a dictionary, it must map an existing variable name to a Grouper instance.

  • squeeze (False) – This argument is deprecated.

  • restore_coord_dims (bool, default: False) – If True, also restore the dimension order of multi-dimensional coordinates.

  • eagerly_compute_group (bool, default: True) – Whether to eagerly compute group when it is a chunked array. This option exists to maintain backwards compatibility. Set it to False to opt in to the future behaviour, where group is not automatically loaded into memory.

  • **groupers (Mapping of str to Grouper or Resampler) – Mapping from the name of each variable to group by to a Grouper or Resampler object. One of group or groupers must be provided.

Returns:

grouped (DatasetGroupBy) – A DatasetGroupBy object patterned after pandas.GroupBy that can be iterated over in the form of (unique_value, grouped_array) pairs.

Examples

>>> ds = xr.Dataset(
...     {"foo": (("x", "y"), np.arange(12).reshape((4, 3)))},
...     coords={"x": [10, 20, 30, 40], "letters": ("x", list("abba"))},
... )

Grouping by a single variable is easy

>>> ds.groupby("letters")
<DatasetGroupBy, grouped over 1 grouper(s), 2 groups in total:
    'letters': 2/2 groups present with labels 'a', 'b'>
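
As noted under Returns, the grouped object can be iterated over as (label, group) pairs. A minimal sketch, rebuilding the example dataset so that it runs on its own; the name group_sizes is purely illustrative:

```python
import numpy as np
import xarray as xr

# Same example dataset as above.
ds = xr.Dataset(
    {"foo": (("x", "y"), np.arange(12).reshape((4, 3)))},
    coords={"x": [10, 20, 30, 40], "letters": ("x", list("abba"))},
)

# Each iteration yields a group label and the sub-dataset for that label.
group_sizes = {}
for label, group in ds.groupby("letters"):
    # Record how many points along "x" belong to this label.
    group_sizes[str(label)] = group.sizes["x"]
```

Each group is itself a Dataset, so any regular Dataset method can be applied to it inside the loop.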

Execute a reduction

>>> ds.groupby("letters").sum()
<xarray.Dataset> Size: 64B
Dimensions:  (letters: 2, y: 3)
Coordinates:
  * letters  (letters) object 16B 'a' 'b'
Dimensions without coordinates: y
Data variables:
    foo      (letters, y) int64 48B 9 11 13 9 11 13
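
A reduced result can also be combined with the grouped object it came from. The sketch below relies on arithmetic between a DatasetGroupBy and a dataset indexed by the group labels, which applies each label's value to the members of that group; this is the pattern commonly used for computing anomalies. The names grouped and anomalies are illustrative.

```python
import numpy as np
import xarray as xr

# Same example dataset as above.
ds = xr.Dataset(
    {"foo": (("x", "y"), np.arange(12).reshape((4, 3)))},
    coords={"x": [10, 20, 30, 40], "letters": ("x", list("abba"))},
)

grouped = ds.groupby("letters")

# Subtract each group's mean from the members of that group.
# The result is indexed by the original "x" dimension again.
anomalies = grouped - grouped.mean()
```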

Grouping by multiple variables

>>> ds.groupby(["letters", "x"])
<DatasetGroupBy, grouped over 2 grouper(s), 8 groups in total:
    'letters': 2/2 groups present with labels 'a', 'b'
    'x': 4/4 groups present with labels 10, 20, 30, 40>

Use Grouper objects to express more complicated GroupBy operations

>>> from xarray.groupers import BinGrouper, UniqueGrouper
>>>
>>> ds.groupby(x=BinGrouper(bins=[5, 15, 25]), letters=UniqueGrouper()).sum()
<xarray.Dataset> Size: 128B
Dimensions:  (y: 3, x_bins: 2, letters: 2)
Coordinates:
  * x_bins   (x_bins) object 16B (5, 15] (15, 25]
  * letters  (letters) object 16B 'a' 'b'
Dimensions without coordinates: y
Data variables:
    foo      (y, x_bins, letters) float64 96B 0.0 nan nan 3.0 ... nan nan 5.0
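
When binning is the only requirement, Dataset.groupby_bins (listed under See also) is a shorter spelling. A sketch assuming the same dataset and bin edges as above; the name binned is illustrative. Values of x that fall outside every bin (30 and 40 here) belong to no group and do not contribute to the result.

```python
import numpy as np
import xarray as xr

# Same example dataset as above.
ds = xr.Dataset(
    {"foo": (("x", "y"), np.arange(12).reshape((4, 3)))},
    coords={"x": [10, 20, 30, 40], "letters": ("x", list("abba"))},
)

# Bin "x" into (5, 15] and (15, 25], then sum within each bin.
# The binned dimension is named "x_bins".
binned = ds.groupby_bins("x", bins=[5, 15, 25]).sum()
```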

See also

GroupBy: Group and Bin Data

User guide explanation of how to group and bin data.

Computational Patterns

Tutorial on groupby() for windowed computation.

Grouped Computations

Tutorial on groupby() demonstrating reductions, transformations, and comparison with resample().

pandas.DataFrame.groupby
Dataset.groupby_bins
DataArray.groupby
core.groupby.DatasetGroupBy
Dataset.coarsen
Dataset.resample
DataArray.resample