xarray.core.groupby.DatasetGroupBy.count

DatasetGroupBy.count(dim=None, *, keep_attrs=None, **kwargs)

Reduce this Dataset’s data by applying count along some dimension(s).

Parameters:
  • dim (str, Iterable of Hashable, "..." or None, default: None) – Name of the dimension(s) along which to apply count, e.g. dim="x" or dim=["x", "y"]. If None, will reduce over the GroupBy dimensions. If "...", will reduce over all dimensions.

  • keep_attrs (bool or None, optional) – If True, attrs will be copied from the original object to the new one. If False, the new object will be returned without attributes.

  • **kwargs (Any) – Additional keyword arguments passed on to the appropriate array function for calculating count on this object’s data. These could include dask-specific kwargs like split_every.

Returns:

reduced (Dataset) – New Dataset with count applied to its data and the indicated dimension(s) removed

See also

pandas.DataFrame.count, dask.dataframe.DataFrame.count, Dataset.count

GroupBy: Group and Bin Data

User guide on groupby operations.

Notes

Use the flox package to significantly speed up groupby computations, especially with dask arrays. Xarray will use flox by default if it is installed. Pass flox-specific keyword arguments in **kwargs. The default choice is method="cohorts", which generalizes best; other methods might work better for your problem. See the flox documentation for more.
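As a rough illustration of the note above, the sketch below computes the same grouped count twice: once with xarray's default settings (which use flox when it is installed) and once with flox explicitly disabled. It assumes an xarray version that provides the use_flox option in xr.set_options; the small dataset and the variable names are made up for the example.

```python
import numpy as np
import xarray as xr

# Small illustrative dataset (hypothetical values): one NaN in group "a".
da = xr.DataArray(
    np.array([1, 2, 3, 0, 2, np.nan]),
    dims="time",
    coords=dict(labels=("time", np.array(["a", "b", "c", "c", "b", "a"]))),
)
ds = xr.Dataset(dict(da=da))

# Default behaviour: flox is used automatically if it is installed.
default_counts = ds.groupby("labels").count()

# Explicitly fall back to xarray's built-in groupby implementation.
with xr.set_options(use_flox=False):
    plain_counts = ds.groupby("labels").count()

# Both code paths should report the same number of non-NaN values per group.
print(default_counts["da"].values)
print(plain_counts["da"].values)
```

Comparing the two results this way can be a quick sanity check when deciding whether flox is worth installing for a given workload.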

Examples

>>> da = xr.DataArray(
...     np.array([1, 2, 3, 0, 2, np.nan]),
...     dims="time",
...     coords=dict(
...         time=("time", pd.date_range("2001-01-01", freq="ME", periods=6)),
...         labels=("time", np.array(["a", "b", "c", "c", "b", "a"])),
...     ),
... )
>>> ds = xr.Dataset(dict(da=da))
>>> ds
<xarray.Dataset> Size: 120B
Dimensions:  (time: 6)
Coordinates:
  * time     (time) datetime64[ns] 48B 2001-01-31 2001-02-28 ... 2001-06-30
    labels   (time) <U1 24B 'a' 'b' 'c' 'c' 'b' 'a'
Data variables:
    da       (time) float64 48B 1.0 2.0 3.0 0.0 2.0 nan
>>> ds.groupby("labels").count()
<xarray.Dataset> Size: 48B
Dimensions:  (labels: 3)
Coordinates:
  * labels   (labels) object 24B 'a' 'b' 'c'
Data variables:
    da       (labels) int64 24B 1 2 2
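
The dim parameter matters most when the grouped variable has more than one dimension. The following is a minimal sketch, using a made-up two-dimensional dataset (the names ds2, per_group and over_all are illustrative only), that contrasts the default dim=None, which reduces only over the grouped dimension, with dim=..., which reduces over every dimension within each group.

```python
import numpy as np
import xarray as xr

# Hypothetical 2-D data: 4 time steps by 2 positions along "x", with some NaNs.
values = np.array(
    [
        [1.0, np.nan],
        [2.0, 5.0],
        [np.nan, np.nan],
        [4.0, 6.0],
    ]
)
ds2 = xr.Dataset(
    {"da": (("time", "x"), values)},
    coords={"labels": ("time", np.array(["a", "b", "a", "b"]))},
)

# dim=None (the default): count non-NaN values along "time" within each
# group, so the "x" dimension is kept in the result.
per_group = ds2.groupby("labels").count()

# dim=...: count non-NaN values over all dimensions within each group,
# leaving a single number per label.
over_all = ds2.groupby("labels").count(dim=...)

print(per_group["da"].sel(labels="a").values)  # counts per x position for "a"
print(over_all["da"].values)                   # one total count per label
```

Here group "a" contains a single non-NaN value (at the first x position), while group "b" contains four, so the two calls differ only in whether the counts are split by x.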