What’s New#

v2026.07.0 (Jul 9, 2026)#

This release adds support for Dask’s query-optimizing expression arrays, along with new day_of_week and day_of_year datetime accessor attributes. It also includes a number of bug fixes, notably for a performance regression in Coordinates.to_index(), Zarr fill_value round-tripping, and excessive memory use in drop_encoding.

Thanks to the 25 contributors to this release: Davis Bennett, Deepak Cherian, Ian Hunt-Isaak, Illviljan, Jonathan Dung, Julia Signell, Justus Magin, Kai Mühlbauer, MJSHANG, Mark Harfouche, Mathias Hauser, Matt Van Horn, Matthew Rocklin, Max Jones, Maximilian Roos, Nick Hodgskin, S Anand, Spencer Clark, Sreekant Baheti, Timothy Hodson, Tom Nicholas, Vincent Gao, Wali Reheman, Wei Ji and eeshsaxena

New Features#

Added support for Dask’s query-optimizing expression arrays. Xarray now implements the __dask_exprs__ protocol so that Dask can identify and optimize xarray Variable objects without materializing their graphs, together with a chunk manager and map_blocks() support for these arrays (PR11382, PR11398, PR11423). By Matthew Rocklin.
Following pandas, xarray’s DatetimeAccessor now supports day_of_week and day_of_year attributes, which are alternative names for the existing dayofweek and dayofyear attributes. These alternative attributes have similarly been added to CFTimeIndex (PR11270). By Spencer Clark.
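The new attribute names follow pandas, so the pandas behavior they mirror is easy to check. The sketch below uses only pandas and does not exercise xarray itself. On an xarray release that includes this change, da.dt.day_of_week and da.dt.dayofweek should likewise return the same values.

```python
import pandas as pd

# One week of daily timestamps starting on Monday, 2024-01-01.
times = pd.date_range("2024-01-01", periods=7, freq="D")

# pandas exposes both spellings, and they return identical values.
assert (times.day_of_week == times.dayofweek).all()
assert (times.day_of_year == times.dayofyear).all()

print(list(times.day_of_week))  # [0, 1, 2, 3, 4, 5, 6]  (Monday == 0)
print(list(times.day_of_year))  # [1, 2, 3, 4, 5, 6, 7]
```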

Breaking Changes#

Deprecations#

Bug Fixes#

Dataset.drop_encoding() and DataArray.drop_encoding() no longer copy the underlying data, avoiding excessive memory use on large datasets (GH11390, PR11394). By Wali Reheman.
Fix open_dataset() raising OSError when opening data from GDAL virtual filesystems (e.g. /vsicurl/, /vsis3/) or other URI-like paths that do not support stat (PR11392). By Vincent Gao.
Fix testing.assert_equal() with check_dim_order=False for Dataset objects containing variables with different dimension orders (GH10704, PR10718). By Maximilian Roos.
linspace() now handles num=1 like numpy.linspace() (GH11397, PR11401). By S Anand.
Fix a major performance regression in Coordinates.to_index() (and consequently Dataset.to_dataframe()) caused by converting the cached code ndarrays into Python lists (GH11305).
Preserve the Zarr array fill_value in the variable encoding when reading a zarr_format=3 store with use_zarr_fill_value_as_mask=False, so it is no longer silently lost on round-trip (GH10269). By Davis Bennett.
arange() now preserves the requested step instead of silently re-deriving it from (stop - start) / size, so its values match numpy.arange() when step does not evenly divide the interval. Strided slicing of a RangeIndex now preserves the step as well (GH11325). By mokashang.
Fix decode_cf() failing on integer-encoded time arrays that contain NaT when running against numpy 2.5+. By Ian Hunt-Isaak.
Fix TypeError: Implicit conversion to a NumPy array is not allowed when trying to use open_mfdataset() with a backend engine reading to CuPy arrays. By Wei Ji Leong.
The names of DataArray objects returned by properties of the DatetimeAccessor now always match the property names. Previously properties like days_in_month, weekday, and weekofyear would return DataArray objects named "daysinmonth", "dayofweek", and "week", respectively; now they return objects named "days_in_month", "weekday", and "weekofyear" (PR11270). By Spencer Clark.
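The arange() and linspace() fixes above are described in terms of NumPy's behavior. The following rough illustration uses NumPy only, not xarray's own index classes. It shows why re-deriving the step from (stop - start) / size goes wrong when the step does not evenly divide the interval, and what numpy.linspace() returns for num=1.

```python
import numpy as np

start, stop, step = 0.0, 1.0, 0.3

# numpy.arange keeps the requested step: 0.0, 0.3, 0.6, 0.9 (stop is excluded).
values = np.arange(start, stop, step)

# Re-deriving the step from the interval and the element count gives 0.25,
# so the values drift away from what numpy.arange produces.
size = values.size                    # 4 elements
derived_step = (stop - start) / size  # 0.25 instead of 0.3
derived = start + derived_step * np.arange(size)  # 0.0, 0.25, 0.5, 0.75

# numpy.linspace with num=1 returns only the start value.
single = np.linspace(start, stop, num=1)

print(values, derived, single)
```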

Documentation#

Internal Changes#

v2026.04.0 (Apr 13, 2026)#

This release bumps the minimum supported zarr version to 3.0, finalizes the deprecation of timedelta decoding via units, adds col_wrap='auto' for plots, a new inherit='all_coords' option for DataTree.to_dataset(), and a facetgrid_figsize option for set_options().

Thanks to the 22 contributors to this release: Adam Newgas, Alfonso Ladino, Copilot, Deepak Cherian, Emmanuel Ferdman, Ian Hunt-Isaak, Ilan Gold, Illviljan, Jakob Harteg, Joe Hamman, Julia Signell, Justus Magin, Kai Mühlbauer, Max Jones, Michael Niklas, Nick Hodgskin, Pieter Eendebak, Spencer Clark, frostByte, kkollsga, rsignell and yaochengchen

New Features#

Added inherit='all_coords' option to DataTree.to_dataset() to inherit all parent coordinates, not just indexed ones (GH10812, PR11230). By Alfonso Ladino.
Support col_wrap='auto' in plots that will wrap the grid to be as square as possible (PR11266). By Michael Niklas.
Added complex dtype support to FillValueCoder for the Zarr backend. (PR11151) By Max Jones.
Added facetgrid_figsize option to set_options() allowing FacetGrid to use matplotlib.rcParams['figure.figsize'] or a fixed (width, height) tuple instead of computing figure size from size and aspect (GH11103). By Kristian Kollsga.

Breaking Changes#

The minimum versions of some dependencies were changed (see table below). Notably, the minimum zarr version is now 3.0. Zarr v2 format data is still readable via zarr-python 3’s built-in compatibility layer; however, zarr-python 2 is no longer a supported dependency. By Joe Hamman.

Dependency	Old Version	New Version
boto3	1.34	1.37
cartopy	0.23	0.24
dask-core	2024.6	2025.2
distributed	2024.6	2025.2
flox	0.9	0.10
h5netcdf	1.4	1.5
h5py	3.11	3.13
iris	3.9	3.11
lxml	5.1	5.3
matplotlib-base	3.8	3.10
numba	0.60	0.61
numbagg	0.8	0.9
packaging	24.1	24.2
rasterio	1.3	1.4
scipy	1.13	1.15
toolz	0.12	1.0
zarr	2.18	3.0

Xarray will no longer by default decode a variable into a np.timedelta64 dtype based on the presence of a timedelta-like "units" attribute alone. Instead it will rely on the presence of a np.timedelta64 dtype attribute, which is now xarray’s default way of encoding np.timedelta64 values. The old decoding behavior can be restored by specifying decode_timedelta=True or decode_timedelta=CFTimedeltaCoder(decode_via_units=True) in open_dataset(). This finalizes the deprecation cycle initiated in xarray version 2025.01.2 (PR11173). By Spencer Clark.
When using the h5netcdf engine and passing the path as a string to open_dataset() or open_datatree(), fsspec now uses block caching with a 4MB block size by default (PR11216). By Julia Signell.
Passing a Dataset as data_vars to the Dataset constructor now raises TypeError. This was never intended behavior and silently dropped attrs. Use Dataset.copy() instead (GH11095). By Kristian Kollsga.
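Here is a minimal sketch of opting back in to units-based timedelta decoding. It assumes an in-memory Dataset built only for illustration; the variable name lead_time and its values are made up. It uses decode_cf(), which accepts the same decode_timedelta argument as open_dataset().

```python
import numpy as np
import xarray as xr

# An integer variable whose only timedelta hint is a "units" attribute.
raw = xr.Dataset({"lead_time": ("step", np.array([1, 2, 3]), {"units": "days"})})

# Explicitly request the old units-based decoding.
decoded = xr.decode_cf(raw, decode_timedelta=True)
print(decoded["lead_time"].dtype.kind)  # 'm' (a timedelta64 dtype)
```

When reading from disk, the equivalent is open_dataset(path, decode_timedelta=True).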

Deprecations#

Bug Fixes#

Fix multi-coordinate indexes being dropped in DataArray._replace_maybe_drop_dims() (e.g. after reducing over an unrelated dimension) and in Dataset._copy_listed() (e.g. when subsetting a Dataset by variable names). Both paths now consult Index.should_add_coord_to_array(), consistent with Dataset._construct_dataarray(). Also simplify Dataset.to_dataarray() to keep all coordinates and indexes directly, since variables are broadcast and all coords are retained (GH11215, PR11286). By Rich Signell.
Allow writing StringDType variables to netCDF files (GH11199). By Kristian Kollsgård.
Fix the Source link in the API docs (PR11187). By Ian Hunt-Isaak.
Coerce masked dask arrays to filled arrays (GH9374, PR11157). By Julia Signell.
Fix Dataset.interp() silently dropping datetime64 and timedelta64 variables by enabling their interpolation (GH10900, PR11081). By Emmanuel Ferdman.
combine_by_coords() no longer returns an empty dataset when a generator is passed as data_objects (GH10114, PR11265). By Amartya Anand.
Fix h5netcdf backend module detection and ros3 tests (GH11243, PR11274). By Kai Mühlbauer.

Documentation#

Add AI policy (PR11257). By Nick Hodgskin.
Update documentation and team guide to promote Zulip. Remove mentions of Discord (PR11246, PR11254). By Nick Hodgskin.
Fix typos (PR11180, PR11181, PR11182, PR11185, PR11186). By Yaocheng Chen.
Fix code blocks on “how to create custom index” doc page (PR11255). By Nick Hodgskin.

Performance#

Groupby cumsum can now be accelerated with flox. Coordinates are now retained as well. (GH6528, PR10987) By Jimmy Westling.
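The following small example computes a grouped cumulative sum. It assumes a toy DataArray with a label coordinate. flox is used only if it is installed, and the values should be the same either way.

```python
import xarray as xr

da = xr.DataArray(
    [1, 2, 3, 4],
    dims="x",
    coords={"label": ("x", ["a", "b", "a", "b"])},
)

# Running total within each label group; results stay in the original x order.
running = da.groupby("label").cumsum()
print(running.values.tolist())  # [1, 2, 4, 6]
```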

Internal Changes#

Add script for linting of public docstrings according to numpydoc (PR11121). By Nick Hodgskin.
Add stubtest configuration and allowlist for validating type annotations against runtime behavior. This enables CI integration for type stub validation and helps prevent type annotation regressions (GH11086). By Kristian Kollsgård.
Remove setup.py file (PR11261). By Nick Hodgskin.
Add typing.overload() decorators to DataArray.argmin() and DataArray.argmax() to narrow the return type based on the dim parameter (GH10893, PR11233). By Amartya Anand.

v2026.02.0 (Feb 13, 2026)#

This release adds support for 1D coordinates in NDPointIndex for scattered point indexing, switches all deprecation warnings to FutureWarning for better end-user visibility, fixes silent data corruption when writing dask arrays to sharded Zarr stores, and improves chunked array tokenization performance.

Thanks to the 11 contributors to this release: Antonio Valentino, Chris Barker, Christine P. Chai, Deepak Cherian, Ewan Short, Harikrishna KP, Ian Hunt-Isaak, Julia Signell, Justus Magin, Kristian Kollsgård and Nick Hodgskin

New Features#

NDPointIndex now supports coordinates with fewer dimensions than coordinate variables, enabling indexing of scattered points and trajectories where multiple coordinates (e.g., x, y) share a single dimension (e.g., points) (GH10940, PR11116). By Ian Hunt-Isaak.

Breaking Changes#

When deprecating functionality, xarray has sometimes used FutureWarning and sometimes used DeprecationWarning. DeprecationWarning is not intended to be visible to end users, so this version of xarray switches to using FutureWarning everywhere (PR11112). By Julia Signell.
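Deprecations are now signalled with FutureWarning, so one way to catch them early in a test suite is to escalate that category to an error. This sketch uses only the standard library warnings module. old_api() is a hypothetical stand-in for deprecated xarray functionality.

```python
import warnings


def old_api():
    # Hypothetical function standing in for deprecated functionality.
    warnings.warn("old_api() is deprecated", FutureWarning, stacklevel=2)
    return 42


# Turn FutureWarning into an exception only inside this block.
with warnings.catch_warnings():
    warnings.simplefilter("error", FutureWarning)
    try:
        old_api()
        escalated = False
    except FutureWarning:
        escalated = True

print(escalated)  # True
```

With pytest, running pytest -W error::FutureWarning has a similar effect for a whole test run.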

Bug Fixes#

Fix slicing with negative step (GH11000, PR11044). By Antonio Valentino.
Fix .plot error when using positional args with col and row (GH11104, PR11111). By Julia Signell.
Slightly amend Xarray’s Zarr Encoding Specification doc for clarity, and provide a code comment in xarray.backends.zarr._get_zarr_dims_and_attrs referencing the doc (GH8749, PR11013). By Ewan Short.
Fix silent data corruption when writing dask arrays to sharded Zarr stores. Dask chunk boundaries must now align with shard boundaries, not just internal Zarr chunk boundaries (GH10831, PR11117). By Kristian Kollsgård.
Fix Dataset.sortby() and DataArray.sortby() placing NaN values at the beginning instead of the end when using ascending=False (GH7358, PR11118). By Kristian Kollsgård.
Raise FileNotFoundError instead of a confusing ValueError when open_dataset() is called with a non-existent local file path (GH10896, PR11150). By Kristian Kollsgård.
Improve error message when a chunk manager is not available, suggesting how to install the required package (PR11056). By Julia Signell.
Raise ValueError on slice-based selection of multi-index levels, which previously returned silently wrong results (GH10534, PR11168). By Harikrishna KP.
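The shard-alignment rule reduces to simple arithmetic: every interior Dask chunk boundary must be a multiple of the shard length. The helper below is hypothetical and is not part of xarray, Dask or Zarr. It checks a single dimension.

```python
from itertools import accumulate


def chunks_align_with_shards(chunks: tuple[int, ...], shard: int) -> bool:
    """Return True if every interior Dask chunk boundary falls on a shard boundary.

    `chunks` holds one dimension's Dask chunk sizes and `shard` is the Zarr
    shard length along that dimension. The last boundary is the end of the
    array, so the final chunk may cover a partial shard.
    """
    boundaries = list(accumulate(chunks))[:-1]  # interior boundaries only
    return all(b % shard == 0 for b in boundaries)


# Inner Zarr chunks of 10, shards of 50: Dask chunks of 20 line up with the
# inner chunks but split shards, so several Dask tasks would write one shard.
print(chunks_align_with_shards((20, 20, 20, 20, 20), shard=50))  # False
print(chunks_align_with_shards((50, 50), shard=50))              # True
print(chunks_align_with_shards((100, 30), shard=50))             # True
```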

Documentation#

Add MyST Markdown support for documentation (PR11167). By Nick Hodgskin.
Update docstrings for pandas 3 compatibility (PR11130). By Julia Signell.
Various Numpydoc fixes (PR11122). By Nick Hodgskin.
Correct wording mistakes in documentation (PR11120, PR11127). By Christine P. Chai.
Fix broken links in documentation (PR11115, PR11135, PR11161). By Nick Hodgskin.
Fix “latest” version displayed on landing page (PR11119). By Nick Hodgskin.
Add descriptions for pixi tasks (PR11155). By Nick Hodgskin.
Update open_zarr() decode_cf docstring (PR11165). By Nick Hodgskin.

Performance#

Add a fast path that skips normalized chunks during tokenization (PR11017). By Julia Signell.

Internal Changes#

Temporarily silence shape assignment warnings raised in netCDF4 (PR11146). By Justus Magin.
Add osx-64 to the pixi configuration (PR11137). By Chris Barker.
Preserve string dtypes instead of converting to object where possible (PR11152). By Julia Signell.

v2026.01.0 (Jan 28, 2026)#

This release includes an improved DataTree HTML representation with collapsible groups and automatic truncation, easier selection on coordinates without explicit indexes, pandas 3 compatibility, and various bug fixes and performance improvements.

Thanks to the 25 contributors to this release: Barron H. Henderson, Christine P. Chai, DHRUVA KUMAR KAUSHAL, David Bold, Davis Bennett, Deepak Cherian, Dhruva Kumar Kaushal, Florian Knappers, Ian Hunt-Isaak, Jacob Tomlinson, Joshua Gould, Julia Signell, Justus Magin, Lucas Colley, Mark Harfouche, Matthew, Maximilian Roos, Nick Hodgskin, Sakshee_D, Sam Levang, Samay Mehar, Simon Høxbro Hansen, Spencer Clark, Stephan Hoyer and knappersfy

New Features#

Improved DataTree HTML representation: groups are now collapsible with item counts shown in labels, large trees are automatically truncated using display_max_children and display_max_html_elements options, and the Indexes section is now displayed (matching the text repr) (PR10816). By Stephan Hoyer.
Dataset.set_xindex() and DataArray.set_xindex() automatically replace any existing index being set instead of erroring or needing to call drop_indexes() first (PR11008). By Ian Hunt-Isaak.
Calling Dataset.sel() or DataArray.sel() on a 1-dimensional coordinate without an index will now automatically create a temporary PandasIndex to perform the selection (GH9703, PR11029). By Ian Hunt-Isaak.
The minimum supported version of h5netcdf is now 1.4. Version 1.4.0 brings improved alignment between h5netcdf and libnetcdf4 in the storage of complex numbers (PR11068). By Mark Harfouche.
set_options() now supports an arithmetic_compat option, which determines how non-index coordinates of the same name are compared for potential conflicts when performing binary operations. The default, arithmetic_compat='minimal', matches the existing behaviour (PR10943). By Matthew Willson.
Better ordering of coordinates when displaying xarray objects (PR11091). By Ian Hunt-Isaak, Julia Signell.
Use np.dtypes.StringDType when reading Zarr string variables (PR11097). By Julia Signell.

Breaking Changes#

Change the default value for chunks in open_zarr() to _default and remove the special mapping of "auto" to {} or None in open_zarr(). If chunks is not set, the default behavior is the same as before. Explicitly setting chunks="auto" will match the behavior of chunks="auto" in open_dataset() with engine="zarr" (GH11002, PR11010). By Julia Signell.
Dataset.identical(), DataArray.identical(), and testing.assert_identical() now compare indexes. Two objects with identical data but different indexes will no longer be considered identical (GH11033, PR11035). By Ian Hunt-Isaak.

Bug Fixes#

Ensure that keep_attrs='drop' and keep_attrs=False remove attrs from result, even when there is only one xarray object given to apply_ufunc() (GH10982, PR10997). By Julia Signell.
equals() now uses floating point error tolerant np.isclose by default to handle accumulated floating point errors from slicing operations. Use exact=True for exact comparison (PR11035). By Ian Hunt-Isaak.
Ensure the SeasonResampler preserves the datetime unit of the underlying time index when resampling (GH11048, PR11049). By Spencer Clark.
Partially support pandas 3 default string indexes by coercing pd.StringDtype to np.dtypes.StringDType in PandasIndexingAdapter (GH11098, PR11102). By Julia Signell.
Dataset.eval() now works with more than 2 dimensions (PR11064). By Maximilian Roos.
Fix where() for cupy.array inputs (PR11026). By Simon Høxbro Hansen.
Fix CombinedLock.locked() to correctly call the underlying lock’s locked() method (GH10843, PR11022). By Samay Mehar.
Fix DatasetGroupBy.map() when grouping by more than one variable (PR11005). By Joshua Gould.
Fix indexing bugs in CoordinateTransformIndex (PR10980). By Deepak Cherian.
Ensure the netCDF4 backend locks files while closing to prevent race conditions (PR10788). By David Bold.
Improve error message when scipy is missing for NDPointIndex (PR11085). By Sakshee_D.

Documentation#

Better description of keep_attrs option on xarray.where() docstring (GH10982, PR10997). By Julia Signell.
Document how xarray.dot() interacts with coordinates (PR10958). By Dhruva Kumar Kaushal.
Improve rolling window documentation (PR11094). By Barron H. Henderson.
Improve combine_nested and combine_by_coords docstrings (PR11080). By Julia Signell.

Performance#

Add a fastpath to the backend plugin system for standard engines (GH10178, PR10937). By Sam Levang.
Optimize CFMaskCoder decoder (PR11105). By Deepak Cherian.

Internal Changes#

Update contributing instructions with note on pixi version (PR11108). By Nick Hodgskin.

v2025.12.0 (Dec 5, 2025)#

This release rolls back the default engine for HTTP URLs, adds support for DataTree objects in combine_nested() and contains numerous bug fixes.

Thanks to the 16 contributors to this release: Benoit Bovy, Christine P. Chai, Deepak Cherian, Dhruva Kumar Kaushal, Ian Hunt-Isaak, Ilan Gold, Illviljan, Julia Signell, Justus Magin, Lars Buntemeyer, Maximilian Roos, Miguel Jimenez, Nick Hodgskin, Richard Berg, Spencer Clark and Stephan Hoyer

New Features#

Improved pydap backend behavior and performance in open_dataset() and open_datatree() when downloading DAP4 (OPeNDAP) dimension data (GH10628, PR10629). In addition, checksums=True|False has been added as an optional argument passed to the pydap backend. By Miguel Jimenez-Urias.
combine_nested() now supports DataTree objects (PR10849). By Stephan Hoyer.

Bug Fixes#

When assigning an indexed coordinate to a data variable or coordinate, coerce it from IndexVariable to Variable (GH9859, GH10829, PR10909). By Julia Signell.
The NetCDF4 backend will now claim to be able to read any URL except for one that contains the substring zarr. This restores backward compatibility after PR10804 broke workflows that relied on xr.open_dataset("http://...") (PR10931). By Ian Hunt-Isaak.
Always normalize slices when indexing LazilyIndexedArray instances (GH10941, PR10948). By Justus Magin.
Avoid casting custom indexes in Dataset.drop_attrs() (PR10961). By Justus Magin.
Support decoding unsigned integers to np.timedelta64. By Deepak Cherian.
Properly handle internal type promotion and NA objects for extension arrays (PR10423). By Ilan Gold.

Documentation#

Added section on the limitations of cftime arithmetic (PR10653). By Lars Buntemeyer.

Internal Changes#

Change the development workflow to use pixi (GH10732, PR10888). By Nick Hodgskin.

v2025.11.0 (Nov 17, 2025)#

This release changes the default for keep_attrs such that attributes are preserved by default, adds support for DataTree in top-level functions, and contains several memory and performance improvements as well as a number of bug fixes.

Thanks to the 21 contributors to this release: Aled Owen, Charles Turner, Christine P. Chai, David Huard, Deepak Cherian, Gregorio L. Trevisan, Ian Hunt-Isaak, Ilan Gold, Illviljan, Jan Meischner, Jemma Jeffree, Jonas Lundholm Bertelsen, Justus Magin, Kai Mühlbauer, Kristian Bodolai, Lukas Riedel, Max Jones, Maximilian Roos, Niclas Rieger, Stephan Hoyer and William Andrea

New Features#

merge() and concat() now support DataTree objects (GH9790, GH9778). By Stephan Hoyer.
The h5netcdf engine has support for pseudo NETCDF4_CLASSIC files, meaning variables and attributes are cast to supported types. Note that the saved files won’t be recognized as genuine NETCDF4_CLASSIC files until h5netcdf adds support with version 1.7.0 (GH10676, PR10686). By David Huard.
Support comparing DataTree objects with testing.assert_allclose() (PR10887). By Justus Magin.
Add support for chunks="auto" for cftime datasets (GH9834, PR10527). By Charles Turner.

Breaking Changes#

All xarray operations now preserve attributes by default (GH3891, GH2582). Previously, operations would drop attributes unless explicitly told to preserve them via keep_attrs=True. Additionally, when attributes are preserved in binary operations, they now combine attributes from both operands using drop_conflicts (keeping matching attributes, dropping conflicts), instead of keeping only the left operand’s attributes.

What changed:
```
import xarray as xr

# Before (xarray <2025.11.0):
data = xr.DataArray([1, 2, 3], attrs={"units": "meters", "long_name": "height"})
result = data.mean()
result.attrs  # {}  - Attributes lost!

# After (xarray ≥2025.11.0):
data = xr.DataArray([1, 2, 3], attrs={"units": "meters", "long_name": "height"})
result = data.mean()
result.attrs  # {"units": "meters", "long_name": "height"}  - Attributes preserved!
```
Affected operations include:

Computational operations:
- Reductions: mean(), sum(), std(), var(), min(), max(), median(), quantile(), etc.
- Rolling windows: rolling().mean(), rolling().sum(), etc.
- Groupby: groupby().mean(), groupby().sum(), etc.
- Resampling: resample().mean(), etc.
- Weighted: weighted().mean(), weighted().sum(), etc.
- apply_ufunc() and NumPy universal functions
Binary operations:
- Arithmetic: +, -, *, /, **, //, % (combines attributes using drop_conflicts)
- Comparisons: <, >, ==, !=, <=, >= (combines attributes using drop_conflicts)
- With scalars: data * 2, 10 - data (preserves data’s attributes)
Data manipulation:
- Missing data: fillna(), dropna(), interpolate_na(), ffill(), bfill()
- Indexing/selection: isel(), sel(), where(), clip()
- Alignment: interp(), reindex(), align()
- Transformations: map(), pipe(), assign(), assign_coords()
- Shape operations: expand_dims(), squeeze(), transpose(), stack(), unstack()
Binary operations - combines attributes with drop_conflicts:
```
import xarray as xr

a = xr.DataArray([1, 2], attrs={"units": "m", "source": "sensor_a"})
b = xr.DataArray([3, 4], attrs={"units": "m", "source": "sensor_b"})
(a + b).attrs  # {"units": "m"}  - Matching values kept, conflicts dropped
(b + a).attrs  # {"units": "m"}  - Order doesn't matter for drop_conflicts
```
How to restore previous behavior:
1. Globally for your entire script:
```
import xarray as xr

xr.set_options(keep_attrs=False)  # Affects all subsequent operations
```
2. For specific operations:
```
result = data.mean(dim="time", keep_attrs=False)
```
3. For code blocks:
```
with xr.set_options(keep_attrs=False):
    # All operations in this block drop attrs
    result = data1 + data2
```
4. Remove attributes after operations:
```
result = data.mean().drop_attrs()
```
By Maximilian Roos.

Bug Fixes#

Fix the h5netcdf backend for format=None to use the same rule as the netcdf4 backend (PR10859). By Kai Mühlbauer.
netcdf4 and pydap backends now use stricter URL detection to avoid incorrectly claiming remote URLs. The pydap backend now only claims URLs with explicit DAP protocol indicators (dap2:// or dap4:// schemes, or /dap2/ or /dap4/ in the URL path). This prevents both backends from claiming remote Zarr stores and other non-DAP URLs without an explicit engine= argument (PR10804). By Ian Hunt-Isaak.
Fix indexing with empty arrays in the scipy and h5netcdf backends, which now resolves to empty slices (GH10867, PR10870). By Kai Mühlbauer.
Fix error handling issue in decode_cf_variables when decoding fails - the exception is now re-raised correctly, with a note added about the variable name that caused the error (GH10873, PR10886). By Jonas L. Bertelsen.
Fix equivalent for numpy scalar nan comparison (GH10833, PR10838). By Maximilian Roos.
Support non-DataArray outputs in Dataset.map() (GH10835, PR10839). By Maximilian Roos.
Support drop_sel on MultiIndex objects (GH10862, PR10863). By Aled Owen.

Performance#

Speed up and reduce memory usage of concat(). Magnitude of improvement scales with size of the concatenation dimension (GH10864, PR10866). By Deepak Cherian.
Speed up and reduce memory usage when coarsening along multiple dimensions (PR10921). By Deepak Cherian.

v2025.10.1 (Oct 7, 2025)#

This release reverts a breaking change to Xarray’s preferred netCDF backend.

Breaking changes#

Xarray’s default engine for reading/writing netCDF files has been reverted to prefer netCDF4 over h5netcdf over scipy, which was the default before v2025.09.1. This change had larger implications for the ecosystem than we anticipated. We are still considering changing the default in the future, but will be a bit more careful about the implications. See GH10657 and linked issues for discussion. The behavior can still be customized, e.g., with xr.set_options(netcdf_engine_order=['h5netcdf', 'netcdf4', 'scipy']). By Stephan Hoyer.

New features#

Coordinates are ordered to match dims when displaying Xarray objects. (PR10778). By Julia Signell.

Bug fixes#

Fix error raised when writing scalar variables to Zarr with region={} (PR10796). By Stephan Hoyer.

v2025.09.1 (Sep 29, 2025)#

This release contains improvements to netCDF IO and the DataTree.from_dict() constructor, as well as a variety of bug fixes. In particular, the default netCDF backend has switched from netCDF4 to h5netcdf, which is typically faster.

Thanks to the 17 contributors to this release: Claude, Deepak Cherian, Dimitri Papadopoulos Orfanos, Dylan H. Morris, Emmanuel Mathot, Ian Hunt-Isaak, Joren Hammudoglu, Julia Signell, Justus Magin, Maximilian Roos, Nick Hodgskin, Spencer Clark, Stephan Hoyer, Tom Nicholas, gronniger, joseph nowak and pierre-manchon

New Features#

DataTree.from_dict() now supports passing in DataArray and nested dictionary values, and has a coords argument for specifying coordinates as DataArray objects (PR10658).
engine='netcdf4' now supports reading and writing in-memory netCDF files. All of Xarray’s netCDF backends now support in-memory reads and writes (PR10624). By Stephan Hoyer.

Breaking changes#

Dataset.update() now returns None, instead of the updated dataset. This completes the deprecation cycle started in version 0.17. The method still updates the dataset in place (GH10167). By Maximilian Roos.
The default engine when reading/writing netCDF files is now h5netcdf or scipy, which are typically faster than the prior default of netCDF4-python. You can control this default behavior explicitly via the new netcdf_engine_order parameter in set_options(), e.g., xr.set_options(netcdf_engine_order=['netcdf4', 'scipy', 'h5netcdf']) to restore the prior defaults (GH10657). By Stephan Hoyer.
The HTML reprs for DataArray, Dataset and DataTree have been tweaked to hide empty sections, consistent with the text reprs. The DataTree HTML repr also now automatically expands sub-groups (PR10785). By Stephan Hoyer.
Zarr stores written with Xarray now consistently use a default Zarr fill value of NaN for float variables, for both Zarr v2 and v3 (GH10646). All other dtypes still use the Zarr default fill_value of zero. To customize, explicitly set encoding in to_zarr(), e.g., encoding=dict.fromkeys(ds.data_vars, {'fill_value': 0}). By Stephan Hoyer.
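Here is a migration sketch for two of the changes above. It assumes a small in-memory Dataset. The first part calls update() for its in-place effect rather than its return value. The second part builds a per-variable encoding that overrides the new NaN fill value for floats. The to_zarr() call is left commented out because it needs zarr and a writable path; example.zarr is a placeholder.

```python
import numpy as np
import xarray as xr

ds = xr.Dataset({"a": ("x", np.array([1.0, 2.0]))})

# update() modifies ds in place; do not rely on its return value.
ds.update({"b": ("x", np.array([3.0, 4.0]))})

# For a non-mutating alternative, assign() returns a new Dataset.
ds2 = ds.assign(c=("x", np.array([5.0, 6.0])))

# Per-variable encoding that replaces the NaN default fill value with 0.
# A dict comprehension gives each variable its own dict, whereas
# dict.fromkeys would make every key refer to the same dict object.
encoding = {name: {"fill_value": 0} for name in ds.data_vars}
print(encoding)  # {'a': {'fill_value': 0}, 'b': {'fill_value': 0}}
# ds.to_zarr("example.zarr", encoding=encoding)  # requires zarr
```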

Deprecations#

Bug fixes#

Xarray objects opened from file-like objects with engine='h5netcdf' can now be pickled, as long as the underlying file-like object also supports pickle (GH10712). By Stephan Hoyer.
Closing Xarray objects opened from file-like objects with engine='scipy' no longer closes the underlying file, consistent with the h5netcdf backend (PR10624). By Stephan Hoyer.
Fix the align_chunks parameter of the to_zarr() method, which was not being passed to the underlying api() method (GH10501, PR10516).
Fix error when encoding an empty numpy.datetime64 array (GH10722, PR10723). By Spencer Clark.
Fix error from to_netcdf(..., compute=False) when using Dask Distributed (GH10725). By Stephan Hoyer.
Propagate coordinate attrs in xarray.Dataset.map() (GH9317, PR10602). By Justus Magin.
Allow combine_attrs="drop_conflicts" to handle objects with __eq__ methods that return non-bool values (e.g., numpy arrays) without raising ValueError (PR10726). By Maximilian Roos.

Documentation#

Fixed Zarr encoding documentation with consistent examples and added comprehensive coverage of dimension and coordinate encoding differences between Zarr V2 and V3 formats. The documentation shows what users will see when accessing Zarr files with raw zarr-python, and explains the relationship between _ARRAY_DIMENSIONS (Zarr V2), dimension_names metadata (Zarr V3), and CF coordinates attributes. (PR10720) By Emmanuel Mathot.

Internal Changes#

Refactor structure of backends module to separate code for reading data from code for writing data (PR10771). By Tom Nicholas.
All test files now have full mypy type checking enabled (check_untyped_defs = true), improving type safety and making the test suite a better reference for type annotations. (PR10768) By Maximilian Roos.

v2025.09.0 (Sep 2, 2025)#

This release brings a number of small improvements and fixes, especially related to writing DataTree objects and netCDF files to disk.

Thanks to the 13 contributors to this release: Benoit Bovy, DHRUVA KUMAR KAUSHAL, Deepak Cherian, Dhruva Kumar Kaushal, Giacomo Caria, Ian Hunt-Isaak, Illviljan, Justus Magin, Kai Mühlbauer, Ruth Comer, Spencer Clark, Stephan Hoyer and Tom Nicholas

New Features#

Support rechunking by SeasonResampler for seasonal data analysis (GH10425, PR10519). By Dhruva Kumar Kaushal.
Add convenience methods to Coordinates (PR10318) By Justus Magin.
Added load_datatree() for loading DataTree objects into memory from disk. It has the same relationship to open_datatree() as load_dataset() has to open_dataset(). By Stephan Hoyer.
compute=False is now supported by DataTree.to_netcdf() and DataTree.to_zarr(). By Stephan Hoyer.
open_dataset() now correctly infers a path ending in .zarr/ as a Zarr store. By Ian Hunt-Isaak.

Breaking changes#

Following pandas 3.0 (pandas-dev/pandas#61985), Day is no longer considered a Tick-like frequency. Therefore, non-None values of offset and non-"start_day" values of origin will have no effect when resampling to a daily frequency for objects indexed by an xarray.CFTimeIndex. As in pandas-dev/pandas#62101, warnings will be emitted if non-default values are provided in this context (GH10640, PR10650). By Spencer Clark.
The default backend engine used by Dataset.to_netcdf() and DataTree.to_netcdf() is now chosen consistently with open_dataset() and open_datatree(), using whichever netCDF libraries are available and valid, and preferring netCDF4 to h5netcdf to scipy (GH10654). This will change the default backend in some edge cases (e.g., from scipy to netCDF4 when writing to a file-like object or bytes). To override these new defaults, set engine explicitly. By Stephan Hoyer.
The return value of Dataset.to_netcdf() without path is now a memoryview object instead of bytes (PR10656). This removes an unnecessary memory copy and ensures consistency when using either engine="scipy" or engine="h5netcdf". If you need a bytes object, simply wrap the return value of to_netcdf() with bytes(). By Stephan Hoyer.
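Here is a sketch of getting plain bytes out of an in-memory write. It assumes scipy is installed for engine='scipy'. Wrapping the result in bytes() works whether the installed version returns bytes or a memoryview.

```python
import io

import numpy as np
import xarray as xr

ds = xr.Dataset({"t": ("x", np.array([1.5, 2.5, 3.5]))})

# With no path, to_netcdf() returns the serialized file in memory;
# bytes() accepts both a memoryview and a bytes object.
raw = bytes(ds.to_netcdf(engine="scipy"))
print(raw[:3])  # b'CDF' (netCDF classic-format magic number)

# Read it back from memory to check the round trip.
with xr.open_dataset(io.BytesIO(raw), engine="scipy") as roundtrip:
    values = roundtrip["t"].values.tolist()
print(values)  # [1.5, 2.5, 3.5]
```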

Bug fixes#

Fix contour plots not normalizing the colors correctly when using, for example, logarithmic norms (GH10551, PR10565). By Jimmy Westling.
Fix distribution of auto_complex keyword argument for open_datatree (GH10631, PR10632). By Kai Mühlbauer.
Warn instead of raising in case of a misconfiguration of unlimited_dims originating from dataset.encoding, to prevent breaking users' workflows (GH10647, PR10648). By Kai Mühlbauer.
DataTree.to_netcdf() and DataTree.to_zarr() now avoid redundant computation of Dask arrays with cross-group dependencies (GH10637). By Stephan Hoyer.
DataTree.to_netcdf() no longer has h5netcdf hard-coded as the default engine (GH10654). By Stephan Hoyer.

Internal Changes#

Run TestNetCDF4Data as TestNetCDF4DataTree through open_datatree (PR10632). By Kai Mühlbauer.

v2025.08.0 (Aug 14, 2025)#

This release brings the ability to load xarray objects asynchronously, write netCDF as bytes, fixes a number of bugs, and starts an important deprecation cycle for changing the default values of keyword arguments for various xarray combining functions.

Thanks to the 24 contributors to this release: Alfonso Ladino, Brigitta Sipőcz, Claude, Deepak Cherian, Dimitri Papadopoulos Orfanos, Eric Jansen, Ian Hunt-Isaak, Ilan Gold, Illviljan, Julia Signell, Justus Magin, Kai Mühlbauer, Mathias Hauser, Matthew, Michael Niklas, Miguel Jimenez, Nick Hodgskin, Pratiman, Scott Staniewicz, Spencer Clark, Stephan Hoyer, Tom Nicholas, Yang Yang and jemmajeffree

New Features#

Added DataTree.prune() method to remove empty nodes while preserving tree structure. Useful for cleaning up DataTree after time-based filtering operations (GH10590, PR10598). By Alfonso Ladino.
Added new asynchronous loading methods Dataset.load_async(), DataArray.load_async() and Variable.load_async(). Note that users are expected to limit concurrency themselves; xarray does not internally limit concurrency in any way (GH10326, PR10327). By Tom Nicholas.
DataTree.to_netcdf() can now write to a file-like object, or return bytes if called without a filepath. (GH10570) By Matthew Willson.
Added exception handling for invalid files in open_mfdataset(). (GH6736) By Pratiman Patel.

Breaking changes#

When writing to NetCDF files with groups, Xarray no longer redefines dimensions that have the same size in parent groups (GH10241). This conforms with CF Conventions for group scope but may require adjustments for code that consumes NetCDF files produced by Xarray. By Stephan Hoyer.

Deprecations#

Start a deprecation cycle for changing the default keyword arguments to concat(), merge(), combine_nested(), combine_by_coords(), and open_mfdataset(). Emits a FutureWarning when the old and new defaults would result in different behavior. Adds an option, use_new_combine_kwarg_defaults, to opt in to the new defaults immediately. New values are:
- data_vars: None, which means "all" when concatenating along a new dimension, and "minimal" when concatenating along an existing dimension
- coords: "minimal"
- compat: "override"
- join: "exact"
(GH8778, GH1385, PR10062). By Julia Signell.
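Passing the combining keywords explicitly pins the behaviour regardless of which defaults are active, which should also avoid the FutureWarning. A minimal sketch using concat() with the new values for concatenation along an existing dimension (the datasets a and b are illustrative):

```python
import xarray as xr

a = xr.Dataset({"v": ("x", [1, 2])}, coords={"x": [0, 1]})
b = xr.Dataset({"v": ("x", [3, 4])}, coords={"x": [2, 3]})

# Every combining option is spelled out, so changing defaults cannot
# silently alter the result.
combined = xr.concat(
    [a, b],
    dim="x",
    data_vars="minimal",
    coords="minimal",
    compat="override",
    join="exact",
)
```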

Bug fixes#

Fix Pydap Datatree backend testing. Testing now compares the elements of two (unordered) sets rather than lists (PR10525). By Miguel Jimenez-Urias.
Fix KeyError when passing a dim argument different from the default to convert_calendar (PR10544). By Eric Jansen.
Fix transpose of boolean arrays read from disk. (GH10536) By Deepak Cherian.
Fix detection of the h5netcdf backend. Xarray now selects h5netcdf if the default netCDF4 engine is not available (GH10401, PR10557). By Scott Staniewicz.
Fix merge() to prevent altering original object depending on join value (PR10596) By Julia Signell.
Ensure unlimited_dims passed to xarray.DataArray.to_netcdf(), xarray.Dataset.to_netcdf() or xarray.DataTree.to_netcdf() only contains dimensions present in the object; raise ValueError otherwise (GH10549, PR10608). By Kai Mühlbauer.

Documentation#

Clarify lazy behaviour and eager loading for chunks=None in open_dataset(), open_dataarray(), open_datatree(), open_groups() and open_zarr() (GH10612, PR10627). By Kai Mühlbauer.

Performance#

Speed up non-numeric scalars when calling Dataset.interp(). (GH10054, PR10554) By Jimmy Westling.

v2025.07.1 (Jul 09, 2025)#

This release brings a lot of improvements to flexible indexes functionality, including new classes to ease building of new indexes with custom coordinate transforms (indexes.CoordinateTransformIndex) and tree-like index structures (indexes.NDPointIndex). See a new gallery showing off the possibilities enabled by flexible indexes.

Thanks to the 7 contributors to this release: Benoit Bovy, Deepak Cherian, Dhruva Kumar Kaushal, Dimitri Papadopoulos Orfanos, Illviljan, Justus Magin and Tom Nicholas

New Features#

New xarray.indexes.NDPointIndex, which by default uses scipy.spatial.KDTree under the hood for the selection of irregular, n-dimensional data (PR10478). By Benoit Bovy.
Allow skipping the creation of default indexes when opening datasets (PR8051). By Benoit Bovy and Justus Magin.

Bug fixes#

Dataset.set_xindex() now raises a helpful error when a custom index creates extra variables that don’t match the provided coordinate names, instead of silently ignoring them. The error message suggests using the factory method pattern with xarray.Coordinates.from_xindex() and Dataset.assign_coords() for advanced use cases (GH10499, PR10503). By Dhruva Kumar Kaushal.

Documentation#

A new gallery showing off the possibilities enabled by flexible indexes.

Internal Changes#

Refactored the PandasIndexingAdapter and CoordinateTransformIndexingAdapter internal indexing classes. Coordinate variables that wrap a pandas.RangeIndex, a pandas.MultiIndex or a xarray.indexes.CoordinateTransform are now displayed as lazy variables in the Xarray data reprs (PR10355). By Benoit Bovy.

v2025.07.0 (Jul 3, 2025)#

This release extends xarray’s support for custom index classes, restores support for reading netCDF3 files with SciPy, updates minimum dependencies, and fixes a number of bugs.

Thanks to the 17 contributors to this release: Bas Nijholt, Benoit Bovy, Deepak Cherian, Dhruva Kumar Kaushal, Dimitri Papadopoulos Orfanos, Ian Hunt-Isaak, Kai Mühlbauer, Mathias Hauser, Maximilian Roos, Miguel Jimenez, Nick Hodgskin, Scott Henderson, Shuhao Cao, Spencer Clark, Stephan Hoyer, Tom Nicholas and Zsolt Cserna

New Features#

Expose RangeIndex and CoordinateTransformIndex as public API under the xarray.indexes namespace. By Deepak Cherian.
Support zarr-python’s new .supports_consolidated_metadata store property (PR10457). By Tom Nicholas.
Better error messages when encoding data to be written to disk fails (PR10464). By Stephan Hoyer.

Breaking changes#

The minimum versions of some dependencies were changed (GH10417, PR10438): By Dhruva Kumar Kaushal.

Dependency	Old Version	New Version
Python	3.10	3.11
array-api-strict	1.0	1.1
boto3	1.29	1.34
bottleneck	1.3	1.4
cartopy	0.22	0.23
dask-core	2023.11	2024.6
distributed	2023.11	2024.6
flox	0.7	0.9
h5py	3.8	3.11
hdf5	1.12	1.14
iris	3.7	3.9
lxml	4.9	5.1
matplotlib-base	3.7	3.8
numba	0.57	0.60
numbagg	0.6	0.8
numpy	1.24	1.26
packaging	23.2	24.1
pandas	2.1	2.2
pint	0.22	0.24
pydap	N/A	3.5
scipy	1.11	1.13
sparse	0.14	0.15
typing_extensions	4.8	Removed
zarr	2.16	2.18

Bug fixes#

Fix Pydap test_cmp_local_file for numpy 2.3.0 changes: (1) always return arrays for all versions, and (2) skip astype(str) on the expected data for numpy >= 2.3.0 (PR10421). By Kai Mühlbauer.
Fix the SciPy backend for netCDF3 files (GH8909, PR10376). By Deepak Cherian.
Check and fix character array string dimension names, issue warnings as needed (GH6352, PR10395). By Kai Mühlbauer.
Fix the error message of testing.assert_equal() when two different DataTree objects are passed (PR10440). By Mathias Hauser.
Fix testing.assert_equal() with check_dim_order=False for DataTree objects (PR10442). By Mathias Hauser.
Fix Pydap backend testing. The test now forces string arrays to dtype “S” (pydap converts them to unicode type by default) and removes the conditional on the numpy version (GH10261, PR10482). By Miguel Jimenez-Urias.
Fix attribute overwriting bug when decoding encoded numpy.timedelta64 values from disk with a dtype attribute (GH10468, PR10469). By Spencer Clark.
Fix default "_FillValue" dtype coercion bug when encoding numpy.timedelta64 values to an on-disk format that only supports 32-bit integers (GH10466, PR10469). By Spencer Clark.

Internal Changes#

Forward variable name down to coders for AbstractWritableDataStore.encode_variable and subclasses. (PR10395). By Kai Mühlbauer.

v2025.06.1 (Jun 11, 2025)#

This is a quick bugfix release to remove an unintended dependency on typing_extensions.

Thanks to the 4 contributors to this release: Alex Merose, Deepak Cherian, Ilan Gold and Simon Perkins

Bug fixes#

Remove dependency on typing_extensions (PR10413). By Simon Perkins.

v2025.06.0 (Jun 10, 2025)#

This release brings HTML reprs to the documentation, fixes to flexible Xarray indexes, performance optimizations, more ergonomic seasonal grouping and resampling with new SeasonGrouper and SeasonResampler objects, and bugfixes. Thanks to the 33 contributors to this release: Andrecho, Antoine Gibek, Benoit Bovy, Brian Michell, Christine P. Chai, David Huard, Davis Bennett, Deepak Cherian, Dimitri Papadopoulos Orfanos, Elliott Sales de Andrade, Erik, Erik Månsson, Giacomo Caria, Ilan Gold, Illviljan, Jesse Rusak, Jonathan Neuhauser, Justus Magin, Kai Mühlbauer, Kimoon Han, Konstantin Ntokas, Mark Harfouche, Michael Niklas, Nick Hodgskin, Niko Sirmpilatze, Pascal Bourgault, Scott Henderson, Simon Perkins, Spencer Clark, Tom Vo, Trevor James Smith, joseph nowak and micguerr-bopen

New Features#

Switch docs to jupyter-execute sphinx extension for HTML reprs. (GH3893, PR10383) By Scott Henderson.
Allow an Xarray index that uses multiple dimensions to check equality with another index for only a subset of those dimensions (i.e., ignoring the dimensions that are excluded from alignment) (GH10243, PR10293). By Benoit Bovy.
New SeasonGrouper and SeasonResampler objects for ergonomic seasonal aggregation. See the docs on Handling Seasons or blog post for more. By Deepak Cherian.
Data corruption issues arising from misaligned Dask and Zarr chunks can now be prevented using the new align_chunks parameter in to_zarr(). This option automatically rechunks the Dask array to align it with the Zarr storage chunks. For now, it is disabled by default, but this could change in the future (GH9914, PR10336). By Joseph Nowak.

Documentation#

HTML reprs! By Scott Henderson.

Bug fixes#

Fix BinGrouper when labels is not specified (GH10284). By Deepak Cherian.
Allow accessing arbitrary attributes on Pandas ExtensionArrays. By Deepak Cherian.
Fix coding empty (zero-size) timedelta64 arrays, units taking precedence when encoding, fallback to default values when decoding (GH10310, PR10313). By Kai Mühlbauer.
When calculating the mean in rolling operations, cast the count using the dtype of the intermediate sum instead of the source dtype or “int”. This preserves float dtypes and gives the correct mean of bool arrays (GH10340, PR10341). By Kai Mühlbauer.
Improve the html repr of Xarray objects (dark mode, icons and variable attribute / data dropdown sections). (PR10353, PR10354) By Benoit Bovy.
Raise an error when attempting to encode numpy.datetime64 values prior to the Gregorian calendar reform date of 1582-10-15 with a "standard" or "gregorian" calendar. Previously we would warn and encode these as cftime.DatetimeGregorian objects, but it is not clear that this is the user’s intent, since this implicitly converts the calendar of the datetimes from "proleptic_gregorian" to "gregorian" and prevents round-tripping them as numpy.datetime64 values (PR10352). By Spencer Clark.
Avoid unsafe casts from float to unsigned int in CFMaskCoder (GH9815, PR9964). By Elliott Sales de Andrade.
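For the rolling-mean dtype fix, a minimal sketch of the expected behaviour on a boolean array (illustrative data). The first window is incomplete, so it is NaN under the default min_periods:

```python
import numpy as np
import xarray as xr

flags = xr.DataArray([True, False, True, True], dims="t")

# The rolling mean of a boolean array is the fraction of True values per window.
frac = flags.rolling(t=2).mean()
```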

Performance#

Lazily indexed arrays now use less memory to store keys by avoiding copies in VectorizedIndexer and OuterIndexer (GH10316). By Jesse Rusak.
Fix performance regression in interp where more data was loaded than was necessary. (GH10287). By Deepak Cherian.
Speed up encoding of cftime.datetime objects by roughly a factor of three (PR8324). By Antoine Gibek.

v2025.04.0 (Apr 29, 2025)#

This release brings bug fixes, better support for extension arrays including returning a pandas.IntervalArray from groupby_bins, and performance improvements. Thanks to the 24 contributors to this release: Alban Farchi, Andrecho, Benoit Bovy, Deepak Cherian, Dimitri Papadopoulos Orfanos, Florian Jetter, Giacomo Caria, Ilan Gold, Illviljan, Joren Hammudoglu, Julia Signell, Kai Muehlbauer, Kai Mühlbauer, Mathias Hauser, Mattia Almansi, Michael Sumner, Miguel Jimenez, Nick Hodgskin (🦎 Vecko), Pascal Bourgault, Philip Chmielowiec, Scott Henderson, Spencer Clark, Stephan Hoyer and Tom Nicholas

New Features#

By default xarray now encodes numpy.timedelta64 values by converting to numpy.int64 values and storing "dtype" and "units" attributes consistent with the dtype of the in-memory numpy.timedelta64 values, e.g. "timedelta64[s]" and "seconds" for second-resolution timedeltas. These values will always be decoded to timedeltas without a warning moving forward. Timedeltas encoded via the previous approach can still be roundtripped exactly, but in the future will not be decoded by default (GH1621, GH10099, PR10101). By Spencer Clark.
Added scipy-stubs to the xarray[types] dependencies. By Joren Hammudoglu.
Added a xarray.typing module to expose selected public types for use in downstream libraries and static type checking. (GH10179, PR10215). By Michele Guerreri.
Improved compatibility with OPeNDAP DAP4 data model for backend engine pydap. This includes datatree support, and removing slashes from dimension names. By Miguel Jimenez-Urias.
Allow assigning index coordinates with non-array dimension(s) in a DataArray by overriding Index.should_add_coord_to_array(). For example, this enables support for CF boundary coordinates (e.g., time(time) and time_bnds(time, nbnd)) in a DataArray (PR10137). By Benoit Bovy.
Improved support for pandas categorical extension as indices (i.e., pandas.IntervalIndex) (GH9661, PR9671). By Ilan Gold.
Improved checks and errors raised when trying to align objects with conflicting indexes. It is now possible to align objects each with multiple indexes sharing common dimension(s). (GH7695, PR10251) By Benoit Bovy.

Breaking changes#

The minimum versions of some dependencies were changed:

Package	Old	New
pydap	3.4	3.5.0

Reductions with groupby_bins or those that involve xarray.groupers.BinGrouper now return objects indexed by pandas.IntervalArray() objects, instead of numpy object arrays containing tuples. This change enables interval-aware indexing of such Xarray objects. (PR9671). By Ilan Gold.
Remove PandasExtensionArrayIndex from xarray.Variable.data when the attribute is a pandas.api.extensions.ExtensionArray (PR10263). By Ilan Gold.
The html and text repr for DataTree are now truncated. Up to 6 children are displayed for each node – the first 3 and the last 3 children – with a ... between them. The number of children to include in the display is configurable via options. For instance use set_options(display_max_children=8) to display 8 children rather than the default 6. (PR10139) By Julia Signell.
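A minimal sketch of the new return type from groupby_bins (illustrative data and bin edges):

```python
import pandas as pd
import xarray as xr

da = xr.DataArray([1, 2, 3, 4], coords={"x": [0.5, 1.5, 2.5, 3.5]}, dims="x")

# Bin x into (0, 2] and (2, 4] and sum each bin.
binned = da.groupby_bins("x", bins=[0, 2, 4]).sum()

# The resulting bin coordinate is backed by pandas intervals.
index = binned.indexes["x_bins"]
```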

Deprecations#

The deprecation cycle for the eagerly_compute_group kwarg to groupby and groupby_bins is now complete. By Deepak Cherian.

Bug fixes#

to_stacked_array() now uses dimensions in order of appearance. This fixes the issue where using transpose() before to_stacked_array() had no effect. (Mentioned in GH9921)
Enable keep_attrs in DatasetView.map relevant for map_over_datasets() (PR10219) By Mathias Hauser.
Variables with no temporal dimension are left untouched by convert_calendar(). (GH10266, PR10268) By Pascal Bourgault.
Enable chunk_key_encoding in to_zarr() for Zarr v2 Datasets (PR10274) By BrianMichell.

Documentation#

Fix references to core classes in docs (GH10195, PR10207). By Mattia Almansi.
Fix references to point to updated pydap documentation (PR10182). By Miguel Jimenez-Urias.
Switch to pydata-sphinx-theme from sphinx-book-theme (PR8708). By Scott Henderson.
Add a dedicated ‘Complex Numbers’ section to the User Guide (GH10213, PR10235). By Andre Wendlinger.

Internal Changes#

Avoid stacking when grouping by a chunked array. This can be a large performance improvement. By Deepak Cherian.
The implementation of Variable.set_dims has changed to use array indexing syntax instead of np.broadcast_to to perform dimension expansions where all new dimensions have a size of 1. This should improve compatibility with duck arrays that do not support broadcasting (GH9462, PR10277). By Mark Harfouche.

v2025.03.1 (Mar 30, 2025)#

This release brings the ability to specify fill_value and write_empty_chunks for Zarr V3 stores, and a few bug fixes. Thanks to the 10 contributors to this release: Andrecho, Deepak Cherian, Ian Hunt-Isaak, Karl Krauth, Mathias Hauser, Maximilian Roos, Nick Hodgskin (🦎 Vecko), Spencer Clark, Tom Nicholas and wpbonelli.

New Features#

Allow setting a fill_value for Zarr format 3 arrays. Specify fill_value in encoding as usual. (GH10064). By Deepak Cherian.
Added indexes.RangeIndex as an alternative, memory saving Xarray index representing a 1-dimensional bounded interval with evenly spaced floating values (GH8473, PR10076). By Benoit Bovy.

Breaking changes#

Explicitly forbid appending a DataTree to zarr using to_zarr() with append_dim, because the expected behaviour is currently undefined. (GH9858, PR10156) By Tom Nicholas.

Bug fixes#

Update the parameters of to_zarr() to match to_zarr(). This fixes the issue where using the zarr_version parameter would raise a deprecation warning telling the user to use a non-existent zarr_format parameter instead. (GH10163, PR10164) By Karl Krauth.
DataTree.sel() and DataTree.isel() display the path of the first failed node again (PR10154). By Mathias Hauser.
Fix grouped and resampled first, last with datetimes (GH10169, PR10173) By Deepak Cherian.
FacetGrid plots now include units in their axis labels when available (GH10184, PR10185) By Andre Wendlinger.

v2025.03.0 (Mar 20, 2025)#

This release brings tested support for Python 3.13, support for reading Zarr V3 datasets into a DataTree, significant improvements to datetime & timedelta encoding/decoding, and improvements to the DataTree API; in addition to the usual bug fixes and other improvements. Thanks to the 26 contributors to this release: Alfonso Ladino, Benoit Bovy, Chuck Daniels, Deepak Cherian, Eni, Florian Jetter, Ian Hunt-Isaak, Jan, Joe Hamman, Josh Kihm, Julia Signell, Justus Magin, Kai Mühlbauer, Kobe Vandelanotte, Mathias Hauser, Max Jones, Maximilian Roos, Oliver Watt-Meyer, Sam Levang, Sander van Rijn, Spencer Clark, Stephan Hoyer, Tom Nicholas, Tom White, Vecko and maddogghoek

New Features#

Added tutorial.open_datatree() and tutorial.load_datatree(). By Eni Awowale.
Added DataTree.filter_like() to conveniently restructure a DataTree like another DataTree (GH10096, PR10097). By Kobe Vandelanotte.
Added Coordinates.from_xindex() as convenience for creating a new Coordinates object directly from an existing Xarray index object if the latter supports it (PR10000) By Benoit Bovy.
Allow kwargs in DataTree.map_over_datasets() and map_over_datasets() (GH10009, PR10012). By Kai Mühlbauer.
Support Python 3.13 (no free-threading) (GH9664, PR9681). By Justus Magin.
Added experimental support for coordinate transforms (not ready for public use yet!) (PR9543) By Benoit Bovy.
Similar to our numpy.datetime64 encoding path, automatically modify the units when an integer dtype is specified during eager cftime encoding, but the specified units would not allow for an exact round trip (PR9498). By Spencer Clark.
Support reading to GPU memory with Zarr (PR10078). By Deepak Cherian.

Performance#

DatasetGroupBy.first() and DatasetGroupBy.last() can now use flox if available. (GH9647) By Deepak Cherian.

Breaking changes#

Rolled back code that would attempt to catch integer overflow when encoding times with small integer dtypes (GH8542), since it was inconsistent with xarray’s handling of standard integers, and interfered with encoding times with small integer dtypes and missing values (PR9498). By Spencer Clark.
Warn instead of raise if phony_dims are detected when using h5netcdf-backend and phony_dims=None (GH10049, PR10058) By Kai Mühlbauer.

Deprecations#

Deprecate cftime_range() in favor of date_range() with use_cftime=True (GH9886, PR10024). By Josh Kihm.
Move from phony_dims=None to phony_dims="access" for the h5netcdf backend (GH10049, PR10058). By Kai Mühlbauer.

Bug fixes#

Fix open_datatree incompatibilities with Zarr-Python V3 and refactor TestZarrDatatreeIO accordingly (GH9960, PR10020). By Alfonso Ladino-Rincon.
Default to resolution-dependent optimal integer encoding units when saving chunked non-nanosecond numpy.datetime64 or numpy.timedelta64 arrays to disk. Previously units of “nanoseconds” were chosen by default, which are optimal for nanosecond-resolution times, but not for times with coarser resolution. By Spencer Clark (PR10017).
Use mean of min/max years as offset in calculation of datetime64 mean (GH10019, PR10035). By Kai Mühlbauer.
Fix DataArray().drop_attrs(deep=False) and add support for attrs to DataArray()._replace(). (GH10027, PR10030). By Jan Haacker.
Fix bug preventing encoding times with missing values with small integer dtype (GH9134, PR9498). By Spencer Clark.
More robustly raise an error when lazily encoding times and an integer dtype is specified with units that do not allow for an exact round trip (PR9498). By Spencer Clark.
Prevent false resolution change warnings from being emitted when decoding timedeltas encoded with floating point values, and make it clearer how to silence this warning message in the case that it is rightfully emitted (GH10071, PR10072). By Spencer Clark.
Fix isel for multi-coordinate Xarray indexes (GH10063, PR10066). By Benoit Bovy.
Fix dask tokenization when opening each node in xarray.open_datatree() (GH10098, PR10100). By Sam Levang.
Improve handling of dtype and NaT when encoding/decoding masked and packed datetimes and timedeltas (GH8957, PR10050). By Kai Mühlbauer.

Documentation#

Better expose the Coordinates class in API reference (PR10000) By Benoit Bovy.

v2025.01.2 (Jan 31, 2025)#

This release brings non-nanosecond datetime and timedelta resolution to xarray, sharded reading in zarr, suggestion of correct names when trying to access non-existent data variables and bug fixes!

Thanks to the 16 contributors to this release: Deepak Cherian, Elliott Sales de Andrade, Jacob Prince-Bieker, Jimmy Westling, Joe Hamman, Joseph Nowak, Justus Magin, Kai Mühlbauer, Mattia Almansi, Michael Niklas, Roelof Rietbroek, Salaheddine EL FARISSI, Sam Levang, Spencer Clark, Stephan Hoyer and Tom Nicholas

In the last couple of releases xarray has been prepared for allowing non-nanosecond datetime and timedelta resolution. The code had to be changed and adapted in numerous places, affecting especially the test suite. The documentation has been updated accordingly and a new internal chapter on Time Coding has been added.

To make the transition as smooth as possible this is designed to be fully backwards compatible, keeping the current default of 'ns' resolution on decoding. To opt-into decoding to other resolutions ('us', 'ms' or 's') an instance of the newly public coders.CFDatetimeCoder class can be passed through the decode_times keyword argument (see also Default Time Unit):

import xarray as xr

coder = xr.coders.CFDatetimeCoder(time_unit="s")
ds = xr.open_dataset(filename, decode_times=coder)  # filename: path to an existing netCDF file

Similar control of the resolution of decoded timedeltas can be achieved through passing a coders.CFTimedeltaCoder instance to the decode_timedelta keyword argument:

coder = xr.coders.CFTimedeltaCoder(time_unit="s")
ds = xr.open_dataset(filename, decode_timedelta=coder)

though by default timedeltas will be decoded to the same time_unit as datetimes.

There might be slight changes when encoding/decoding times, as some warning and error messages have been removed or rewritten. Xarray will now also allow non-nanosecond datetimes (with 'us', 'ms' or 's' resolution) when creating DataArrays from scratch, picking the lowest possible resolution:

xr.DataArray(data=[np.datetime64("2000-01-01", "D")], dims=("time",))

In a future release the current default of 'ns' resolution on decoding will eventually be deprecated.

New Features#

Relax nanosecond resolution restriction in CF time coding and permit numpy.datetime64 or numpy.timedelta64 dtype arrays with "s", "ms", "us", or "ns" resolution throughout xarray (GH7493, PR9618, PR9977, PR9966, PR9999). By Kai Mühlbauer and Spencer Clark.
Enable the compute=False option in DataTree.to_zarr(). (PR9958). By Sam Levang.
Improve the error message raised when no key matches the available variables in a dataset (PR9943). By Jimmy Westling.
Added a time_unit argument to CFTimeIndex.to_datetimeindex(). Note that in a future version of xarray, CFTimeIndex.to_datetimeindex() will return a microsecond-resolution pandas.DatetimeIndex instead of a nanosecond-resolution pandas.DatetimeIndex (PR9965). By Spencer Clark and Kai Mühlbauer.
Adds shards to the list of valid_encodings in the zarr backend, so that sharded Zarr V3 arrays can be written (GH9947, PR9948). By Jacob Prince-Bieker.

Deprecations#

In a future version of xarray decoding of variables into numpy.timedelta64 values will be disabled by default. To silence warnings associated with this, set decode_timedelta to True, False, or a coders.CFTimedeltaCoder instance when opening data (GH1621, PR9966). By Spencer Clark.

Bug fixes#

Fix DataArray.ffill(), DataArray.bfill(), Dataset.ffill() and Dataset.bfill() when the limit is bigger than the chunksize (GH9939). By Joseph Nowak.
Fix issues related to Pandas v3 (“us” vs. “ns” for python datetime, copy on write) and handling of 0d-numpy arrays in datetime/timedelta decoding (PR9953). By Kai Mühlbauer.
Remove dask-expr from CI runs, add “pyarrow” dask dependency to windows CI runs, fix related tests (GH9962, PR9971). By Kai Mühlbauer.
Use zarr-fixture to prevent thread leakage errors (PR9967). By Kai Mühlbauer.
Fix weighted polyfit for arrays with more than two dimensions (GH9972, PR9974). By Mattia Almansi.
Preserve order of variables in xarray.combine_by_coords() (GH8828, PR9070). By Kai Mühlbauer.
Cast numpy scalars to arrays in NamedArray.from_arrays() (GH10005, PR10008) By Justus Magin.

Documentation#

A chapter on Time Coding is added to the internal section (PR9618). By Kai Mühlbauer.
Clarified xarray’s policy on API stability in the FAQ. (GH9854, PR9855) By Tom Nicholas.

Internal Changes#

Updated time coding tests to assert exact equality rather than equality with a tolerance, since xarray’s minimum supported version of cftime is greater than 1.2.1 (PR9961). By Spencer Clark.

v2025.01.1 (Jan 9, 2025)#

This is a quick release to bring compatibility with the Zarr V3 release. It also includes an update to the time decoding infrastructure as a step toward enabling non-nanosecond datetime support!

New Features#

Split out coders.CFDatetimeCoder as public API in xr.coders, make decode_times keyword argument consume coders.CFDatetimeCoder (PR9901). By Kai Mühlbauer.

Deprecations#

Time decoding related kwarg use_cftime is deprecated. Use keyword argument decode_times=CFDatetimeCoder(use_cftime=True) in open_dataset(), open_dataarray(), open_datatree(), open_groups(), open_zarr() and decode_cf() instead (PR9901). By Kai Mühlbauer.

v2025.01.0 (Jan 3, 2025)#

This release brings much improved read performance with Zarr arrays (without consolidated metadata), better support for additional array types, as well as bugfixes and performance improvements. Thanks to the 20 contributors to this release: Bruce Merry, Davis Bennett, Deepak Cherian, Dimitri Papadopoulos Orfanos, Florian Jetter, Illviljan, Janukan Sivajeyan, Justus Magin, Kai Germaschewski, Kai Mühlbauer, Max Jones, Maximilian Roos, Michael Niklas, Patrick Peglar, Sam Levang, Scott Huberty, Spencer Clark, Stephan Hoyer, Tom Nicholas and Vecko

New Features#

Improve the error message raised when using chunked-array methods if no chunk manager is available or if the requested chunk manager is missing (PR9676). By Justus Magin.
Better support wrapping additional array types (e.g. cupy or jax) by calling generalized duck array operations throughout more xarray methods. (GH7848, PR9798). By Sam Levang.
Better performance for reading Zarr arrays in the ZarrStore class by caching the state of Zarr storage and avoiding redundant IO operations. By default, ZarrStore stores a snapshot of names and metadata of the in-scope Zarr arrays; this cache is then used when iterating over those Zarr arrays, which avoids IO operations and thereby reduces latency. (GH9853, PR9861). By Davis Bennett.
Add a unit keyword argument to date_range() and microsecond parsing to the ISO 8601 parser (PR9885). By Kai Mühlbauer.

Breaking changes#

Methods including dropna, rank, idxmax, idxmin require non-dimension arguments to be passed as keyword arguments. The previous behavior, which allowed .dropna('foo', 'all'), was too easily confused with 'all' being a dimension. The updated equivalent is .dropna('foo', how='all'). The previous behavior was deprecated in v2023.10.0. By Maximilian Roos.
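A minimal sketch of the keyword-only form with dropna (illustrative data):

```python
import numpy as np
import xarray as xr

da = xr.DataArray([[1.0, np.nan], [np.nan, np.nan]], dims=("x", "y"))

# how must be passed by keyword; drops rows along x that are entirely NaN.
cleaned = da.dropna("x", how="all")
```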

Deprecations#

Finalize deprecation of the closed parameter of cftime_range() and date_range() (PR9882). By Kai Mühlbauer.

Performance#

Better preservation of chunksizes in Dataset.idxmin() and Dataset.idxmax() (GH9425, PR9800). By Deepak Cherian.
Much better implementation of vectorized interpolation for dask arrays (PR9881). By Deepak Cherian.

Bug fixes#

Fix type annotations for get_axis_num. (GH9822, PR9827). By Bruce Merry.
Fix unintended load on datasets when calling DataArray.plot.scatter() (PR9818). By Jimmy Westling.
Fix interpolation when non-numeric coordinate variables are present (GH8099, GH9839). By Deepak Cherian.

Internal Changes#

Move non-CF related ensure_dtype_not_object from conventions to backends (PR9828). By Kai Mühlbauer.
Move handling of scalar datetimes into _possibly_convert_objects within as_compatible_data. This is consistent with how lists of these objects will be converted (PR9900). By Kai Mühlbauer.
Move ISO-8601 parser from coding.cftimeindex to coding.times to make it available there (prevents circular import), add capability to parse negative and/or five-digit years (PR9899). By Kai Mühlbauer.
Refactor of time coding to prepare for relaxing nanosecond restriction (PR9906). By Kai Mühlbauer.

v2024.11.0 (Nov 22, 2024)#

This release brings better support for wrapping JAX arrays and Astropy Quantity objects, DataTree.persist(), algorithmic improvements to many methods with dask (Dataset.polyfit(), Dataset.ffill(), Dataset.bfill(), rolling reductions), and bug fixes. Thanks to the 22 contributors to this release: Benoit Bovy, Deepak Cherian, Dimitri Papadopoulos Orfanos, Holly Mandel, James Bourbeau, Joe Hamman, Justus Magin, Kai Mühlbauer, Lukas Trippe, Mathias Hauser, Maximilian Roos, Michael Niklas, Pascal Bourgault, Patrick Hoefler, Sam Levang, Sarah Charlotte Johnson, Scott Huberty, Stephan Hoyer, Tom Nicholas, Virgile Andreani, joseph nowak and tvo

New Features#

Added DataTree.persist() method (GH9675, PR9682). By Sam Levang.
Added write_inherited_coords option to DataTree.to_netcdf() and DataTree.to_zarr() (PR9677). By Stephan Hoyer.
Support lazy grouping by dask arrays, and allow specifying ordered groups with UniqueGrouper(labels=["a", "b", "c"]) (GH2852, GH757). By Deepak Cherian.
Add new automatic_rechunk kwarg to DataArrayRolling.construct() and DatasetRolling.construct(). This is only useful on dask>=2024.11.0 (GH9550). By Deepak Cherian.
Optimize ffill, bfill with dask when limit is specified (PR9771). By Joseph Nowak, and Patrick Hoefler.
Allow wrapping np.ndarray subclasses, e.g. astropy.units.Quantity (GH9704, PR9760). By Sam Levang and Tien Vo.
Optimize DataArray.polyfit() and Dataset.polyfit() with dask, when used with arrays with more than two dimensions. (GH5629). By Deepak Cherian.
Support for directly opening remote files as string paths (for example, s3://bucket/data.nc) with fsspec when using the h5netcdf engine (GH9723, PR9797). By James Bourbeau.
Re-implement the ufuncs module, which now dynamically dispatches to the underlying array’s backend. Provides better support for certain wrapped array types like jax.numpy.ndarray. (GH7848, PR9776). By Sam Levang.
Speed up loading of large zarr stores using dask arrays. (GH8902) By Deepak Cherian.

Breaking Changes#

The minimum versions of some dependencies were changed:

Package	Old	New
boto3	1.28	1.29
dask-core	2023.9	2023.11
distributed	2023.9	2023.11
h5netcdf	1.2	1.3
numbagg	0.2.1	0.6
typing_extensions	4.7	4.8

Deprecations#

Grouping by a chunked array (e.g. dask or cubed) currently eagerly loads that variable into memory. This behaviour is deprecated. If eager loading was intended, please load such arrays manually using .load() or .compute(). Otherwise, pass eagerly_compute_group=False, and provide expected group labels using the labels kwarg to a grouper object such as groupers.UniqueGrouper or groupers.BinGrouper.

Bug fixes#

Fix inadvertent deep-copying of child data in DataTree (GH9683, PR9684). By Stephan Hoyer.
Avoid including parent groups when writing DataTree subgroups to Zarr or netCDF (PR9682). By Stephan Hoyer.
Fix regression in the interoperability of DataArray.polyfit() and xr.polyval() for date-time coordinates. (PR9691). By Pascal Bourgault.
Fix CF decoding of grid_mapping to allow all possible formats, add tests (GH9761, PR9765). By Kai Mühlbauer.
Add User-Agent to request-headers when retrieving tutorial data (GH9774, PR9782) By Kai Mühlbauer.

Documentation#

Mention attribute peculiarities in docs/docstrings (GH4798, PR9700). By Kai Mühlbauer.

Internal Changes#

persist methods now route through the xr.namedarray.parallelcompat.ChunkManagerEntrypoint (PR9682). By Sam Levang.

v2024.10.0 (Oct 24th, 2024)#

This release brings official support for xarray.DataTree, and compatibility with zarr-python v3!

Aside from these two huge features, it also improves support for vectorised interpolation and fixes various bugs.

Thanks to the 31 contributors to this release: Alfonso Ladino, DWesl, Deepak Cherian, Eni, Etienne Schalk, Holly Mandel, Ilan Gold, Illviljan, Joe Hamman, Justus Magin, Kai Mühlbauer, Karl Krauth, Mark Harfouche, Martey Dodoo, Matt Savoie, Maximilian Roos, Patrick Hoefler, Peter Hill, Renat Sibgatulin, Ryan Abernathey, Spencer Clark, Stephan Hoyer, Tom Augspurger, Tom Nicholas, Vecko, Virgile Andreani, Yvonne Fröhlich, carschandler, joseph nowak, mgunyho and owenlittlejohns

New Features#

DataTree related functionality is now exposed in the main xarray public API. This includes: xarray.DataTree, xarray.open_datatree, xarray.open_groups, xarray.map_over_datasets, xarray.group_subtrees, xarray.register_datatree_accessor and xarray.testing.assert_isomorphic. By Owen Littlejohns, Eni Awowale, Matt Savoie, Stephan Hoyer, Tom Nicholas, Justus Magin, and Alfonso Ladino.
A migration guide for users of the prototype xarray-contrib/datatree repository has been added, and can be found in the DATATREE_MIGRATION_GUIDE.md file in the repository root. By Tom Nicholas.
Support for Zarr-Python 3 (GH95515, PR9552). By Tom Augspurger, Ryan Abernathey and Joe Hamman.
Added zarr backends for open_groups() (GH9430, PR9469). By Eni Awowale.
Added support for vectorized interpolation using additional interpolators from the scipy.interpolate module (GH9049, PR9526). By Holly Mandel.
Implement handling of complex numbers (netcdf4/h5netcdf) and enums (h5netcdf) (GH9246, GH3297, PR9509). By Kai Mühlbauer.
Fix passing missing arguments when opening hdf5 and netCDF4 datatrees (GH9427, PR9428). By Alfonso Ladino.

Bug fixes#

Make path-like variable names illegal when constructing a DataTree from a Dataset (GH9339, PR9378). By Etienne Schalk.
Work around upstream pandas issue to ensure that we can decode times encoded with small integer dtype values (e.g. np.int32) in environments with NumPy 2.0 or greater without needing to fall back to cftime (PR9518). By Spencer Clark.
Fix bug when encoding times with missing values as floats in the case when the non-missing times could in theory be encoded with integers (GH9488, PR9497). By Spencer Clark.
Fix a few bugs affecting groupby reductions with flox (GH8090, GH9398, GH9648). By Deepak Cherian.
Fix the safe_chunks validation option on the to_zarr method (GH5511, PR9559). By Joseph Nowak.
Fix binning by multiple variables where some bins have no observations. (GH9630). By Deepak Cherian.
Fix issue where polyfit wouldn’t handle non-dimension coordinates. (GH4375, PR9369) By Karl Krauth.

Documentation#

Migrate documentation for datatree into main xarray documentation (PR9033). For information on previous datatree releases, please see: datatree’s historical release notes. By Owen Littlejohns, Matt Savoie, and Tom Nicholas.

Internal Changes#

v2024.09.0 (Sept 11, 2024)#

This release drops support for Python 3.9, and adds support for grouping by multiple arrays, while providing numerous performance improvements and bug fixes.

Thanks to the 33 contributors to this release: Alfonso Ladino, Andrew Scherer, Anurag Nayak, David Hoese, Deepak Cherian, Diogo Teles Sant’Anna, Dom, Elliott Sales de Andrade, Eni, Holly Mandel, Illviljan, Jack Kelly, Julius Busecke, Justus Magin, Kai Mühlbauer, Manish Kumar Gupta, Matt Savoie, Maximilian Roos, Michele Claus, Miguel Jimenez, Niclas Rieger, Pascal Bourgault, Philip Chmielowiec, Spencer Clark, Stephan Hoyer, Tao Xin, Tiago Sanona, TimothyCera-NOAA, Tom Nicholas, Tom White, Virgile Andreani, oliverhiggs and tiago

New Features#

Add days_in_year and decimal_year to the DatetimeAccessor on xr.DataArray. (PR9105). By Pascal Bourgault.

Performance#

Make chunk manager an option in set_options (PR9362). By Tom White.
Support for grouping by multiple variables. This is quite new, so please check your results and report bugs. Binary operations after grouping by multiple arrays are not supported yet. (GH1056, GH9332, GH324, PR9372). By Deepak Cherian.
Allow data-variable-specific constant_values in Dataset.pad() (PR9353). By Tiago Sanona.
Speed up grouping by avoiding deep-copy of non-dimension coordinates (GH9426, PR9393) By Deepak Cherian.

Breaking changes#

Support for Python 3.9 has been dropped (PR8937)

The minimum versions of some dependencies were changed

Package	Old	New
boto3	1.26	1.28
cartopy	0.21	0.22
dask-core	2023.4	2023.9
distributed	2023.4	2023.9
h5netcdf	1.1	1.2
iris	3.4	3.7
numba	0.56	0.57
numpy	1.23	1.24
pandas	2.0	2.1
scipy	1.10	1.11
typing_extensions	4.5	4.7
zarr	2.14	2.16

Bug fixes#

Fix bug with rechunking to a frequency when some periods contain no data (GH9360). By Deepak Cherian.
Fix bug causing DataTree.from_dict to be sensitive to insertion order (GH9276, PR9292). By Tom Nicholas.
Fix resampling error with monthly, quarterly, or yearly frequencies with cftime when the time bins straddle the date “0001-01-01”. For example, this can happen in certain circumstances when the time coordinate contains the date “0001-01-01”. (GH9108, PR9116) By Spencer Clark and Deepak Cherian.
Fix issue with passing parameters to ZarrStore.open_store when opening datatree in zarr format (GH9376, PR9377). By Alfonso Ladino
Fix deprecation warning that was raised when calling np.array on an xr.DataArray in NumPy 2.0 (GH9312, PR9393) By Andrew Scherer.
Fix support for using pandas.DateOffset, pandas.Timedelta, and datetime.timedelta objects as resample frequencies (GH9408, PR9413). By Oliver Higgs.

Internal Changes#

Re-enable testing pydap backend with numpy>=2 (PR9391). By Miguel Jimenez.

v2024.07.0 (Jul 30, 2024)#

This release extends the API for groupby operations with various grouper objects, and includes improvements to the documentation and numerous bugfixes.

Thanks to the 22 contributors to this release: Alfonso Ladino, ChrisCleaner, David Hoese, Deepak Cherian, Dieter Werthmüller, Illviljan, Jessica Scheick, Joel Jaeschke, Justus Magin, K. Arthur Endsley, Kai Mühlbauer, Mark Harfouche, Martin Raspaud, Mathijs Verhaegh, Maximilian Roos, Michael Niklas, Michał Górny, Moritz Schreiber, Pontus Lurcock, Spencer Clark, Stephan Hoyer and Tom Nicholas

New Features#

Use a fastpath in GroupBy when grouping by a monotonically increasing or monotonically decreasing variable (GH6220, PR7427). By Joel Jaeschke.
Introduce new groupers.UniqueGrouper, groupers.BinGrouper, and groupers.TimeResampler objects as a step towards supporting grouping by multiple variables. See the docs and the grouper design doc for more. (GH6610, PR8840). By Deepak Cherian.
Allow rechunking to a frequency using Dataset.chunk(time=TimeResampler("YE")) syntax. (GH7559, PR9109) Such rechunking allows many time domain analyses to be executed in an embarrassingly parallel fashion. By Deepak Cherian.
Allow per-variable specification of the mask_and_scale, decode_times, decode_timedelta, use_cftime and concat_characters params in open_dataset() (PR9218). By Mathijs Verhaegh.
Allow chunking for arrays with duplicated dimension names (GH8759, PR9099). By Martin Raspaud.
Extract the source url from fsspec objects (GH9142, PR8923). By Justus Magin.
Add DataArray.drop_attrs() & Dataset.drop_attrs() methods, to return an object without attrs. A deep parameter controls whether variables’ attrs are also dropped. By Maximilian Roos. (PR8288)
Added open_groups() for h5netcdf and netCDF4 backends (GH9137, PR9243). By Eni Awowale.
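A small sketch of the new drop_attrs() method, assuming xarray v2024.07.0 or later and that `deep` defaults to True. With deep=False only the Dataset's own attrs are removed; the default also clears each variable's attrs.

```python
import xarray as xr

ds = xr.Dataset(
    {"t": ("x", [1.0, 2.0], {"units": "K"})},
    attrs={"title": "demo"},
)

shallow = ds.drop_attrs(deep=False)  # only the Dataset-level attrs are dropped
deep = ds.drop_attrs()               # variables' attrs are dropped as well
```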

Breaking changes#

The base and loffset parameters to Dataset.resample() and DataArray.resample() are now removed. These parameters have been deprecated since v2023.03.0. Using the origin or offset parameters is recommended as a replacement for using the base parameter and using time offset arithmetic is recommended as a replacement for using the loffset parameter. (PR9233) By Deepak Cherian.
The squeeze kwarg to groupby is now ignored. This has been the source of some quite confusing behaviour and has been deprecated since v2024.01.0. groupby behavior is now always consistent with the existing .groupby(..., squeeze=False) behavior. No errors will be raised if squeeze=False. (PR9280) By Deepak Cherian.
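As a sketch of the recommended replacement for the removed loffset parameter, the example below resamples hourly data to daily means and then shifts the bin labels with time offset arithmetic. The 12-hour shift is an arbitrary illustrative choice.

```python
import numpy as np
import pandas as pd
import xarray as xr

times = pd.date_range("2000-01-01", periods=48, freq=pd.Timedelta(hours=1))
da = xr.DataArray(np.arange(48.0), dims="time", coords={"time": times})

daily = da.resample(time="D").mean()

# Instead of resample(..., loffset=...), shift the resulting labels explicitly.
daily["time"] = daily.get_index("time") + pd.Timedelta(hours=12)
```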

Bug fixes#

Fix scatter plot broadcasting unnecessarily. (GH9129, PR9206) By Jimmy Westling.
Don’t convert custom indexes to pandas indexes when computing a diff (PR9157) By Justus Magin.
Make testing.assert_allclose() work with numpy 2.0 (GH9165, PR9166). By Pontus Lurcock.
Allow diffing objects with array attributes on variables (GH9153, PR9169). By Justus Magin.
numpy>=2 compatibility in the netcdf4 backend (PR9136). By Justus Magin and Kai Mühlbauer.
Promote floating-point numeric datetimes before decoding (GH9179, PR9182). By Justus Magin.
Address regression introduced in PR9002 that prevented objects returned by DataArray.convert_calendar() from being indexed by a time index in certain circumstances (GH9138, PR9192). By Mark Harfouche and Spencer Clark.
Fix static typing of tolerance arguments by allowing str type (GH8892, PR9194). By Michael Niklas.
Dark themes are now properly detected for html[data-theme=dark]-tags (PR9200). By Dieter Werthmüller.
Reductions no longer fail for np.complex_ dtype arrays when numbagg is installed. (PR9210) By Maximilian Roos.

Documentation#

Adds intro to backend section of docs, including a flow-chart to navigate types of backends (PR9175). By Jessica Scheick.
Adds a flow-chart diagram to help users navigate help resources (D8990, PR9147). By Jessica Scheick.
Improvements to Zarr & chunking docs (PR9139, PR9140, PR9132) By Maximilian Roos.
Fix copybutton for multi line examples and double digit ipython cell numbers (PR9264). By Moritz Schreiber.

Internal Changes#

Enable typing checks of pandas (PR9213). By Michael Niklas.

v2024.06.0 (Jun 13, 2024)#

This release brings various performance optimizations and compatibility with the upcoming numpy 2.0 release.

Thanks to the 22 contributors to this release: Alfonso Ladino, David Hoese, Deepak Cherian, Eni Awowale, Ilan Gold, Jessica Scheick, Joe Hamman, Justus Magin, Kai Mühlbauer, Mark Harfouche, Mathias Hauser, Matt Savoie, Maximilian Roos, Mike Thramann, Nicolas Karasiak, Owen Littlejohns, Paul Ockenfuß, Philippe THOMY, Scott Henderson, Spencer Clark, Stephan Hoyer and Tom Nicholas

Performance#

Small optimization to the netCDF4 and h5netcdf backends (GH9058, PR9067). By Deepak Cherian.
Small optimizations to help improve indexing speed of datasets (PR9002). By Mark Harfouche.
Performance improvement in open_datatree method for Zarr, netCDF4 and h5netcdf backends (GH8994, PR9014). By Alfonso Ladino.

Bug fixes#

Preserve conversion of timezone-aware pandas Datetime arrays to numpy object arrays (GH9026, PR9042). By Ilan Gold.
DataArrayResample.interpolate() and DatasetResample.interpolate() methods now support arbitrary kwargs such as order for polynomial interpolation (GH8762). By Nicolas Karasiak.

Documentation#

Add link to CF Conventions on packed data and sentence on type determination in the I/O user guide (GH9041, PR9045). By Kai Mühlbauer.

Internal Changes#

Migrates remainder of io.py to xarray/core/datatree_io.py and TreeAttrAccessMixin into xarray/core/common.py (PR9011). By Owen Littlejohns and Tom Nicholas.
Compatibility with numpy 2 (GH8844, PR8854, PR8946). By Justus Magin and Stephan Hoyer.

v2024.05.0 (May 12, 2024)#

This release brings support for pandas ExtensionArray objects, optimizations when reading Zarr, the ability to concatenate datasets without pandas indexes, more compatibility fixes for the upcoming numpy 2.0, and the migration of most of the xarray-datatree project code into xarray main!

Thanks to the 18 contributors to this release: Aimilios Tsouvelekakis, Andrey Akinshin, Deepak Cherian, Eni Awowale, Ilan Gold, Illviljan, Justus Magin, Mark Harfouche, Matt Savoie, Maximilian Roos, Noah C. Benson, Pascal Bourgault, Ray Bell, Spencer Clark, Tom Nicholas, ignamv, owenlittlejohns, and saschahofmann.

New Features#

New “random” method for converting to and from 360_day calendars (PR8603). By Pascal Bourgault.
Xarray now makes a best attempt not to coerce pandas.api.extensions.ExtensionArray to a numpy array by supporting 1D ExtensionArray objects internally where possible. Thus, Dataset objects initialized with a pd.Categorical, for example, will retain the object. However, operations that the ExtensionArray does not support, such as broadcasting, cannot be performed on it. (GH5287, GH8463, PR8723) By Ilan Gold.
testing.assert_allclose() / testing.assert_equal() now accept a new argument check_dims="transpose", controlling whether a transposed array is considered equal. (GH5733, PR8991) By Ignacio Martinez Vazquez.
Added the option to avoid automatically creating 1D pandas indexes in Dataset.expand_dims(), by passing the new kwarg create_index_for_new_dim=False. (PR8960) By Tom Nicholas.
Avoid automatically re-creating 1D pandas indexes in concat(). Also added option to avoid creating 1D indexes for new dimension coordinates by passing the new kwarg create_index_for_new_dim=False. (GH8871, PR8872) By Tom Nicholas.
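A minimal sketch of the create_index_for_new_dim kwarg, assuming xarray v2024.05.0 or later. With the default, the new dimension coordinate gets a pandas index; with create_index_for_new_dim=False the coordinate is still added but left unindexed.

```python
import xarray as xr

ds = xr.Dataset({"a": ("y", [1, 2])})

indexed = ds.expand_dims(x=[10])  # default: "x" gets a pandas index
unindexed = ds.expand_dims(x=[10], create_index_for_new_dim=False)
```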

Breaking changes#

The PyNIO backend has been deleted (GH4491, PR7301). By Deepak Cherian.

The minimum versions of some dependencies were changed; in particular, our minimum supported pandas version is now pandas 2.

Package	Old	New
dask-core	2022.12	2023.4
distributed	2022.12	2023.4
h5py	3.7	3.8
matplotlib-base	3.6	3.7
packaging	22.0	23.1
pandas	1.5	2.0
pydap	3.3	3.4
sparse	0.13	0.14
typing_extensions	4.4	4.5
zarr	2.13	2.14

Bug fixes#

Following an upstream bug fix to pandas.date_range(), date ranges produced by xarray.cftime_range() with negative frequencies will now fall fully within the bounds of the provided start and end dates (PR8999). By Spencer Clark.

Internal Changes#

Enforces failures on CI when tests raise warnings from within xarray (PR8974) By Maximilian Roos
Migrates formatting_html functionality for DataTree into xarray/core (PR8930) By Eni Awowale, Julia Signell and Tom Nicholas.
Migrates datatree_mapping functionality into xarray/core (PR8948). By Matt Savoie, Owen Littlejohns and Tom Nicholas.
Migrates extensions, formatting and datatree_render functionality for DataTree into xarray/core. Also migrates testing functionality into xarray/testing/assertions for DataTree. (PR8967) By Owen Littlejohns and Tom Nicholas.
Migrates ops.py functionality into xarray/core/datatree_ops.py (PR8976) By Matt Savoie and Tom Nicholas.
Migrates iterator functionality into xarray/core (PR8879) By Owen Littlejohns, Matt Savoie and Tom Nicholas.
transpose, set_dims, stack & unstack now use a dim kwarg rather than dims or dimensions. This is the final change to make xarray methods consistent with their use of dim. Using the existing kwarg will raise a warning. By Maximilian Roos

v2024.03.0 (Mar 29, 2024)#

This release brings performance improvements for grouped and resampled quantile calculations, CF decoding improvements, minor optimizations to distributed Zarr writes, and compatibility fixes for Numpy 2.0 and Pandas 3.0.

Thanks to the 18 contributors to this release: Anderson Banihirwe, Christoph Hasse, Deepak Cherian, Etienne Schalk, Justus Magin, Kai Mühlbauer, Kevin Schwarzwald, Mark Harfouche, Martin, Matt Savoie, Maximilian Roos, Ray Bell, Roberto Chang, Spencer Clark, Tom Nicholas, crusaderky, owenlittlejohns, saschahofmann

New Features#

Partial writes to existing chunks with region or append_dim will now raise an error (unless safe_chunks=False); previously an error would only be raised on new variables. (PR8459, GH8371, GH8882) By Maximilian Roos.
Grouped and resampling quantile calculations now use the vectorized algorithm in flox>=0.9.4 if present. By Deepak Cherian.
Do not broadcast in arithmetic operations when global option arithmetic_broadcast=False (GH6806, PR8784). By Etienne Schalk and Deepak Cherian.
Add the .oindex property to Explicitly Indexed Arrays for orthogonal indexing functionality. (GH8238, PR8750) By Anderson Banihirwe.
Add the .vindex property to Explicitly Indexed Arrays for vectorized indexing functionality. (GH8238, PR8780) By Anderson Banihirwe.
Expand use of .oindex and .vindex properties. (PR8790) By Anderson Banihirwe and Deepak Cherian.
Allow creating xr.Coordinates objects with no indexes (PR8711) By Benoit Bovy and Tom Nicholas.
Enable plotting of datetime.dates. (GH8866, PR8873) By Sascha Hofmann.

Breaking changes#

Don’t allow overwriting index variables with to_zarr region writes. (GH8589, PR8876). By Deepak Cherian.

Bug fixes#

The default freq parameter in xr.date_range() and xr.cftime_range() is set to 'D' only if periods, start, or end are None (GH8770, PR8774). By Roberto Chang.
Ensure that non-nanosecond precision numpy.datetime64 and numpy.timedelta64 values are cast to nanosecond precision values when used in DataArray.expand_dims() and Dataset.expand_dims() (PR8781). By Spencer Clark.
CF conform handling of _FillValue/missing_value and dtype in CFMaskCoder/CFScaleOffsetCoder (GH2304, GH5597, GH7691, PR8713, see also discussion in PR7654). By Kai Mühlbauer.
Do not cast _FillValue/missing_value in CFMaskCoder if _Unsigned is provided (GH8844, PR8852).
Adapt handling of copy keyword argument for numpy >= 2.0dev (GH8844, PR8851, PR8865). By Kai Mühlbauer.
Import trapz/trapezoid depending on numpy version (GH8844, PR8865). By Kai Mühlbauer.
Warn and return bytes undecoded in case of UnicodeDecodeError in h5netcdf-backend (GH5563, PR8874). By Kai Mühlbauer.
Fix bug incorrectly disallowing creation of a dataset with a multidimensional coordinate variable with the same name as one of its dims. (GH8884, PR8886) By Tom Nicholas.

Internal Changes#

Migrates treenode functionality into xarray/core (PR8757) By Matt Savoie and Tom Nicholas.
Migrates datatree functionality into xarray/core. (PR8789) By Owen Littlejohns, Matt Savoie and Tom Nicholas.

v2024.02.0 (Feb 19, 2024)#

This release brings size information to the text repr, changes to the accepted frequency strings, and various bug fixes.

Thanks to our 12 contributors:

Anderson Banihirwe, Deepak Cherian, Eivind Jahren, Etienne Schalk, Justus Magin, Marco Wolsza, Mathias Hauser, Matt Savoie, Maximilian Roos, Rambaud Pierrick, Tom Nicholas

New Features#

Added a simple nbytes representation in DataArrays and Dataset repr. (GH8690, PR8702). By Etienne Schalk.
Allow negative frequency strings (e.g. "-1YE"). These strings are for example used in date_range(), and cftime_range() (PR8651). By Mathias Hauser.
Add NamedArray.expand_dims(), NamedArray.permute_dims() and NamedArray.broadcast_to() (PR8380) By Anderson Banihirwe.
Xarray now defers to flox’s heuristics to set the default method for groupby problems. This only applies to flox>=0.9. By Deepak Cherian.
All quantile methods (e.g. DataArray.quantile()) now use numbagg for the calculation of nanquantiles (i.e., skipna=True) if it is installed. This is currently limited to the linear interpolation method (method=’linear’). (GH7377, PR8684) By Marco Wolsza.

Breaking changes#

infer_freq() always returns the frequency strings as defined in pandas 2.2 (GH8612, PR8627). By Mathias Hauser.

Deprecations#

The dt.weekday_name parameter wasn’t functional on modern pandas versions and has been removed. (GH8610, PR8664) By Sam Coleman.

Bug fixes#

Fixed a regression that prevented multi-index level coordinates being serialized after resetting or dropping the multi-index (GH8628, PR8672). By Benoit Bovy.
Fix bug with broadcasting when wrapping array API-compliant classes. (GH8665, PR8669) By Tom Nicholas.
Ensure DataArray.unstack() works when wrapping array API-compliant classes. (GH8666, PR8668) By Tom Nicholas.
Fix negative slicing of Zarr arrays without dask installed. (GH8252) By Deepak Cherian.
Preserve chunks when writing time-like variables to zarr by enabling their lazy encoding (GH7132, GH8230, GH8432, PR8253, PR8575; see also discussion in PR8253). By Spencer Clark and Mattia Almansi.
Raise an informative error if dtype encoding of time-like variables would lead to integer overflow or unsafe conversion from floating point to integer values (GH8542, PR8575). By Spencer Clark.
Raise an error when unstacking a MultiIndex that has duplicates as this would lead to silent data loss (GH7104, PR8737). By Mathias Hauser.

Documentation#

Fix variables arg typo in Dataset.sortby() docstring (GH8663, PR8670) By Tom Vo.
Fixed documentation where the use of the deprecated pandas frequency string prevented the documentation from being built. (PR8638) By Sam Coleman.

Internal Changes#

DataArray.dt now raises an AttributeError rather than a TypeError when the data isn’t datetime-like. (GH8718, PR8724) By Maximilian Roos.
Move parallelcompat and chunk managers modules from xarray/core to xarray/namedarray. (PR8319) By Tom Nicholas and Anderson Banihirwe.
Imports datatree repository and history into internal location. (PR8688) By Matt Savoie, Justus Magin and Tom Nicholas.
Adds open_datatree() into xarray/backends (PR8697) By Matt Savoie and Tom Nicholas.
Refactor xarray.core.indexing.DaskIndexingAdapter.__getitem__() to remove an unnecessary rewrite of the indexer key (GH8377, PR8758) By Anderson Banihirwe.

v2024.01.1 (23 Jan, 2024)#

This release is to fix a bug with the rendering of the documentation, but it also includes changes to the handling of pandas frequency strings.

Breaking changes#

Following pandas, infer_freq() will return "YE", instead of "Y" (formerly "A"). This is to be consistent with the deprecation of the latter frequency string in pandas 2.2. This is a follow up to PR8415 (GH8612, PR8642). By Mathias Hauser.

Deprecations#

Following pandas, the frequency string "Y" (formerly "A") is deprecated in favor of "YE". These strings are used, for example, in date_range(), cftime_range(), DataArray.resample(), and Dataset.resample() among others (GH8612, PR8629). By Mathias Hauser.

Documentation#

Pin sphinx-book-theme to 1.0.1 to fix a rendering issue with the sidebar in the docs. (GH8619, PR8632) By Tom Nicholas.

v2024.01.0 (17 Jan, 2024)#

This release brings support for weights in correlation and covariance functions, a new DataArray.cumulative aggregation, improvements to xr.map_blocks, an update to our minimum dependencies, and various bugfixes.

Thanks to our 17 contributors to this release:

Abel Aoun, Deepak Cherian, Illviljan, Johan Mathe, Justus Magin, Kai Mühlbauer, Llorenç Lledó, Mark Harfouche, Markel, Mathias Hauser, Maximilian Roos, Michael Niklas, Niclas Rieger, Sébastien Celles, Tom Nicholas, Trinh Quoc Anh, and crusaderky.

New Features#

xr.cov() and xr.corr() now support using weights (GH8527, PR7392). By Llorenç Lledó.
Accept the compression arguments new in netCDF 1.6.0 in the netCDF4 backend. See netCDF4 documentation for details. Note that some new compression filters need plugins to be installed, which may not be available in all netCDF distributions. By Markel García-Díez. (GH6929, PR7551)
Add DataArray.cumulative() & Dataset.cumulative() to compute cumulative aggregations, such as sum, along a dimension — for example da.cumulative('time').sum(). This is similar to pandas’ .expanding, and mostly equivalent to .cumsum methods, or to DataArray.rolling() with a window length equal to the dimension size. By Maximilian Roos. (PR8512)
Decode/Encode netCDF4 enums and store the enum definition in dataarrays’ dtype metadata. If multiple variables share the same enum in netCDF4, each dataarray will have its own enum definition in their respective dtype metadata. By Abel Aoun. (GH8144, PR8147)
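A short sketch of two of the additions above, assuming xarray v2024.01.0 or later: weighted correlation via the weights argument of xr.corr(), and an expanding sum via cumulative(). Uniform weights are used so the weighted result can be checked against the unweighted one.

```python
import numpy as np
import xarray as xr

a = xr.DataArray([1.0, 2.0, 3.0, 4.0], dims="time")
b = xr.DataArray([2.0, 4.0, 5.0, 9.0], dims="time")
w = xr.ones_like(a)

# With uniform weights, the weighted correlation matches the unweighted one.
r_weighted = xr.corr(a, b, dim="time", weights=w)
r_plain = xr.corr(a, b, dim="time")

# cumulative(...).sum() is a running (expanding) sum along "time".
running = a.cumulative("time").sum()
```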

Breaking changes#

The minimum versions of some dependencies were changed (PR8586):

Package	Old	New
cartopy	0.20	0.21
dask-core	2022.7	2022.12
distributed	2022.7	2022.12
flox	0.5	0.7
iris	3.2	3.4
matplotlib-base	3.5	3.6
numpy	1.22	1.23
numba	0.55	0.56
packaging	21.3	22.0
seaborn	0.11	0.12
scipy	1.8	1.10
typing_extensions	4.3	4.4
zarr	2.12	2.13

Deprecations#

The squeeze kwarg to GroupBy is now deprecated. (GH2157, PR8507) By Deepak Cherian.

Bug fixes#

Support non-string hashable dimensions in xarray.DataArray (GH8546, PR8559). By Michael Niklas.
Reverse index output of bottleneck’s rolling move_argmax/move_argmin functions (GH8541, PR8552). By Kai Mühlbauer.
Vendor SerializableLock from dask and use as default lock for netcdf4 backends (GH8442, PR8571). By Kai Mühlbauer.
Add tests and fixes for empty CFTimeIndex, including broken html repr (GH7298, PR8600). By Mathias Hauser.

Internal Changes#

The implementation of map_blocks() has changed to minimize graph size and duplication of data. This should be a strict improvement even though the graphs are not always embarrassingly parallel any more. Please open an issue if you spot a regression. (PR8412, GH8409). By Deepak Cherian.
Remove null values before plotting. (PR8535). By Jimmy Westling.
Redirect cumulative reduction functions internally through the ChunkManagerEntryPoint, potentially allowing ffill() and bfill() to use non-dask chunked array types. (PR8019) By Tom Nicholas.

v2023.12.0 (2023 Dec 08)#

This release brings new hypothesis strategies for testing, significantly faster rolling aggregations as well as ffill and bfill with numbagg, a new Dataset.eval() method, and improvements to reading and writing Zarr arrays (including a new "a-" mode).

Thanks to our 16 contributors:

Anderson Banihirwe, Ben Mares, Carl Andersson, Deepak Cherian, Doug Latornell, Gregorio L. Trevisan, Illviljan, Jens Hedegaard Nielsen, Justus Magin, Mathias Hauser, Max Jones, Maximilian Roos, Michael Niklas, Patrick Hoefler, Ryan Abernathey, Tom Nicholas

New Features#

Added hypothesis strategies for generating xarray.Variable objects containing arbitrary data, useful for parametrizing downstream tests. Accessible under testing.strategies, and documented in a new page on testing in the User Guide. (GH6911, PR8404) By Tom Nicholas.
rolling() uses numbagg for most of its computations by default. Numbagg is up to 5x faster than bottleneck where parallelization is possible. Where parallelization isn’t possible — for example a 1D array — it’s about the same speed as bottleneck, and 2-5x faster than pandas’ default functions. (PR8493). numbagg is an optional dependency, so requires installing separately.
Use a concise format when plotting datetime arrays. (PR8449). By Jimmy Westling.
Avoid overwriting unchanged existing coordinate variables when appending with Dataset.to_zarr() by setting mode='a-'. By Ryan Abernathey and Deepak Cherian.
rank() now operates on dask-backed arrays, assuming the core dim has exactly one chunk. (PR8475). By Maximilian Roos.
Add a Dataset.eval() method, similar to the pandas method of the same name. (PR7163). This is currently marked as experimental and doesn’t yet support the numexpr engine.
Dataset.drop_vars() & DataArray.drop_vars() allow passing a callable, similar to Dataset.where() & Dataset.sortby() & others. (PR8511). By Maximilian Roos.
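A minimal sketch of Dataset.eval() and of passing a callable to drop_vars(), assuming xarray v2023.12.0 or later. The variable names, including the hypothetical `tmp_` prefix convention, are illustrative only.

```python
import xarray as xr

ds = xr.Dataset(
    {
        "a": ("x", [1, 2, 3]),
        "b": ("x", [10, 20, 30]),
        "tmp_scratch": ("x", [0, 0, 0]),
    }
)

# Evaluate an expression over the dataset's variables; returns a DataArray.
total = ds.eval("a + b")

# The callable receives the Dataset and returns the names to drop.
cleaned = ds.drop_vars(lambda d: [name for name in d.data_vars if name.startswith("tmp_")])
```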

Breaking changes#

Explicitly warn when creating xarray objects with repeated dimension names. Such objects will also now raise when DataArray.get_axis_num() is called, which means many functions will raise. This latter change is technically a breaking change, but whilst allowed, this behaviour was never actually supported! (GH3731, PR8491) By Tom Nicholas.

Deprecations#

As part of an effort to standardize the API, we’re renaming the dims keyword arg to dim for the minority of functions which currently use dims. This started with xarray.dot() & DataArray.dot() and we’ll gradually roll this out across all functions. The warnings are currently PendingDeprecationWarning, which are silenced by default. We’ll convert these to DeprecationWarning in a future release. By Maximilian Roos.
Raise a FutureWarning that the type of Dataset.dims() will be changed from a mapping of dimension names to lengths to a set of dimension names. This is to increase consistency with DataArray.dims(). To access a mapping of dimension names to lengths please use Dataset.sizes(). The same change also applies to DatasetGroupBy.dims. (GH8496, PR8500) By Tom Nicholas.
Dataset.drop() & DataArray.drop() are now deprecated, after several years of pending deprecation. DataArray.drop_sel() & DataArray.drop_vars() replace them for labels & variables respectively. (PR8497) By Maximilian Roos.
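A brief sketch of the replacements recommended by these deprecations: Dataset.sizes for the mapping of dimension lengths, drop_sel() for dropping labels, and drop_vars() for dropping variables. The dataset itself is a made-up example.

```python
import xarray as xr

ds = xr.Dataset(
    {"t": (("x", "y"), [[1, 2, 3], [4, 5, 6]])},
    coords={"x": [10, 20]},
)

lengths = dict(ds.sizes)    # mapping of dimension names to lengths
subset = ds.drop_sel(x=10)  # label-based replacement for drop()
no_t = ds.drop_vars("t")    # variable-based replacement for drop()
```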

Bug fixes#

Fix dtype inference for pd.CategoricalIndex when categories are backed by a pd.ExtensionDtype (PR8481)
Fix writing a variable that requires transposing when not writing to a region (PR8484) By Maximilian Roos.
Static typing of p0 and bounds arguments of xarray.DataArray.curvefit() and xarray.Dataset.curvefit() was changed to Mapping (PR8502). By Michael Niklas.
Fix typing of xarray.DataArray.to_netcdf() and xarray.Dataset.to_netcdf() when compute is evaluated to bool instead of a Literal (PR8268). By Jens Hedegaard Nielsen.

Documentation#

Added illustration of updating the time coordinate values of a resampled dataset using time offset arithmetic. This is the recommended technique to replace the use of the deprecated loffset parameter in resample (PR8479). By Doug Latornell.
Improved error message when attempting to get a variable which doesn’t exist from a Dataset. (PR8474) By Maximilian Roos.
Fix default value of combine_attrs in xarray.combine_by_coords() (PR8471) By Gregorio L. Trevisan.

Internal Changes#

DataArray.bfill() & DataArray.ffill() now use numbagg by default, which is up to 5x faster where parallelization is possible. (PR8339) By Maximilian Roos.
Update mypy version to 1.7 (GH8448, PR8501). By Michael Niklas.

v2023.11.0 (Nov 16, 2023)#

Tip

This is our 10th year anniversary release! Thank you for your love and support.

This release brings the ability to use opt_einsum for xarray.dot() by default, support for auto-detecting region when writing partial datasets to Zarr, and the use of h5py drivers with h5netcdf.

Thanks to the 19 contributors to this release: Aman Bagrecha, Anderson Banihirwe, Ben Mares, Deepak Cherian, Dimitri Papadopoulos Orfanos, Ezequiel Cimadevilla Alvarez, Illviljan, Justus Magin, Katelyn FitzGerald, Kai Muehlbauer, Martin Durant, Maximilian Roos, Metamess, Sam Levang, Spencer Clark, Tom Nicholas, mgunyho, templiert

New Features#

Use opt_einsum for xarray.dot() by default if installed. By Deepak Cherian. (GH7764, PR8373).
Add DataArray.dt.total_seconds() method to match the Pandas API. (PR8435). By Ben Mares.
Allow passing region="auto" in Dataset.to_zarr() to automatically infer the region to write in the original store. Also implement automatic transpose when dimension order does not match the original store. (GH7702, GH8421, PR8434). By Sam Levang.
Allow the usage of h5py drivers (e.g. ros3) via h5netcdf (PR8360). By Ezequiel Cimadevilla.
Enable VLEN string fill_values, preserve VLEN string dtypes (GH1647, GH7652, GH7868, PR7869). By Kai Mühlbauer.
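A minimal sketch of the new DataArray.dt.total_seconds() method on timedelta data, assuming xarray v2023.11.0 or later.

```python
import pandas as pd
import xarray as xr

durations = xr.DataArray(pd.to_timedelta(["1min", "1h"]), dims="event")

# Mirrors pandas' TimedeltaIndex.total_seconds(): float seconds per element.
seconds = durations.dt.total_seconds()
```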

Breaking changes#

Drop support for cdms2. Please use xcdat instead (PR8441). By Justus Magin.
Following pandas, infer_freq() will return "Y", "YS", "QE", "ME", "h", "min", "s", "ms", "us", or "ns" instead of "A", "AS", "Q", "M", "H", "T", "S", "L", "U", or "N". This is to be consistent with the deprecation of the latter frequency strings (GH8394, PR8415). By Spencer Clark.
Bump minimum tested pint version to >=0.22. By Deepak Cherian.
Minimum supported versions for the following packages have changed: h5py >=3.7, h5netcdf>=1.1. By Kai Mühlbauer.

Deprecations#

The PseudoNetCDF backend has been removed. By Deepak Cherian.
Supplying dimension-ordered sequences to DataArray.chunk() & Dataset.chunk() is deprecated in favor of supplying a dictionary of dimensions, or a single int or "auto" argument covering all dimensions. Xarray favors using dimensions names rather than positions, and this was one place in the API where dimension positions were used. (PR8341) By Maximilian Roos.
Following pandas, the frequency strings "A", "AS", "Q", "M", "H", "T", "S", "L", "U", and "N" are deprecated in favor of "Y", "YS", "QE", "ME", "h", "min", "s", "ms", "us", and "ns", respectively. These strings are used, for example, in date_range(), cftime_range(), DataArray.resample(), and Dataset.resample() among others (GH8394, PR8415). By Spencer Clark.
Rename Dataset.to_array() to Dataset.to_dataarray() for consistency with DataArray.to_dataset() & open_dataarray() functions. This is a “soft” deprecation — the existing methods work and don’t raise any warnings, given the relatively small benefits of the change. By Maximilian Roos.
Finally remove keep_attrs kwarg from DataArray.resample() and Dataset.resample(). These were deprecated a long time ago. By Deepak Cherian.
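A minimal sketch of two of the renames above, assuming a recent xarray and pandas: Dataset.to_dataarray() (formerly to_array()) and the new-style "ME" (month end) frequency string that replaces "M". The dataset contents are toy values.

```python
import xarray as xr

ds = xr.Dataset({"a": ("x", [1, 2]), "b": ("x", [3, 4])})

# New name (formerly Dataset.to_array): stack data variables along "variable"
stacked = ds.to_dataarray(dim="variable")
roundtrip = stacked.to_dataset(dim="variable")

# New-style frequency string: "ME" (month end) instead of the deprecated "M"
month_ends = xr.date_range("2000-01-01", periods=3, freq="ME")
```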

Bug fixes#

Port a bug fix from pandas to xarray’s implementation of resample for data indexed by a CFTimeIndex: resample bin edges are no longer adjusted when the resampling frequency has units of days and is greater than one day (e.g. "2D", "3D", etc.) and the closed argument is set to "right" (PR8393). By Spencer Clark.
Fix to once again support date offset strings as input to the loffset parameter of resample and test this functionality (PR8422, GH8399). By Katelyn FitzGerald.
Fix a bug where DataArray.to_dataset() silently drops a variable if a coordinate with the same name already exists (PR8433, GH7823). By András Gunyhó.
Fix for DataArray.to_zarr() & Dataset.to_zarr() to close the created zarr store when passing a path with .zip extension (PR8425). By Carl Andersson.

Documentation#

Small updates to documentation on distributed writes to Zarr: see Modifying existing Zarr stores. By Deepak Cherian.

v2023.10.1 (19 Oct, 2023)#

This release updates our minimum numpy version in pyproject.toml to 1.22, consistent with our documentation below.

v2023.10.0 (19 Oct, 2023)#

This release brings performance enhancements to reading Zarr datasets, the ability to use numbagg for reductions, an expansion in API for rolling_exp, fixes two regressions with datetime decoding, and many other bugfixes and improvements. Groupby reductions will also use numbagg if flox>=0.8.1 and numbagg are both installed.

Thanks to our 13 contributors: Anderson Banihirwe, Bart Schilperoort, Deepak Cherian, Illviljan, Kai Mühlbauer, Mathias Hauser, Maximilian Roos, Michael Niklas, Pieter Eendebak, Simon Høxbro Hansen, Spencer Clark, Tom White, olimcc

New Features#

Support high-performance reductions with numbagg. This is enabled by default if numbagg is installed. By Deepak Cherian. (PR8316)
Add corr, cov, std & var to .rolling_exp. By Maximilian Roos. (PR8307)
DataArray.where() & Dataset.where() accept a callable for the other parameter, passing the object as the only argument. Previously, this was only valid for the cond parameter. (GH8255) By Maximilian Roos.
.rolling_exp functions can now take a min_weight parameter, to only output values when there are sufficient recent non-nan values. numbagg>=0.3.1 is required. (PR8285) By Maximilian Roos.
DataArray.sortby() & Dataset.sortby() accept a callable for the variables parameter, passing the object as the only argument. By Maximilian Roos.
.rolling_exp functions can now operate on dask-backed arrays, assuming the core dim has exactly one chunk. (PR8284). By Maximilian Roos.
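A minimal sketch of the new callable arguments, using toy data: DataArray.where() now accepts a callable for other, and Dataset.sortby() accepts a callable for variables. In both cases the callable receives the object itself.

```python
import xarray as xr

da = xr.DataArray([-2, -1, 0, 1, 2], dims="x")

# Both cond and other may be callables; each receives `da` itself
clipped = da.where(lambda arr: arr > 0, lambda arr: arr * 0)

ds = xr.Dataset({"temp": ("x", [3.0, 1.0, 2.0])}, coords={"x": [10, 20, 30]})

# sortby also accepts a callable that is passed the Dataset
by_temp = ds.sortby(lambda d: d.temp)
```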

Breaking changes#

Made more arguments keyword-only (e.g. keep_attrs, skipna) for many xarray.DataArray and xarray.Dataset methods (PR6403). By Mathias Hauser.
Dataset.to_zarr() & DataArray.to_zarr() require keyword arguments after the initial 7 positional arguments. By Maximilian Roos.

Deprecations#

Rename Dataset.reset_encoding() & DataArray.reset_encoding() to Dataset.drop_encoding() & DataArray.drop_encoding() for consistency with other drop & reset methods — drop generally removes something, while reset generally resets to some default or standard value. (PR8287, GH8259) By Maximilian Roos.
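A short sketch of the renamed method, assuming toy encoding values. Dataset.drop_encoding() returns a new object without encoding on the dataset or its variables, leaving the original untouched.

```python
import xarray as xr

# A variable carrying on-disk encoding hints (toy values)
var = xr.Variable("x", [1.0, 2.0], encoding={"dtype": "int16", "scale_factor": 0.1})
ds = xr.Dataset({"a": var})

clean = ds.drop_encoding()  # formerly ds.reset_encoding()
```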

Bug fixes#

DataArray.rename() & Dataset.rename() no longer emit a warning when the operation is a no-op. (GH8266) By Simon Hansen.
Fixed a regression introduced in the previous release checking time-like units when encoding/decoding masked data (GH8269, PR8277). By Kai Mühlbauer.
Fix datetime encoding precision loss regression introduced in the previous release for datetimes encoded with units requiring floating point values, and a reference date not equal to the first value of the datetime array (GH8271, PR8272). By Spencer Clark.
Fix excess metadata requests when using a Zarr store. Previously, metadata was re-read every time data was retrieved from the array; now it is retrieved only once, when the array is initialized. (GH8290, PR8297). By Oliver McCormack.
Fix to_zarr ending in a ReadOnlyError when consolidated metadata was used and the write_empty_chunks argument was provided. (GH8323, PR8326) By Matthijs Amesz.

Documentation#

Added page on the interoperability of xarray objects. (PR7992) By Tom Nicholas.
Added xarray-regrid to the list of xarray related projects (PR8272). By Bart Schilperoort.

Internal Changes#

More improvements to support the Python array API standard by using duck array ops in more places in the codebase. (PR8267) By Tom White.

v2023.09.0 (Sep 26, 2023)#

This release continues work on the new xarray.Coordinates object, allows providing preferred_chunks when reading from netcdf files, enables xarray.apply_ufunc() to handle missing core dimensions, and fixes several bugs.

Thanks to the 24 contributors to this release: Alexander Fischer, Amrest Chinkamol, Benoit Bovy, Darsh Ranjan, Deepak Cherian, Gianfranco Costamagna, Gregorio L. Trevisan, Illviljan, Joe Hamman, JR, Justus Magin, Kai Mühlbauer, Kian-Meng Ang, Kyle Sunden, Martin Raspaud, Mathias Hauser, Mattia Almansi, Maximilian Roos, András Gunyhó, Michael Niklas, Richard Kleijn, Riulinchen, Tom Nicholas and Wiktor Kraśnicki.

We welcome the following new contributors to Xarray!: Alexander Fischer, Amrest Chinkamol, Darsh Ranjan, Gianfranco Costamagna, Gregorio L. Trevisan, Kian-Meng Ang, Riulinchen and Wiktor Kraśnicki.

New Features#

Added the Coordinates.assign() method, which can be used to combine different collections of coordinates before assigning them to a Dataset or DataArray at once (PR8102). By Benoît Bovy.
Provide preferred_chunks for data read from netcdf files (GH1440, PR7948). By Martin Raspaud.
Added on_missing_core_dims to apply_ufunc() to allow for copying or dropping a Dataset’s variables with missing core dimensions (PR8138). By Maximilian Roos.

Breaking changes#

The Coordinates constructor now creates a (pandas) index by default for each dimension coordinate. To keep the previous behavior (no index created), pass an empty dictionary to indexes. The constructor now also extracts and adds the indexes from another Coordinates object passed via coords (PR8107). By Benoît Bovy.
The xlim and ylim arguments of plotting functions are now statically typed as tuple[float, float] to align with matplotlib requirements (GH7802, PR8030). By Michael Niklas.
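A short sketch of the new Coordinates default, using toy values: a dimension coordinate gets an index unless an empty dictionary is passed to indexes, and a Dataset built from such a Coordinates object keeps it index-free.

```python
import xarray as xr

# Dimension coordinates get a (pandas) index by default
indexed = xr.Coordinates({"x": ("x", [10, 20, 30])})

# Pass an empty dict to `indexes` to keep the previous, index-free behavior
unindexed = xr.Coordinates({"x": ("x", [10, 20, 30])}, indexes={})

ds = xr.Dataset({"v": ("x", [1.0, 2.0, 3.0])}, coords=unindexed)
```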

Deprecations#

Deprecate passing a pandas.MultiIndex object directly to the Dataset and DataArray constructors as well as to Dataset.assign() and Dataset.assign_coords(). A new Xarray Coordinates object has to be created first using Coordinates.from_pandas_multiindex() (PR8094). By Benoît Bovy.
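A minimal sketch of the recommended replacement, assuming a toy two-level MultiIndex: wrap it with Coordinates.from_pandas_multiindex() first, then pass the resulting Coordinates object to the constructor.

```python
import pandas as pd
import xarray as xr

midx = pd.MultiIndex.from_product([["a", "b"], [1, 2]], names=("letter", "number"))

# Build a Coordinates object first instead of passing `midx` directly
coords = xr.Coordinates.from_pandas_multiindex(midx, "x")
ds = xr.Dataset({"v": ("x", [0.0, 1.0, 2.0, 3.0])}, coords=coords)
```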

Bug fixes#

Improved static typing of reduction methods (PR6746). By Richard Kleijn.
Fix bug where empty attrs would generate inconsistent tokens (GH6970, PR8101). By Mattia Almansi.
Improved handling of multi-coordinate indexes when updating coordinates, including bug fixes (and improved warnings for deprecated features) for pandas multi-indexes (PR8094). By Benoît Bovy.
Fixed a bug in merge() with compat='minimal' where the coordinate names were not updated properly internally (GH7405, GH7588, PR8104). By Benoît Bovy.
Fix bug where DataArray instances on the right-hand side of DataArray.__setitem__() lose dimension names (GH7030, PR8067). By Darsh Ranjan.
Return float64 in presence of NaT in DatetimeAccessor and special case NaT handling in isocalendar() (GH7928, PR8084). By Kai Mühlbauer.
Fix construct() with stride on Datasets without indexes. (GH7021, PR7578). By Amrest Chinkamol and Michael Niklas.
Calling plot with kwargs col, row or hue no longer squeezes dimensions passed via these arguments (GH7552, PR8174). By Wiktor Kraśnicki.
Fixed a bug where casting from float to int64 (undefined for NaN) led to varying issues (GH7817, GH7942, GH7790, GH6191, GH7096, GH1064, PR7827). By Kai Mühlbauer.
Fixed a bug where inaccurate coordinates silently failed to decode a variable (GH1809, PR8195). By Kai Mühlbauer.
.rolling_exp functions no longer mistakenly lose non-dimensioned coords (GH6528, PR8114). By Maximilian Roos.
In the event that user-provided datetime64/timedelta64 units and integer dtype encoding parameters conflict with each other, override the units to preserve an integer dtype for most faithful serialization to disk (GH1064, PR8201). By Kai Mühlbauer.
Static typing of dunder ops methods (like DataArray.__eq__()) has been fixed. Remaining issues are upstream problems (GH7780, PR8204). By Michael Niklas.
Fix type annotation for center argument of plotting methods (like xarray.plot.dataarray_plot.pcolormesh()) (PR8261). By Pieter Eendebak.

Documentation#

Make documentation of DataArray.where() clearer (GH7767, PR7955). By Riulinchen.

Internal Changes#

Many error messages related to invalid dimensions or coordinates now always show the list of valid dims/coords (PR8079). By András Gunyhó.
Refactor of encoding and decoding times/timedeltas to preserve nanosecond resolution in arrays that contain missing values (PR7827). By Kai Mühlbauer.
Transition .rolling_exp functions to use .apply_ufunc internally rather than .reduce, as the start of a broader effort to move non-reducing functions away from .reduce (PR8114). By Maximilian Roos.
Test a range of fill_values in test_interpolate_pd_compat (GH8146, PR8189). By Kai Mühlbauer.

v2023.08.0 (Aug 18, 2023)#

This release brings changes to minimum dependencies, allows reading of datasets where a dimension name is associated with a multidimensional variable (e.g. finite volume ocean model output), and introduces a new xarray.Coordinates object.

Thanks to the 16 contributors to this release: Anderson Banihirwe, Articoking, Benoit Bovy, Deepak Cherian, Harshitha, Ian Carroll, Joe Hamman, Justus Magin, Peter Hill, Rachel Wegener, Riley Kuttruff, Thomas Nicholas, Tom Nicholas, ilgast, quantsnus, vallirep

Announcements#

The xarray.Variable class is being refactored out to a new project titled ‘namedarray’. See the design doc for more details. Reach out to us on this discussion topic (pydata/xarray#8080) if you have any thoughts.

New Features#

Coordinates can now be constructed independently of any Dataset or DataArray (it is also returned by the Dataset.coords and DataArray.coords properties). Coordinates objects are useful for passing both coordinate variables and indexes to new Dataset / DataArray objects, e.g., via their constructor or via Dataset.assign_coords(). We may also wrap coordinate variables in a Coordinates object in order to skip the automatic creation of (pandas) indexes for dimension coordinates. The Coordinates.from_pandas_multiindex constructor may be used to create coordinates directly from a pandas.MultiIndex object (it is preferred over passing it directly as coordinate data, which may be deprecated soon). Like Dataset and DataArray objects, Coordinates objects may now be used in align() and merge(). (GH6392, PR7368). By Benoît Bovy.
Visually group together coordinates with the same indexes in the index section of the text repr (PR7225). By Justus Magin.
Allow creating Xarray objects where a multidimensional variable shares its name with a dimension. Examples include output from finite volume models like FVCOM. (GH2233, PR7989) By Deepak Cherian and Benoit Bovy.
When outputting Dataset objects as Zarr via Dataset.to_zarr(), users can now specify, via the write_empty_chunks argument, that chunks containing no valid data will not be written. Previously, this could be done by setting "write_empty_chunks" in the encoding parameter; however, this setting would not carry over when appending new data to an existing dataset. (GH8009) Requires zarr>=2.11.

Breaking changes#

The minimum versions of some dependencies were changed (PR8022):

Package	Old	New
boto3	1.20	1.24
cftime	1.5	1.6
dask-core	2022.1	2022.7
distributed	2022.1	2022.7
h5netcdf	0.13	1.0
iris	3.1	3.2
lxml	4.7	4.9
netcdf4	1.5.7	1.6.0
numpy	1.21	1.22
pint	0.18	0.19
pydap	3.2	3.3
rasterio	1.2	1.3
scipy	1.7	1.8
toolz	0.11	0.12
typing_extensions	4.0	4.3
zarr	2.10	2.12
numbagg	0.1	0.2.1

Documentation#

Added page on the internal design of xarray objects. (PR7991) By Tom Nicholas.
Added examples to docstrings of Dataset.assign_attrs(), Dataset.broadcast_equals(), Dataset.equals(), Dataset.identical(), Dataset.expand_dims(), Dataset.drop_vars() (GH6793, PR7937) By Harshitha.
Add docstrings for the Index base class and add some documentation on how to create custom, Xarray-compatible indexes (PR6975) By Benoît Bovy.
Added a page clarifying the role of Xarray core team members. (PR7999) By Tom Nicholas.
Fixed broken links in “See also” section of Dataset.count() (GH8055, PR8057) By Articoking.
Extended the glossary by adding terms Aligning, Broadcasting, Merging, Concatenating, Combining, lazy, labeled, serialization, indexing (GH3355, PR7732) By Harshitha.

Internal Changes#

as_variable() now consistently includes the variable name in any exceptions raised. (PR7995). By Peter Hill
encode_dataset_coordinates() now sorts coordinates automatically assigned to coordinates attributes during serialization (GH8026, PR8034). By Ian Carroll.

v2023.07.0 (July 17, 2023)#

This release brings improvements to the documentation on wrapping numpy-like arrays, improved docstrings, and bug fixes.

Deprecations#

hue_style is being deprecated for scatter plots. (GH7907, PR7925). By Jimmy Westling.

Bug fixes#

Ensure no forward slashes in variable and dimension names for HDF5-based engines. (GH7943, PR7953) By Kai Mühlbauer.

Documentation#

Added page on wrapping chunked numpy-like arrays as alternatives to dask arrays. (PR7951) By Tom Nicholas.
Expanded the page on wrapping numpy-like “duck” arrays. (PR7911) By Tom Nicholas.
Added examples to docstrings of Dataset.isel(), Dataset.reduce(), Dataset.argmin(), Dataset.argmax() (GH6793, PR7881) By Harshitha.

Internal Changes#

Allow chunked non-dask arrays (i.e. Cubed arrays) in groupby operations. (PR7941) By Tom Nicholas.

v2023.06.0 (June 21, 2023)#

This release adds features to curvefit, improves the performance of concatenation, and fixes various bugs.

Thanks to the 13 contributors to this release: Anderson Banihirwe, Deepak Cherian, dependabot[bot], Illviljan, Juniper Tyree, Justus Magin, Martin Fleischmann, Mattia Almansi, mgunyho, Rutger van Haasteren, Thomas Nicholas, Tom Nicholas, Tom White.

New Features#

Added support for multidimensional initial guess and bounds in DataArray.curvefit() (GH7768, PR7821). By András Gunyhó.
Add an errors option to Dataset.curvefit() that allows returning NaN for the parameters and covariances of failed fits, rather than failing the whole series of fits (GH6317, PR7891). By Dominik Stańczak and András Gunyhó.

Breaking changes#

Deprecations#

Deprecate the cdms2 conversion methods (PR7876) By Justus Magin.

Performance#

Improve concatenation performance (GH7833, PR7824). By Jimmy Westling.

Bug fixes#

Fix bug where weighted polyfit was changing the original object (GH5644, PR7900). By Mattia Almansi.
Don’t call CachingFileManager.__del__ on interpreter shutdown (GH7814, PR7880). By Justus Magin.
Preserve vlen dtype for empty string arrays (GH7328, PR7862). By Tom White and Kai Mühlbauer.
Ensure dtype of reindex result matches dtype of the original DataArray (GH7299, PR7917) By Anderson Banihirwe.
Fix bug where a zero-length zarr chunk_store was ignored as if it was None (PR7923) By Juniper Tyree.

Documentation#

Internal Changes#

Minor improvements to support of the python array api standard, internally using the function xp.astype() instead of the method arr.astype(), as the latter is not in the standard. (PR7847) By Tom Nicholas.
Xarray now uploads nightly wheels to https://pypi.anaconda.org/scientific-python-nightly-wheels/simple/ (GH7863, PR7865). By Martin Fleischmann.
Stop uploading development wheels to TestPyPI (PR7889) By Justus Magin.
Added an exception catch for AttributeError along with ImportError when duck typing the dynamic imports in pycompat.py. This catches some name collisions between packages. (GH7870, PR7874)

v2023.05.0 (May 18, 2023)#

This release adds some new methods and operators, updates our deprecation policy for python versions, fixes some bugs with groupby, and introduces experimental support for alternative chunked parallel array computation backends via a new plugin system!

Note: If you are using a locally-installed development version of xarray then pulling the changes from this release may require you to re-install. This avoids an error where xarray cannot detect dask via the new entrypoints system introduced in PR7019. See GH7856 for details.

Thanks to our 14 contributors: Alan Brammer, crusaderky, David Stansby, dcherian, Deeksha, Deepak Cherian, Illviljan, James McCreight, Joe Hamman, Justus Magin, Kyle Sunden, Max Hollmann, mgunyho, and Tom Nicholas

New Features#

Added new method DataArray.to_dask_dataframe(), which converts a DataArray into a dask DataFrame (GH7409). By Deeksha.
Add support for lshift and rshift binary operators (<<, >>) on xr.DataArray of type int (GH7727, PR7741). By Alan Brammer.
Keyword argument data='array' to both xarray.Dataset.to_dict() and xarray.DataArray.to_dict() will now return data as the underlying array type. Python lists are returned for data='list' or data=True. Supplying data=False only returns the schema without data. encoding=True also returns the encoding dictionary for the underlying variable. (GH1599, PR7739). By James McCreight.
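A minimal sketch of two of the features above, using a toy integer DataArray named "flags": the new bit-shift operators and the data options of DataArray.to_dict().

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.array([1, 2, 4], dtype="int64"), dims="x", name="flags")

# Bit-shift operators on integer DataArrays
left = da << 2
right = da >> 1

# to_dict: keep the underlying array, return lists, or return only the schema
as_array = da.to_dict(data="array")
as_list = da.to_dict(data="list")
schema_only = da.to_dict(data=False)
```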

Breaking changes#

Adjust the deprecation policy for python to once again align with NEP-29 (GH7765, PR7793). By Justus Magin.

Performance#

Optimize .dt accessor performance with CFTimeIndex. (PR7796) By Deepak Cherian.

Bug fixes#

Fix as_compatible_data for masked float arrays, now always creates a copy when mask is present (GH2377, PR7788). By Max Hollmann.
Fix groupby binary ops when grouped array is subset relative to other. (GH7797). By Deepak Cherian.
Fix groupby sum, prod for all-NaN groups with flox. (GH7808). By Deepak Cherian.

Internal Changes#

Experimental support for wrapping chunked array libraries other than dask. A new ABC, xr.namedarray.parallelcompat.ChunkManagerEntrypoint, can be subclassed and then registered by alternative chunked array implementations. (GH6807, PR7019) By Tom Nicholas.

v2023.04.2 (April 20, 2023)#

This is a patch release to fix a bug with binning (GH7766)

Bug fixes#

Fix binning when labels is specified. (GH7766). By Deepak Cherian.

Documentation#

Added examples to docstrings for xarray.core.accessor_str.StringAccessor() methods. (PR7669). By Mary Gathoni.

v2023.04.1 (April 18, 2023)#

This is a patch release to fix a bug with binning (GH7759)

Bug fixes#

Fix binning by unsorted arrays. (GH7759)

v2023.04.0 (April 14, 2023)#

This release includes support for pandas v2, allows refreshing of backend engines in a session, and removes deprecated backends for rasterio and cfgrib.

Thanks to our 19 contributors: Chinemere, Tom Coleman, Deepak Cherian, Harshitha, Illviljan, Jessica Scheick, Joe Hamman, Justus Magin, Kai Mühlbauer, Kwonil-Kim, Mary Gathoni, Michael Niklas, Pierre, Scott Henderson, Shreyal Gupta, Spencer Clark, mccloskey, nishtha981, veenstrajelmer

We welcome the following new contributors to Xarray!: Mary Gathoni, Harshitha, veenstrajelmer, Chinemere, nishtha981, Shreyal Gupta, Kwonil-Kim, mccloskey.

New Features#

New methods to reset an object’s encoding (Dataset.reset_encoding(), DataArray.reset_encoding()). (GH7686, PR7689). By Joe Hamman.
Allow refreshing backend engines with xarray.backends.refresh_engines() (GH7478, PR7523). By Michael Niklas.
Added ability to save DataArray objects directly to Zarr using to_zarr(). (GH7692, PR7693). By Joe Hamman.

Breaking changes#

Remove deprecated rasterio backend in favor of rioxarray (PR7392). By Scott Henderson.

Deprecations#

Performance#

Optimize alignment with join="exact", copy=False by avoiding copies. (PR7736) By Deepak Cherian.
Avoid unnecessary copies of CFTimeIndex. (PR7735) By Deepak Cherian.

Bug fixes#

Fix xr.polyval() with non-system standard integer coeffs (PR7619). By Shreyal Gupta and Michael Niklas.
Improve error message when trying to open a file which you do not have permission to read (GH6523, PR7629). By Thomas Coleman.
Fix plotting when passing a BoundaryNorm argument to DataArray.plot() (GH4061, GH7014, PR7553). By Jelmer Veenstra.
Ensure the formatting of time encoding reference dates outside the range of nanosecond-precision datetimes remains the same under pandas version 2.0.0 (GH7420, PR7441). By Justus Magin and Spencer Clark.
Various dtype related fixes needed to support pandas>=2.0 (PR7724) By Justus Magin.
Preserve boolean dtype within encoding (GH7652, PR7720). By Kai Mühlbauer

Documentation#

Update the FAQ entry on how to open a format X file as an xarray dataset using open_dataset() (GH1285, PR7638). By Harshitha and Tom Nicholas.

Internal Changes#

Don’t assume that arrays read from disk will be Numpy arrays. This is a step toward enabling reads from a Zarr store using the Kvikio or TensorStore libraries. (PR6874). By Deepak Cherian.
Remove internal support for reading GRIB files through the cfgrib backend. cfgrib now uses the external backend interface, so no existing code should break. By Deepak Cherian.
Implement CF coding functions in VariableCoders (PR7719). By Kai Mühlbauer
Added a config.yml file with messages for the welcome bot when a GitHub user creates their first ever issue or pull request or has their first PR merged. (GH7685, PR7685) By Nishtha P.
Ensure that only nanosecond-precision pd.Timestamp objects continue to be used internally under pandas version 2.0.0. This is mainly to ease the transition to this latest version of pandas. It should be relaxed when addressing GH7493. By Spencer Clark (GH7707, PR7731).

v2023.03.0 (March 22, 2023)#

This release brings many bug fixes, and some new features. The maximum pandas version is pinned to <2 until we can support the new pandas datetime types. Thanks to our 19 contributors: Abel Aoun, Alex Goodman, Deepak Cherian, Illviljan, Jody Klymak, Joe Hamman, Justus Magin, Mary Gathoni, Mathias Hauser, Mattia Almansi, Mick, Oriol Abril-Pla, Patrick Hoefler, Paul Ockenfuß, Pierre, Shreyal Gupta, Spencer Clark, Tom Nicholas, Tom Vo

New Features#

xr.cov() and xr.corr() now support complex-valued arrays (GH7340, PR7392). By Michael Niklas.
Allow indexing along unindexed dimensions with dask arrays (GH2511, GH4276, GH4663, PR5873). By Abel Aoun and Deepak Cherian.
Support dask arrays in first and last reductions. By Deepak Cherian.
Improved performance in open_dataset for datasets with large object arrays (GH7484, PR7494). By Alex Goodman and Deepak Cherian.
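A small sanity check of complex support in xr.cov(), assuming toy complex values. The covariance of an array with itself is compared against numpy.cov, which also uses ddof=1 by default.

```python
import numpy as np
import xarray as xr

z = xr.DataArray(np.array([1 + 2j, 2 - 1j, -1 + 0.5j, 0.5 + 1j]), dims="time")

# Sample covariance of z with itself (ddof=1 is xr.cov's default)
var_z = xr.cov(z, z, dim="time")
```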

Breaking changes#

Deprecations#

Following pandas, the base and loffset parameters of xr.DataArray.resample() and xr.Dataset.resample() have been deprecated and will be removed in a future version of xarray. Using the origin or offset parameters is recommended as a replacement for using the base parameter and using time offset arithmetic is recommended as a replacement for using the loffset parameter (PR8459). By Spencer Clark.
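A minimal sketch of the offset replacement for base, assuming two days of toy hourly data. Daily bins that previously needed base=2 can instead start at 02:00 via offset="2h".

```python
import numpy as np
import pandas as pd
import xarray as xr

times = pd.date_range("2000-01-01", periods=48, freq="h")
da = xr.DataArray(np.ones(48), coords={"time": times}, dims="time")

# Daily bins that start at 02:00 rather than midnight
daily = da.resample(time="24h", offset="2h").sum()
```

With the default origin="start_day", the first bin begins at 02:00 on the previous day, so it holds only the two samples before 02:00 on January 1.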

Bug fixes#

Improve the error message of Dataset.drop_vars() to state which variables can’t be dropped. (PR7518) By Tom Nicholas.
Require explicitly defining optional dimensions such as hue and markersize for scatter plots. (GH7314, PR7277). By Jimmy Westling.
Fix matplotlib raising a UserWarning when plotting a scatter plot with an unfilled marker (GH7313, PR7318). By Jimmy Westling.
Fix issue with max_gap in interpolate_na, when applied to multidimensional arrays. (GH7597, PR7598). By Paul Ockenfuß.
Fix DataArray.plot.pcolormesh() which now works if one of the coordinates has str dtype (GH6775, PR7612). By Michael Niklas.

Documentation#

Clarify language in contributor’s guide (GH7495, PR7595) By Tom Nicholas.

Internal Changes#

Pin pandas to <2. By Deepak Cherian.

v2023.02.0 (Feb 7, 2023)#

This release brings a major upgrade to xarray.concat(), many bug fixes, and a bump in supported dependency versions. Thanks to our 11 contributors: Aron Gergely, Deepak Cherian, Illviljan, James Bourbeau, Joe Hamman, Justus Magin, Hauke Schulz, Kai Mühlbauer, Ken Mankoff, Spencer Clark, Tom Nicholas.

Breaking changes#

Support for python 3.8 has been dropped and the minimum versions of some dependencies were changed (PR7461):

Package	Old	New
python	3.8	3.9
numpy	1.20	1.21
pandas	1.3	1.4
dask	2021.11	2022.1
distributed	2021.11	2022.1
h5netcdf	0.11	0.13
lxml	4.6	4.7
numba	5.4	5.5

Deprecations#

Following pandas, the closed parameters of cftime_range() and date_range() are deprecated in favor of the inclusive parameters, and will be removed in a future version of xarray (GH6985, PR7373). By Spencer Clark.
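A short sketch of the inclusive parameter that replaces closed, using toy daily ranges with xarray's date_range().

```python
import xarray as xr

# inclusive="both" is the default: both endpoints are kept
both = xr.date_range("2000-01-01", "2000-01-04", freq="D")
left = xr.date_range("2000-01-01", "2000-01-04", freq="D", inclusive="left")
neither = xr.date_range("2000-01-01", "2000-01-04", freq="D", inclusive="neither")
```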

Bug fixes#

xarray.concat() can now concatenate variables present in some datasets but not others (GH508, PR7400). By Kai Mühlbauer and Scott Chamberlin.
Handle keep_attrs option in binary operators of Dataset() (GH7390, PR7391). By Aron Gergely.
Improve error message when using dask in apply_ufunc() with output_sizes not supplied. (PR7509) By Tom Nicholas.
xarray.Dataset.to_zarr() now drops variable encodings that have been added by xarray during reading a dataset. (GH7129, PR7500). By Hauke Schulz.
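A minimal sketch of two of the fixes above, assuming toy datasets: concat() now fills a variable missing from some inputs with NaN, and Dataset binary operators honor the keep_attrs option.

```python
import xarray as xr

ds1 = xr.Dataset({"a": ("x", [1.0, 2.0]), "b": ("x", [3.0, 4.0])})
ds2 = xr.Dataset({"a": ("x", [5.0])})  # no "b" here

# "b" is missing from ds2; its slot is filled with NaN instead of raising
combined = xr.concat([ds1, ds2], dim="x")

# Binary operators on a Dataset honor the keep_attrs option
ds1.attrs["source"] = "toy"
with xr.set_options(keep_attrs=True):
    incremented = ds1 + 1
```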

Documentation#

Mention the flox package in GroupBy documentation and docstrings. By Deepak Cherian.

v2023.01.0 (Jan 17, 2023)#

This release includes a number of bug fixes. Thanks to the 14 contributors to this release: Aron Gergely, Benoit Bovy, Deepak Cherian, Ian Carroll, Illviljan, Joe Hamman, Justus Magin, Mark Harfouche, Matthew Roeschke, Paige Martin, Pierre, Sam Levang, Tom White, stefank0.

Breaking changes#

CFTimeIndex.get_loc() has removed the method and tolerance keyword arguments. Use .get_indexer([key], method=..., tolerance=...) instead (PR7361). By Matthew Roeschke.

Bug fixes#

Avoid in-memory broadcasting when converting to a dask dataframe using .to_dask_dataframe. (GH6811, PR7472). By Jimmy Westling.
Accessing the property .nbytes of a DataArray or Variable no longer accidentally triggers loading the variable into memory.
Allow numpy-only objects in where() when keep_attrs=True (GH7362, PR7364). By Sam Levang.
Add a keep_attrs parameter to Dataset.pad(), DataArray.pad(), and Variable.pad() (PR7267). By Justus Magin.
Fixed performance regression in alignment between indexed and non-indexed objects of the same shape (PR7382). By Benoît Bovy.
Preserve original dtype on accessing MultiIndex levels (GH7250, PR7393). By Ian Carroll.
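A short sketch of the keep_attrs parameter added to pad(), assuming a toy float DataArray. Constant-mode padding fills the new edges with NaN.

```python
import xarray as xr

da = xr.DataArray([1.0, 2.0, 3.0], dims="x", attrs={"units": "K"})

# Pad one element on each side and keep the attributes
padded = da.pad(x=1, keep_attrs=True)
```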

Internal Changes#

Add the pre-commit hook absolufy-imports to convert relative xarray imports to absolute imports (PR7204, PR7370). By Jimmy Westling.

v2022.12.0 (Dec 2, 2022)#

This release includes a number of bug fixes and experimental support for Zarr V3. Thanks to the 16 contributors to this release: Deepak Cherian, Francesco Zanetta, Gregory Lee, Illviljan, Joe Hamman, Justus Magin, Luke Conibear, Mark Harfouche, Mathias Hauser, Mick, Mike Taves, Sam Levang, Spencer Clark, Tom Nicholas, Wei Ji, templiert

New Features#

Enable using offset and origin arguments in DataArray.resample() and Dataset.resample() (GH7266, PR7284). By Spencer Clark.
Add experimental support for Zarr’s in-progress V3 specification. (PR6475). By Gregory Lee and Joe Hamman.

Breaking changes#

The minimum versions of some dependencies were changed (PR7300):

Package	Old	New
boto	1.18	1.20
cartopy	0.19	0.20
distributed	2021.09	2021.11
dask	2021.09	2021.11
h5py	3.1	3.6
hdf5	1.10	1.12
matplotlib-base	3.4	3.5
nc-time-axis	1.3	1.4
netcdf4	1.5.3	1.5.7
packaging	20.3	21.3
pint	0.17	0.18
pseudonetcdf	3.1	3.2
typing_extensions	3.10	4.0

Deprecations#

The PyNIO backend has been deprecated (GH4491, PR7301). By Joe Hamman.

Bug fixes#

Fix handling of coordinate attributes in where(). (GH7220, PR7229) By Sam Levang.
Import nc_time_axis when needed (GH7275, PR7276). By Michael Niklas.
Fix static typing of xr.polyval() (GH7312, PR7315). By Michael Niklas.
Fix multiple reads on fsspec S3 files by resetting file pointer to 0 when reading file streams (GH6813, PR7304). By David Hoese and Wei Ji Leong.
Fix Dataset.assign_coords() resetting all dimension coordinates to default (pandas) index (GH7346, PR7347). By Benoît Bovy.

Documentation#

Add example of reading and writing individual groups to a single netCDF file to I/O docs page. (PR7338) By Tom Nicholas.

Internal Changes#

v2022.11.0 (Nov 4, 2022)#

This release brings a number of bugfixes and documentation improvements. Both text and HTML reprs now have a new “Indexes” section, which we expect will help with development of new Index objects. This release also features more support for the Python Array API.

Many thanks to the 16 contributors to this release: Daniel Goman, Deepak Cherian, Illviljan, Jessica Scheick, Justus Magin, Mark Harfouche, Maximilian Roos, Mick, Patrick Naylor, Pierre, Spencer Clark, Stephan Hoyer, Tom Nicholas, Tom White

New Features#

Add static typing to plot accessors (GH6949, PR7052). By Michael Niklas.
Display the indexes in a new section of the text and HTML reprs (PR6795, PR7183, PR7185) By Justus Magin and Benoît Bovy.
Added methods DataArrayGroupBy.cumprod() and DatasetGroupBy.cumprod(). (PR5816) By Patrick Naylor
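A minimal sketch of the new grouped cumulative product, assuming a toy array grouped by a hypothetical "label" coordinate. The product restarts within each group and the output keeps the original shape.

```python
import xarray as xr

da = xr.DataArray(
    [1, 2, 3, 4],
    dims="x",
    coords={"label": ("x", ["a", "a", "b", "b"])},
)

# Cumulative product restarted within each group
running = da.groupby("label").cumprod()
```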

Breaking changes#

repr(ds) may not show the same result because it doesn’t load small, lazy data anymore. Use ds.head().load() when wanting to see just a sample of the data. (GH6722, PR7203). By Jimmy Westling.
Many arguments of plot methods have been made keyword-only.
xarray.plot.plot module renamed to xarray.plot.dataarray_plot to prevent shadowing of the plot method. (GH6949, PR7052). By Michael Niklas.

Deprecations#

Positional arguments for all plot methods have been deprecated (GH6949, PR7052). By Michael Niklas.
xarray.plot.FacetGrid.axes has been renamed to xarray.plot.FacetGrid.axs because it’s not clear if axes refers to single or multiple Axes instances. This aligns with matplotlib.pyplot.subplots. (PR7194) By Jimmy Westling.

Bug fixes#

Explicitly opening a file multiple times (e.g., after modifying it on disk) now reopens the file from scratch for h5netcdf and scipy netCDF backends, rather than reusing a cached version (GH4240, GH4862). By Stephan Hoyer.
Fixed bug where Dataset.coarsen.construct() would demote non-dimension coordinates to variables. (PR7233) By Tom Nicholas.
Raise a TypeError when trying to plot empty data (GH7156, PR7228). By Michael Niklas.

Documentation#

Improves overall documentation around available backends, including adding docstrings for xarray.backends.list_engines(), and adds __str__() to surface the new BackendEntrypoint description and url attributes. (GH6577, PR7000) By Jessica Scheick.
Created docstring examples for DataArray.cumsum(), DataArray.cumprod(), Dataset.cumsum(), Dataset.cumprod(), DatasetGroupBy.cumsum(), DataArrayGroupBy.cumsum(). (GH5816, PR7152) By Patrick Naylor
Add example of using DataArray.coarsen.construct() to User Guide. (PR7192) By Tom Nicholas.
Rename axes to axs in plotting to align with matplotlib.pyplot.subplots. (PR7194) By Jimmy Westling.
Add documentation of specific BackendEntrypoints (PR7200). By Michael Niklas.
Add examples to docstring for DataArray.drop_vars(), DataArray.reindex_like(), DataArray.interp_like(). (GH6793, PR7123) By Daniel Goman.

Internal Changes#

Doctests fail on any warnings (PR7166) By Maximilian Roos.
Improve import time by lazy loading dask.distributed (PR7172).
Explicitly specify longdouble=False in cftime.date2num() when encoding times to preserve existing behavior and prevent future errors when it is eventually set to True by default in cftime (PR7171). By Spencer Clark.
Improved import time by lazily importing backend modules, matplotlib, dask.array and flox. (GH6726, PR7179) By Michael Niklas.
Emit a warning under the development version of pandas when we convert non-nanosecond precision datetime or timedelta values to nanosecond precision. This was required in the past, because pandas previously was not compatible with non-nanosecond precision values. However pandas is currently working towards removing this restriction. When things stabilize in pandas we will likely consider relaxing this behavior in xarray as well (GH7175, PR7201). By Spencer Clark.

v2022.10.0 (Oct 14 2022)#

This release brings numerous bugfixes, a change in minimum supported versions, and a new scatter plot method for DataArrays.

Many thanks to 11 contributors to this release: Anderson Banihirwe, Benoit Bovy, Dan Adriaansen, Illviljan, Justus Magin, Lukas Bindreiter, Mick, Patrick Naylor, Spencer Clark, Thomas Nicholas

New Features#

Add scatter plot for DataArrays. Scatter plots now also support 3D plots with the z argument. (PR6778) By Jimmy Westling.
Include the variable name in the error message when CF decoding fails to allow for easier identification of problematic variables (GH7145, PR7147). By Spencer Clark.

Breaking changes#

The minimum versions of some dependencies were changed:

Package	Old	New
cftime	1.4	1.5
distributed	2021.08	2021.09
dask	2021.08	2021.09
iris	2.4	3.1
nc-time-axis	1.2	1.3
numba	0.53	0.54
numpy	1.19	1.20
pandas	1.2	1.3
packaging	20.0	21.0
scipy	1.6	1.7
sparse	0.12	0.13
typing_extensions	3.7	3.10
zarr	2.8	2.10

Bug fixes#

Remove nested function from open_mfdataset() to allow Dataset objects to be pickled. (GH7109, PR7116) By Daniel Adriaansen.
Support for recursively defined Arrays. Fixes repr and deepcopy. (GH7111, PR7112) By Michael Niklas.
Fixed Dataset.transpose() to raise a more informative error. (GH6502, PR7120) By Patrick Naylor
Fix groupby on a multi-index level coordinate and fix DataArray.to_index() for multi-index levels (convert to single index). (GH6836, PR7105) By Benoît Bovy.
Support for open_dataset backends that return datasets containing multi-indexes (GH7139, PR7150) By Lukas Bindreiter.

v2022.09.0 (September 30, 2022)#

This release brings a large number of bugfixes and documentation improvements, as well as an external interface for setting custom indexes!

Many thanks to our 40 contributors:

Anderson Banihirwe, Andrew Ronald Friedman, Bane Sullivan, Benoit Bovy, ColemanTom, Deepak Cherian, Dimitri Papadopoulos Orfanos, Emma Marshall, Fabian Hofmann, Francesco Nattino, ghislainp, Graham Inggs, Hauke Schulz, Illviljan, James Bourbeau, Jody Klymak, Julia Signell, Justus Magin, Keewis, Ken Mankoff, Luke Conibear, Mathias Hauser, Max Jones, mgunyho, Michael Delgado, Mick, Mike Taves, Oliver Lopez, Patrick Naylor, Paul Hockett, Pierre Manchon, Ray Bell, Riley Brady, Sam Levang, Spencer Clark, Stefaan Lippens, Tom Nicholas, Tom White, Travis A. O’Brien, and Zachary Moon.

New Features#

Add Dataset.set_xindex() and Dataset.drop_indexes() and their DataArray counterpart for setting and dropping pandas or custom indexes given a set of arbitrary coordinates. (PR6971) By Benoît Bovy and Justus Magin.
Enable taking the mean of dask-backed cftime.datetime arrays (PR6556, PR6940). By Deepak Cherian and Spencer Clark.
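A minimal sketch of setting and dropping an index on a non-dimension coordinate, assuming a current xarray installation. The "temp" variable and "station" coordinate are hypothetical names used only for illustration:

```python
import xarray as xr

ds = xr.Dataset(
    {"temp": ("x", [1.5, 2.5, 3.5])},
    coords={"x": [10, 20, 30], "station": ("x", ["a", "b", "c"])},
)

# "station" is a non-dimension coordinate, so it carries no index by default.
print("station" in ds.xindexes)  # False

# Build a default (pandas-backed) index so "station" can be used with .sel().
indexed = ds.set_xindex("station")
print(indexed.sel(station="b")["temp"].item())  # 2.5

# Drop the index again; the "station" coordinate itself is kept.
unindexed = indexed.drop_indexes("station")
print("station" in unindexed.coords, "station" in unindexed.xindexes)  # True False
```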

Bug fixes#

Allow reading netcdf files where the ‘units’ attribute is a number. (PR7085) By Ghislain Picard.
Allow decoding of 0-sized datetimes. (GH1329, PR6882) By Deepak Cherian.
Make sure DataArray.name is always a string when used as label for plotting. (GH6826, PR6832) By Jimmy Westling.
DataArray.nbytes now uses the nbytes property of the underlying array if available. (PR6797) By Max Jones.
Rely on the array backend for string formatting. (PR6823). By Jimmy Westling.
Fix incompatibility with numpy 1.20. (GH6818, PR6821) By Michael Niklas.
Fix side effects on index coordinate metadata after aligning objects. (GH6852, PR6857) By Benoît Bovy.
Make FacetGrid.set_titles send kwargs correctly using handle.update(kwargs). (GH6839, PR6843) By Oliver Lopez.
Fix bug where index variables would be changed in place. (GH6931, PR6938) By Michael Niklas.
Allow taking the mean over non-time dimensions of datasets containing dask-backed cftime arrays. (GH5897, PR6950) By Spencer Clark.
Harmonize returned multi-indexed indexes when applying concat along new dimension. (GH6881, PR6889) By Fabian Hofmann.
Fix step plots with hue arg. (PR6944) By András Gunyhó.
Avoid use of random numbers in test_weighted.test_weighted_operations_nonequal_coords. (GH6504, PR6961) By Luke Conibear.
Fix multiple regression issues with Dataset.set_index() and Dataset.reset_index(). (PR6992) By Benoît Bovy.
Raise a UserWarning when renaming a coordinate or a dimension creates a non-indexed dimension coordinate, and suggest that the user create an index with either swap_dims or set_index. (GH6607, PR6999) By Benoît Bovy.
Use keep_attrs=True in grouping and resampling operations by default. (GH7012) This means Dataset.attrs and DataArray.attrs are now preserved by default. By Deepak Cherian.
Dataset.encoding['source'] now exists when reading from a Path object. (GH5888, PR6974) By Thomas Coleman.
Better dtype consistency for rolling.mean(). (GH7062, PR7063) By Sam Levang.
Allow writing NetCDF files including only dimensionless variables using the distributed or multiprocessing scheduler. (GH7013, PR7040) By Francesco Nattino.
Fix deepcopy of attrs and encoding of DataArrays and Variables. (GH2835, PR7089) By Michael Niklas.
Fix bug where subplot_kwargs were not working when plotting with figsize, size or aspect. (GH7078, PR7080) By Michael Niklas.

Documentation#

Update merge docstrings. (GH6935, PR7033) By Zach Moon.
Raise a more informative error when trying to open a non-existent zarr store. (GH6484, PR7060) By Sam Levang.
Added examples to docstrings for DataArray.expand_dims(), DataArray.drop_duplicates(), DataArray.reset_coords(), DataArray.equals(), DataArray.identical(), DataArray.broadcast_equals(), DataArray.bfill(), DataArray.ffill(), DataArray.fillna(), DataArray.dropna(), DataArray.drop_isel(), DataArray.drop_sel(), DataArray.head(), DataArray.tail(). (GH5816, PR7088) By Patrick Naylor.
Add missing docstrings to various array properties. (PR7090) By Tom Nicholas.

Internal Changes#

Added test for DataArray attrs deepcopy recursion/nested attrs. (GH2835, PR7086) By Paul Hockett.

v2022.06.0 (July 21, 2022)#

This release brings a number of bug fixes and improvements, most notably a major internal refactor of the indexing functionality, the use of flox in groupby operations, and experimental support for the new Python Array API standard. It also stops testing support for the abandoned PyNIO.

Much effort has been made to preserve backwards compatibility as part of the indexing refactor. We are aware of one unfixed issue.

Please also see the v2022.06.0rc0 release notes below for a full list of changes.

Many thanks to our 18 contributors: Bane Sullivan, Deepak Cherian, Dimitri Papadopoulos Orfanos, Emma Marshall, Hauke Schulz, Illviljan, Julia Signell, Justus Magin, Keewis, Mathias Hauser, Michael Delgado, Mick, Pierre Manchon, Ray Bell, Spencer Clark, Stefaan Lippens, Tom White, Travis A. O’Brien.

New Features#

Add Dataset.dtypes, core.coordinates.DatasetCoordinates.dtypes, core.coordinates.DataArrayCoordinates.dtypes properties: Mapping from variable names to dtypes. (PR6706) By Michael Niklas.
Initial typing support for groupby(), rolling(), rolling_exp(), coarsen(), weighted(), resample() (PR6702). By Michael Niklas.
Experimental support for wrapping any array type that conforms to the python array api standard. (PR6804) By Tom White.
Allow string formatting of scalar DataArrays. (PR5981) By fmaussion.
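A minimal sketch of the new dtypes mappings, assuming a current xarray installation. The variable names are illustrative. Dataset.dtypes covers data variables, while coordinates are reported through ds.coords.dtypes:

```python
import numpy as np
import xarray as xr

ds = xr.Dataset(
    {
        "count": ("x", np.array([1, 2, 3], dtype="int64")),
        "temp": ("x", np.array([0.5, 1.5, 2.5], dtype="float32")),
    },
    coords={"x": np.array([10, 20, 30], dtype="int64")},
)

# Mapping from data variable names to dtypes (coordinates are not included).
print(dict(ds.dtypes))

# Mapping from coordinate names to dtypes.
print(dict(ds.coords.dtypes))
```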

Bug fixes#

save_mfdataset() now passes **kwargs on to Dataset.to_netcdf(), allowing the encoding and unlimited_dims options with save_mfdataset(). (GH6684) By Travis A. O’Brien.
Fix backend support of pydap versions <3.3.0 (GH6648, PR6656). By Hauke Schulz.
Dataset.where() with drop=True now behaves correctly with mixed dimensions. (GH6227, PR6690) By Michael Niklas.
Accommodate newly raised OutOfBoundsTimedelta error in the development version of pandas when decoding times outside the range that can be represented with nanosecond-precision values (GH6716, PR6717). By Spencer Clark.
open_dataset() with dask and ~ in the path now resolves the home directory instead of raising an error. (GH6707, PR6710) By Michael Niklas.
DataArrayRolling.__iter__() with center=True now works correctly. (GH6739, PR6744) By Michael Niklas.

Internal Changes#

xarray.core.groupby, xarray.core.rolling, xarray.core.rolling_exp, xarray.core.weighted and xarray.core.resample modules are no longer imported by default. (PR6702)

v2022.06.0rc0 (9 June 2022)#

This pre-release brings a number of bug fixes and improvements, most notably a major internal refactor of the indexing functionality and the use of flox in groupby operations. It also stops testing support for the abandoned PyNIO.

Install it using

mamba create -n <name> python=3.10 xarray
python -m pip install --pre --upgrade --no-deps xarray

Many thanks to the 39 contributors:

Abel Soares Siqueira, Alex Santana, Anderson Banihirwe, Benoit Bovy, Blair Bonnett, Brewster Malevich, brynjarmorka, Charles Stern, Christian Jauvin, Deepak Cherian, Emma Marshall, Fabien Maussion, Greg Behm, Guelate Seyo, Illviljan, Joe Hamman, Joseph K Aicher, Justus Magin, Kevin Paul, Louis Stenger, Mathias Hauser, Mattia Almansi, Maximilian Roos, Michael Bauer, Michael Delgado, Mick, ngam, Oleh Khoma, Oriol Abril-Pla, Philippe Blain, PLSeuJ, Sam Levang, Spencer Clark, Stan West, Thomas Nicholas, Thomas Vogt, Tom White, Xianxiang Li

Known Regressions#

reset_coords(drop=True) does not create indexes (GH6607)

New Features#

The zarr backend is now able to read NCZarr. By Mattia Almansi.
Add a weighted quantile method to computation.weighted.DatasetWeighted and DataArrayWeighted (PR6059). By Christian Jauvin and David Huard.
Add a create_index=True parameter to Dataset.stack() and DataArray.stack() so that the creation of multi-indexes is optional (PR5692). By Benoît Bovy.
Multi-index levels are now accessible through their own, regular coordinates instead of virtual coordinates (PR5692). By Benoît Bovy.
Add a display_values_threshold option to control the total number of array elements which trigger summarization rather than full repr in (numpy) array detailed views of the html repr (PR6400). By Benoît Bovy.
Allow passing chunks in kwargs form to Dataset.chunk(), DataArray.chunk(), and Variable.chunk(). (PR6471) By Tom Nicholas.
Add core.groupby.DatasetGroupBy.cumsum() and core.groupby.DataArrayGroupBy.cumsum(). By Vladislav Skripniuk and Deepak Cherian. (PR3147, PR6525, GH3141)
Expose the inline_array kwarg from dask.array.from_array() in open_dataset(), Dataset.chunk(), DataArray.chunk(), and Variable.chunk(). (PR6471) By Tom Nicholas.
polyval() now supports Dataset and DataArray args of any shape, is faster and requires less memory. (PR6548) By Michael Niklas.
Improved overall typing.
Dataset.to_dict() and DataArray.to_dict() may now optionally include encoding attributes. (PR6635) By Joe Hamman.
Upload development versions to TestPyPI. By Justus Magin.
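A minimal sketch of the optional multi-index creation in stack() and of multi-index levels exposed as regular coordinates, assuming a current xarray installation with illustrative data:

```python
import numpy as np
import xarray as xr

da = xr.DataArray(
    np.arange(6).reshape(2, 3),
    dims=("x", "y"),
    coords={"x": [0, 1], "y": ["a", "b", "c"]},
)

# Default behaviour: a pandas MultiIndex is built for the new "z" dimension.
with_index = da.stack(z=("x", "y"))

# create_index=False stacks the data without building any index for "z".
without_index = da.stack(z=("x", "y"), create_index=False)

print("z" in with_index.xindexes, "z" in without_index.xindexes)  # True False

# The multi-index level "x" is available as a regular coordinate along "z".
print(with_index["x"].values.tolist())  # [0, 0, 0, 1, 1, 1]
```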

Breaking changes#

PyNIO support is now untested. The minimum versions of some dependencies were changed:

Package	Old	New
cftime	1.2	1.4
dask	2.30	2021.4
distributed	2.30	2021.4
h5netcdf	0.8	0.11
matplotlib-base	3.3	3.4
numba	0.51	0.53
numpy	1.18	1.19
pandas	1.1	1.2
pint	0.16	0.17
rasterio	1.1	1.2
scipy	1.5	1.6
sparse	0.11	0.12
zarr	2.5	2.8

The Dataset and DataArray rename methods do not implicitly add or drop indexes. (PR5692). By Benoît Bovy.
Many arguments like keep_attrs, axis, and skipna are now keyword-only for all reduction operations like .mean. By Deepak Cherian and Jimmy Westling.
Xarray’s ufuncs have been removed, now that they can be replaced by numpy’s ufuncs in all supported versions of numpy. By Maximilian Roos.
xr.polyval() now uses the coord argument directly instead of its index coordinate. (PR6548) By Michael Niklas.

Bug fixes#

Dataset.to_zarr() now allows writing all attribute types supported by zarr-python. By Mattia Almansi.
Set skipna=None for all quantile methods (e.g. Dataset.quantile()) and ensure it skips missing values for float dtypes (consistent with other methods). This should not change the behavior (PR6303). By Mathias Hauser.
Many bugs fixed by the explicit indexes refactor, mainly related to multi-index (virtual) coordinates. See the corresponding pull-request on GitHub for more details. (PR5692). By Benoît Bovy.
Fixed “unhashable type” error trying to read NetCDF file with variable having its ‘units’ attribute not str (e.g. numpy.ndarray) (GH6368). By Oleh Khoma.
Omit warning about specified dask chunks separating chunks on disk when the underlying array is empty (e.g., because of an empty dimension) (GH6401). By Joseph K Aicher.
Fixed the poor html repr performance on large multi-indexes (PR6400). By Benoît Bovy.
Allow fancy indexing of duck dask arrays along multiple dimensions. (PR6414) By Justus Magin.
In the API for backends, support dimensions that express their preferred chunk sizes as a tuple of integers. (GH6333, PR6334) By Stan West.
Fix bug in where() when passing non-xarray objects with keep_attrs=True. (GH6444, PR6461) By Sam Levang.
Allow passing both other and drop=True arguments to DataArray.where() and Dataset.where() (PR6466, PR6467). By Michael Delgado.
Ensure dtype encoding attributes are not added or modified on variables that contain datetime-like values prior to being passed to xarray.conventions.decode_cf_variable() (GH6453, PR6489). By Spencer Clark.
Dark themes are now properly detected in Furo-themed Sphinx documents (GH6500, PR6501). By Kevin Paul.
Dataset.isel(), DataArray.isel() with drop=True works as intended with scalar DataArray indexers. (GH6554, PR6579) By Michael Niklas.
Fixed silent overflow issue when decoding times encoded with 32-bit and below unsigned integer data types (GH6589, PR6598). By Spencer Clark.
Fixed .chunks loading lazy data (GH6538). By Deepak Cherian.

Documentation#

Revise the documentation for developers on specifying a backend’s preferred chunk sizes. In particular, correct the syntax and replace lists with tuples in the examples. (GH6333, PR6334) By Stan West.
Mention that DataArray.rename() can rename coordinates. (GH5458, PR6665) By Michael Niklas.
Added examples to Dataset.thin() and DataArray.thin(). By Emma Marshall.

Performance#

GroupBy binary operations are now vectorized. Previously this involved looping over all groups. (GH5804, PR6160) By Deepak Cherian.
Substantially improved GroupBy operations using flox. This is auto-enabled when flox is installed. Use xr.set_options(use_flox=False) to use the old algorithm. (GH4473, GH4498, GH659, GH2237, PR271). By Deepak Cherian, Anderson Banihirwe, Jimmy Westling.

Internal Changes#

Many internal changes due to the explicit indexes refactor. See the corresponding pull-request on GitHub for more details. (PR5692). By Benoît Bovy.

v2022.03.0 (2 March 2022)#

This release brings a number of small improvements, as well as a move to calendar versioning (GH6176).

Many thanks to the 16 contributors to the v2022.03.0 release!

Aaron Spring, Alan D. Snow, Anderson Banihirwe, crusaderky, Illviljan, Joe Hamman, Jonas Gliß, Lukas Pilz, Martin Bergemann, Mathias Hauser, Maximilian Roos, Romain Caneill, Stan West, Stijn Van Hoey, Tobias Kölling, and Tom Nicholas.

New Features#

Enabled multiplying tick offsets by floats. Allows float n in CFTimeIndex.shift() if shift_freq is between Day and Microsecond. (GH6134, PR6135). By Aaron Spring.
Enable providing more keyword arguments to the pydap backend when reading OpenDAP datasets (GH6274). By Jonas Gliß.
Allow DataArray.drop_duplicates() to drop duplicates along multiple dimensions at once, and add Dataset.drop_duplicates(). (PR6307) By Tom Nicholas.
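A minimal sketch of dropping duplicate labels along several dimensions at once, assuming a current xarray installation. The coordinate values are illustrative and deliberately contain repeats:

```python
import numpy as np
import xarray as xr

da = xr.DataArray(
    np.arange(9).reshape(3, 3),
    dims=("x", "y"),
    coords={"x": [0, 0, 1], "y": [5, 5, 6]},
)

# Drop repeated labels along both dimensions, keeping the first occurrence.
deduped = da.drop_duplicates(dim=["x", "y"])
print(deduped.values.tolist())  # [[0, 2], [6, 8]]

# The new Dataset method works the same way, here along a single dimension.
ds_deduped = da.to_dataset(name="v").drop_duplicates(dim="x")
print(dict(ds_deduped.sizes))  # {'x': 2, 'y': 3}
```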

Breaking changes#

Renamed the interpolation keyword of all quantile methods (e.g. DataArray.quantile()) to method for consistency with numpy v1.22.0 (PR6108). By Mathias Hauser.
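A minimal sketch of the renamed keyword, assuming a current xarray installation and numpy 1.22 or later (where quantile accepts method). The data are illustrative:

```python
import xarray as xr

da = xr.DataArray([1.0, 2.0, 3.0, 4.0], dims="x")

# The keyword is `method` (formerly `interpolation`), matching numpy.quantile.
linear = da.quantile(0.5, method="linear")
lower = da.quantile(0.5, method="lower")

print(float(linear), float(lower))  # 2.5 2.0
```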

Deprecations#

Bug fixes#

Variables which are chunked using dask in larger (but aligned) chunks than the target zarr chunk size can now be stored using to_zarr() (PR6258) By Tobias Kölling.
Multi-file datasets containing encoded cftime.datetime objects can be read in parallel again (GH6226, PR6249, PR6305). By Martin Bergemann and Stan West.

Documentation#

Delete files of datasets saved to disk while building the documentation and enable building on Windows via sphinx-build (PR6237). By Stan West.

Internal Changes#

v0.21.1 (31 January 2022)#

This is a bugfix release to resolve GH6216 (PR6207).

Bug fixes#

Add packaging as a dependency to Xarray (GH6216, PR6207). By Sebastian Weigand and Joe Hamman.

v0.21.0 (27 January 2022)#

Many thanks to the 20 contributors to the v0.21.0 release!

Abel Aoun, Anderson Banihirwe, Ant Gib, Chris Roat, Cindy Chiao, Deepak Cherian, Dominik Stańczak, Fabian Hofmann, Illviljan, Jody Klymak, Joseph K Aicher, Mark Harfouche, Mathias Hauser, Matthew Roeschke, Maximilian Roos, Michael Delgado, Pascal Bourgault, Pierre, Ray Bell, Romain Caneill, Tim Heap, Tom Nicholas, Zeb Nicholls, joseph nowak, keewis.

New Features#

New top-level function cross(). (GH3279, PR5365). By Jimmy Westling.
keep_attrs support for where() (GH4141, GH4682, PR4687). By Justus Magin.
Enable the limit option for dask arrays in the following methods: DataArray.ffill(), DataArray.bfill(), Dataset.ffill() and Dataset.bfill() (GH6112). By Joseph Nowak.
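A minimal sketch of the new cross() function and of keep_attrs in where(), assuming a current xarray installation. The vectors and the "height" array are illustrative only:

```python
import xarray as xr

# Unit vectors along x and y; their cross product is the unit vector along z.
a = xr.DataArray([1, 0, 0], dims="cartesian")
b = xr.DataArray([0, 1, 0], dims="cartesian")
c = xr.cross(a, b, dim="cartesian")
print(c.values.tolist())  # [0, 0, 1]

# keep_attrs=True copies the attributes of the `x` argument to the result.
height = xr.DataArray([1.0, 2.0, 3.0], dims="t", attrs={"units": "m"})
masked = xr.where(height > 1.5, height, 0.0, keep_attrs=True)
print(masked.values.tolist(), masked.attrs)  # [0.0, 2.0, 3.0] {'units': 'm'}
```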

Breaking changes#

Rely on matplotlib’s default datetime converters instead of pandas’ (GH6102, PR6109). By Jimmy Westling.
Improve repr readability when there are a large number of dimensions in datasets or dataarrays by wrapping the text once the maximum display width has been exceeded. (GH5546, PR5662) By Jimmy Westling.

Deprecations#

Removed the lock kwarg from the zarr and pydap backends, completing the deprecation cycle started in GH5256. By Tom Nicholas.
Support for python 3.7 has been dropped. (PR5892) By Jimmy Westling.

Bug fixes#

Preserve chunks when creating a DataArray from another DataArray (PR5984). By Fabian Hofmann.
Properly support DataArray.ffill(), DataArray.bfill(), Dataset.ffill() and Dataset.bfill() along chunked dimensions (GH6112). By Joseph Nowak.
Subclasses of bytes and str (e.g. np.str_ and np.bytes_) will now serialise to disk rather than raising a ValueError: unsupported dtype for netCDF4 variable: object as they did previously (PR5264). By Zeb Nicholls.
Fix applying a function with non-xarray arguments using xr.map_blocks(). By Cindy Chiao.
No longer raise an error for an all-nan-but-one argument to DataArray.interpolate_na() when using method='nearest' (GH5994, PR6144). By Michael Delgado.
dt.season can now handle NaN and NaT. (PR5876). By Pierre Loicq.
Determination of zarr chunks now handles empty lists for encoding chunks or variable chunks, which occur in certain circumstances (PR5526). By Chris Roat.

Internal Changes#

Replace distutils.version with packaging.version (GH6092). By Mathias Hauser.
Removed internal checks for pd.Panel (GH6145). By Matthew Roeschke.
Add pyupgrade pre-commit hook (PR6152). By Maximilian Roos.

v0.20.2 (9 December 2021)#

This is a bugfix release to resolve GH3391 and GH5715. It also includes performance improvements in unstacking to a sparse array and a number of documentation improvements.

Many thanks to the 20 contributors:

Aaron Spring, Alexandre Poux, Deepak Cherian, Enrico Minack, Fabien Maussion, Giacomo Caria, Gijom, Guillaume Maze, Illviljan, Joe Hamman, Joseph Hardin, Kai Mühlbauer, Matt Henderson, Maximilian Roos, Michael Delgado, Robert Gieseke, Sebastian Weigand and Stephan Hoyer.

Breaking changes#

Use complex nan when interpolating complex values out of bounds by default (instead of real nan) (PR6019). By Alexandre Poux.

Performance#

Significantly faster unstacking to a sparse array (PR5577). By Deepak Cherian.

Bug fixes#

xr.map_blocks() and xr.corr() now work when dask is not installed (GH3391, GH5715, PR5731). By Gijom.
Fix plot.line crash for data of shape (1, N) in _title_for_slice on format_item (PR5948). By Sebastian Weigand.
Fix a regression in the removal of duplicate backend entrypoints (GH5944, PR5959) By Kai Mühlbauer.
Fix an issue that prevented datasets from being saved when time variables with units that cftime can parse but pandas cannot were present (PR6049). By Tim Heap.

Documentation#

Better examples in docstrings for groupby and resampling reductions (PR5871). By Deepak Cherian, Maximilian Roos, Jimmy Westling.
Add list-like possibility for tolerance parameter in the reindex functions. By Antoine Gibek.

Internal Changes#

Use importlib to replace functionality of pkg_resources in backend plugins tests. (PR5959). By Kai Mühlbauer.

v0.20.1 (5 November 2021)#

This is a bugfix release to fix GH5930.

Bug fixes#

Fix a regression in the detection of the backend entrypoints (GH5930, PR5931) By Justus Magin.

Documentation#

Significant improvements to API reference. By Deepak Cherian.

v0.20.0 (1 November 2021)#

This release brings improved support for pint arrays, methods for weighted standard deviation, variance, and sum of squares, the option to disable the use of the bottleneck library, significantly improved performance of unstack, as well as many bugfixes and internal changes.

Many thanks to the 40 contributors to this release!:

Aaron Spring, Akio Taniguchi, Alan D. Snow, arfy slowy, Benoit Bovy, Christian Jauvin, crusaderky, Deepak Cherian, Giacomo Caria, Illviljan, James Bourbeau, Joe Hamman, Joseph K Aicher, Julien Herzen, Kai Mühlbauer, keewis, lusewell, Martin K. Scherer, Mathias Hauser, Max Grover, Maxime Liquet, Maximilian Roos, Mike Taves, Nathan Lis, pmav99, Pushkar Kopparla, Ray Bell, Rio McMahon, Scott Staniewicz, Spencer Clark, Stefan Bender, Taher Chegini, Thomas Nicholas, Tomas Chor, Tom Augspurger, Victor Negîrneac, Zachary Blackwood, Zachary Moon, and Zeb Nicholls.

New Features#

Add std, var, sum_of_squares to DatasetWeighted and DataArrayWeighted. By Christian Jauvin.
Added a get_options() method to xarray’s root namespace (GH5698, PR5716) By Pushkar Kopparla.
Xarray now does a better job rendering variable names that are long LaTeX sequences when plotting (GH5681, PR5682). By Tomas Chor.
Add an option ("use_bottleneck") to disable the use of bottleneck using set_options() (PR5560) By Justus Magin.
Added **kwargs argument to open_rasterio() to access overviews (GH3269). By Pushkar Kopparla.
Added storage_options argument to to_zarr() (GH5601, PR5615). By Ray Bell, Zachary Blackwood and Nathan Lis.
Added calendar utilities DataArray.convert_calendar(), DataArray.interp_calendar(), date_range(), date_range_like() and DataArray.dt.calendar (GH5155, PR5233). By Pascal Bourgault.
Histogram plots are set with a title displaying the scalar coords if any, similarly to the other plots (GH5791, PR5792). By Maxime Liquet.
Slice plots display the coords units in the same way as x/y/colorbar labels (PR5847). By Victor Negîrneac.
Added a new Dataset.chunksizes, DataArray.chunksizes, and Variable.chunksizes property, which will always return a mapping from dimension names to chunking pattern along that dimension, regardless of whether the object is a Dataset, DataArray, or Variable. (GH5846, PR5900) By Tom Nicholas.
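A minimal sketch of the new weighted statistics and of the use_bottleneck option, assuming a current xarray installation. With equal weights the results match the unweighted (ddof=0) statistics, which makes them easy to check:

```python
import numpy as np
import xarray as xr

values = xr.DataArray([1.0, 2.0, 3.0], dims="x")
weights = xr.DataArray([1.0, 1.0, 1.0], dims="x")
weighted = values.weighted(weights)

# Weighted sum of squared deviations from the weighted mean (2.0 here),
# and the derived variance and standard deviation.
print(float(weighted.sum_of_squares()))  # 2.0
print(float(weighted.var()), float(weighted.std()))

# Temporarily disable bottleneck and confirm it through get_options().
with xr.set_options(use_bottleneck=False):
    bottleneck_inside = xr.get_options()["use_bottleneck"]
print(bottleneck_inside)  # False
```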

Breaking changes#

The minimum versions of some dependencies were changed:

Package	Old	New
cftime	1.1	1.2
dask	2.15	2.30
distributed	2.15	2.30
lxml	4.5	4.6
matplotlib-base	3.2	3.3
numba	0.49	0.51
numpy	1.17	1.18
pandas	1.0	1.1
pint	0.15	0.16
scipy	1.4	1.5
seaborn	0.10	0.11
sparse	0.8	0.11
toolz	0.10	0.11
zarr	2.4	2.5

The __repr__ of an xarray.Dataset’s coords and data_vars now ignores xarray.set_options(display_max_rows=...) and shows the full output when called directly as, e.g., ds.data_vars or print(ds.data_vars) (GH5545, PR5580). By Stefan Bender.

Deprecations#

Deprecate open_rasterio() (GH4697, PR5808). By Alan Snow.
Set the default argument for roll_coords to False for DataArray.roll() and Dataset.roll(). (PR5653) By Tom Nicholas.
xarray.open_mfdataset() will now error instead of warn when a value for concat_dim is passed alongside combine='by_coords'. By Tom Nicholas.
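A minimal sketch of the new roll() default, assuming a current xarray installation with illustrative data. By default only the data move, and roll_coords=True shifts the coordinates as well:

```python
import xarray as xr

da = xr.DataArray([1, 2, 3], dims="x", coords={"x": [10, 20, 30]})

# Default (roll_coords=False): data are rolled, coordinates stay in place.
rolled = da.roll(x=1)

# roll_coords=True: data and coordinates are rolled together.
rolled_coords = da.roll(x=1, roll_coords=True)

print(rolled.values.tolist(), rolled["x"].values.tolist())  # [3, 1, 2] [10, 20, 30]
print(rolled_coords["x"].values.tolist())  # [30, 10, 20]
```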

Bug fixes#

Fix ZeroDivisionError from saving dask array with empty dimension (GH5741). By Joseph K Aicher.
Fixed a performance bug where a cftime import was attempted within various core operations even if cftime was not installed (PR5640). By Luke Sewell.
Fixed bug when combining named DataArrays using combine_by_coords(). (PR5834). By Tom Nicholas.
When a custom engine was used in open_dataset() the engine wasn’t initialized properly, causing missing argument errors or inconsistent method signatures. (PR5684) By Jimmy Westling.
Numbers are properly formatted in a plot’s title (GH5788, PR5789). By Maxime Liquet.
Faceted plots will no longer raise a pint.UnitStrippedWarning when a pint.Quantity array is plotted, and will correctly display the units of the data in the colorbar (if there is one) (PR5886). By Tom Nicholas.
Backends now check for path-like objects rather than the pathlib.Path type, using os.fspath (PR5879). By Mike Taves.
open_mfdataset() now accepts a single pathlib.Path object (GH5881). By Panos Mavrogiorgos.
Improved performance of Dataset.unstack() (PR5906). By Tom Augspurger.

Documentation#

Users are instructed to try use_cftime=True if a TypeError occurs when combining datasets and one of the types involved is a subclass of cftime.datetime (PR5776). By Zeb Nicholls.
A clearer error is now raised if a user attempts to assign a Dataset to a single key of another Dataset. (PR5839) By Tom Nicholas.

Internal Changes#

Explicit indexes refactor: avoid len(index) in map_blocks (PR5670). By Deepak Cherian.
Explicit indexes refactor: decouple xarray.Index from xarray.Variable (PR5636). By Benoit Bovy.
Fix Mapping argument typing to allow mypy to pass on str keys (PR5690). By Maximilian Roos.
Annotate many of our tests, and fix some of the resulting typing errors. This will also mean our typing annotations are tested as part of CI. (PR5728). By Maximilian Roos.
Improve the performance of reprs for large datasets or dataarrays. (PR5661) By Jimmy Westling.
Use isort’s float_to_top config. (PR5695). By Maximilian Roos.
Remove use of the deprecated kind argument in pandas.Index.get_slice_bound() inside xarray.CFTimeIndex tests (PR5723). By Spencer Clark.
Refactor xarray.core.duck_array_ops to no longer special-case dispatching to dask versions of functions when acting on dask arrays, instead relying on numpy and dask’s adherence to NEP-18 to dispatch automatically. (PR5571) By Tom Nicholas.
Add an ASV benchmark CI and improve performance of the benchmarks (PR5796) By Jimmy Westling.
Use importlib to replace functionality of pkg_resources such as version setting and loading of resources. (PR5845). By Martin K. Scherer.

v0.19.0 (23 July 2021)#

This release brings improvements to plotting of categorical data, the ability to specify how attributes are combined in xarray operations, a new high-level unify_chunks() function, as well as various deprecations, bug fixes, and minor improvements.

Many thanks to the 29 contributors to this release!:

Andrew Williams, Augustus, Aureliana Barghini, Benoit Bovy, crusaderky, Deepak Cherian, ellesmith88, Elliott Sales de Andrade, Giacomo Caria, github-actions[bot], Illviljan, Joeperdefloep, joooeey, Julia Kent, Julius Busecke, keewis, Mathias Hauser, Matthias Göbel, Mattia Almansi, Maximilian Roos, Peter Andreas Entschev, Ray Bell, Sander, Santiago Soler, Sebastian, Spencer Clark, Stephan Hoyer, Thomas Hirtz, Thomas Nicholas.

New Features#

Allow passing argument missing_dims to Variable.transpose() and Dataset.transpose() (GH5550, PR5586) By Giacomo Caria.
Allow passing a dictionary as coords to a DataArray (GH5527, reverts PR1539, which had deprecated this due to python’s inconsistent ordering in earlier versions). By Sander van Rijn.
Added Dataset.coarsen.construct(), DataArray.coarsen.construct() (GH5454, PR5475). By Deepak Cherian.
Xarray now uses consolidated metadata by default when writing and reading Zarr stores (GH5251). By Stephan Hoyer.
New top-level function unify_chunks(). By Mattia Almansi.
Allow assigning values to a subset of a dataset using positional or label-based indexing (GH3015, PR5362). By Matthias Göbel.
Attempting to reduce a weighted object over missing dimensions now raises an error (PR5362). By Mattia Almansi.
Add .sum to DataArray.rolling_exp() and Dataset.rolling_exp() for exponentially weighted rolling sums. These require numbagg 0.2.1 (PR5178). By Maximilian Roos.
xarray.cov() and xarray.corr() now lazily check for missing values if inputs are dask arrays (GH4804, PR5284). By Andrew Williams.
Attempting to concat a list of elements that are not all Dataset or all DataArray now raises an error (GH5051, PR5425). By Thomas Hirtz.
Allow passing a function to combine_attrs (PR4896). By Justus Magin.
Allow plotting categorical data (PR5464). By Jimmy Westling.
Allow removal of the coordinate attribute coordinates on variables by setting .attrs['coordinates']= None (GH5510). By Elle Smith.
Added DataArray.to_numpy(), DataArray.as_numpy(), and Dataset.as_numpy(). (PR5568). By Tom Nicholas.
Units in plot labels are now automatically inferred from wrapped pint.Quantity() arrays. (PR5561). By Tom Nicholas.
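A minimal sketch of three of the features above (missing_dims in transpose, coarsen(...).construct() and to_numpy()), assuming a current xarray installation. The dimension names "quarter" and "month_in_quarter" are hypothetical and chosen only for illustration:

```python
import numpy as np
import xarray as xr

ds = xr.Dataset({"v": (("x", "y"), np.arange(6).reshape(2, 3))})

# missing_dims="ignore" skips "time", which this dataset does not have.
transposed = ds.transpose("y", "x", "time", missing_dims="ignore")
print(transposed["v"].dims)  # ('y', 'x')

# coarsen(...).construct() reshapes "month" into blocks of 3 without reducing.
monthly = xr.DataArray(np.arange(6), dims="month")
blocks = monthly.coarsen(month=3).construct(month=("quarter", "month_in_quarter"))
print(blocks.dims)  # ('quarter', 'month_in_quarter')

# to_numpy() returns the underlying values as a plain numpy.ndarray.
arr = blocks.to_numpy()
print(arr.tolist())  # [[0, 1, 2], [3, 4, 5]]
```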

Breaking changes#

The default mode for Dataset.to_zarr() when region is set has changed to the new mode="r+", which only allows for overriding pre-existing array values. This is a safer default than the prior mode="a", and allows for higher performance writes (PR5252). By Stephan Hoyer.
The main parameter to combine_by_coords() is renamed to data_objects instead of datasets so anyone calling this method using a named parameter will need to update the name accordingly (GH3248, PR4696). By Augustus Ijams.

Deprecations#

Removed the deprecated dim kwarg to DataArray.integrate() (PR5630)
Removed the deprecated keep_attrs kwarg to DataArray.rolling() (PR5630)
Removed the deprecated keep_attrs kwarg to DataArray.coarsen() (PR5630)
Completed deprecation of passing an xarray.DataArray to Variable() - will now raise a TypeError (PR5630)

Bug fixes#

Fix a minor incompatibility between partial datetime string indexing with a CFTimeIndex and upcoming pandas version 1.3.0 (GH5356, PR5359). By Spencer Clark.
Fix a 1-level MultiIndex being incorrectly converted to a single index (GH5384, PR5385). By Benoit Bovy.
Don’t cast a duck array in a coordinate to numpy.ndarray in DataArray.differentiate() (PR5408). By Justus Magin.
Fix the repr of Variable objects with display_expand_data=True (PR5406). By Justus Magin.
Plotting a pcolormesh with xscale="log" and/or yscale="log" works as expected after improving the way the interval breaks are generated (GH5333). By Santiago Soler.
combine_by_coords() can now handle combining a list of unnamed DataArray as input (GH3248, PR4696). By Augustus Ijams.

Internal Changes#

Run CI only on the first and last supported Python versions, currently 3.7 and 3.9 (PR5433). By Maximilian Roos.
Publish test results and timings on each PR (PR5537). By Maximilian Roos.
Explicit indexes refactor: add an xarray.Index.query() method in which one may eventually provide a custom implementation of label-based data selection (not yet ready for public use). Also refactor the internal, pandas-specific implementation into PandasIndex.query() and PandasMultiIndex.query() (PR5322). By Benoit Bovy.

v0.18.2 (19 May 2021)#

This release reverts a regression in xarray’s unstacking of dask-backed arrays.

v0.18.1 (18 May 2021)#

This release is intended as a small patch release to be compatible with the new 2021.5.0 dask.distributed release. It also includes a new drop_duplicates method, some documentation improvements, the beginnings of our internal Index refactoring, and some bug fixes.

Thank you to all 16 contributors!

Anderson Banihirwe, Andrew, Benoit Bovy, Brewster Malevich, Giacomo Caria, Illviljan, James Bourbeau, Keewis, Maximilian Roos, Ravin Kumar, Stephan Hoyer, Thomas Nicholas, Tom Nicholas, Zachary Moon.

New Features#

Implement DataArray.drop_duplicates() to remove duplicate dimension values (PR5239). By Andrew Huang.
Allow passing combine_attrs strategy names to the keep_attrs parameter of apply_ufunc() (PR5041) By Justus Magin.
Dataset.interp() now allows interpolation with non-numerical datatypes, such as booleans, instead of dropping them. (GH4761 PR5008). By Jimmy Westling.
Raise more informative error when decoding time variables with invalid reference dates. (GH5199, PR5288). By Giacomo Caria.
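DataArray.drop_duplicates() keeps one entry per repeated label along a dimension. A minimal pure-Python sketch of the label-selection step, assuming keep="first" / "last" / False semantics like pandas.Index.duplicated; the helper positions_to_keep is hypothetical and not part of xarray:

```python
def positions_to_keep(labels, keep="first"):
    """Return the sorted integer positions that survive de-duplication."""
    if keep not in ("first", "last", False):
        raise ValueError("keep must be 'first', 'last' or False")
    if keep is False:
        # Drop every label that occurs more than once.
        counts = {}
        for label in labels:
            counts[label] = counts.get(label, 0) + 1
        return [i for i, label in enumerate(labels) if counts[label] == 1]
    # Scan forwards for "first", backwards for "last".
    if keep == "first":
        order = range(len(labels))
    else:
        order = range(len(labels) - 1, -1, -1)
    seen = set()
    kept = []
    for i in order:
        if labels[i] not in seen:
            seen.add(labels[i])
            kept.append(i)
    return sorted(kept)


# With xarray, the positions could then drive a selection, e.g.:
#     da.isel(x=positions_to_keep(da["x"].values))
```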

Bug fixes#

Opening netCDF files from a path that doesn’t end in .nc without supplying an explicit engine works again (GH5295), fixing a bug introduced in 0.18.0. By Stephan Hoyer

Documentation#

Clean up and enhance docstrings for the DataArray.plot and Dataset.plot.* families of methods (PR5285). By Zach Moon.
Added an explanation of deprecation cycles, and how to implement them, to the contributors guide (PR5289). By Tom Nicholas.

Internal Changes#

Explicit indexes refactor: add an xarray.Index base class and Dataset.xindexes / DataArray.xindexes properties. Also rename PandasIndexAdapter to PandasIndex, which now inherits from xarray.Index (PR5102). By Benoit Bovy.
Replace SortedKeysDict with Python’s dict, given dicts are now ordered. By Maximilian Roos.
Updated the release guide for developers. It now accounts for steps that are automated via GitHub Actions (PR5274). By Tom Nicholas.

v0.18.0 (6 May 2021)#

This release brings a few important performance improvements, a wide range of usability upgrades, lots of bug fixes, and some new features. These include a plugin API to add backend engines, a new theme for the documentation, curve fitting methods, and several new plotting functions.

Many thanks to the 38 contributors to this release: Aaron Spring, Alessandro Amici, Alex Marandon, Alistair Miles, Ana Paula Krelling, Anderson Banihirwe, Aureliana Barghini, Baudouin Raoult, Benoit Bovy, Blair Bonnett, David Trémouilles, Deepak Cherian, Gabriel Medeiros Abrahão, Giacomo Caria, Hauke Schulz, Illviljan, Mathias Hauser, Matthias Bussonnier, Mattia Almansi, Maximilian Roos, Ray Bell, Richard Kleijn, Ryan Abernathey, Sam Levang, Spencer Clark, Spencer Jones, Tammas Loughran, Tobias Kölling, Todd, Tom Nicholas, Tom White, Victor Negîrneac, Xianxiang Li, Zeb Nicholls, crusaderky, dschwoerer, johnomotani, keewis

New Features#

Apply combine_attrs on data variables and coordinate variables when concatenating and merging datasets and dataarrays (PR4902). By Justus Magin.
Add Dataset.to_pandas() (PR5247) By Giacomo Caria.
Add DataArray.plot.surface() which wraps matplotlib’s plot_surface to make surface plots (GH2235 GH5084 PR5101). By John Omotani.
Allow passing multiple arrays to Dataset.__setitem__() (PR5216). By Giacomo Caria.
Add ‘cumulative’ option to Dataset.integrate() and DataArray.integrate() so that result is a cumulative integral, like scipy.integrate.cumulative_trapezoid() (PR5153). By John Omotani.
Add safe_chunks option to Dataset.to_zarr() which allows overriding checks made to ensure Dask and Zarr chunk compatibility (GH5056). By Ryan Abernathey
Add Dataset.query() and DataArray.query() which enable indexing of datasets and data arrays by evaluating query expressions against the values of the data variables (PR4984). By Alistair Miles.
Allow passing combine_attrs to Dataset.merge() (PR4895). By Justus Magin.
Support for dask.graph_manipulation (requires dask >= 2021.3). By Guido Imperiale.
Add Dataset.plot.streamplot() for streamplot plots with Dataset variables (PR5003). By John Omotani.
Many of the arguments for the DataArray.str methods now support providing an array-like input. In this case, the array provided to the arguments is broadcast against the original array and applied elementwise.
DataArray.str now supports +, *, and % operators. These behave the same as they do for str, except that they follow array broadcasting rules.
A large number of new DataArray.str methods were implemented: DataArray.str.casefold(), DataArray.str.cat(), DataArray.str.extract(), DataArray.str.extractall(), DataArray.str.findall(), DataArray.str.format(), DataArray.str.get_dummies(), DataArray.str.islower(), DataArray.str.join(), DataArray.str.normalize(), DataArray.str.partition(), DataArray.str.rpartition(), DataArray.str.rsplit(), and DataArray.str.split(). A number of these methods allow for splitting or joining the strings in an array (GH4622). By Todd Jennings.
Thanks to the new pluggable backend infrastructure, external packages may now use the xarray.backends entry point to register additional engines to be used in open_dataset(); see the documentation in How to add a new backend (GH4309, GH4803, PR4989, PR4810 and many others). The backend refactor has been sponsored with the “Essential Open Source Software for Science” grant from the Chan Zuckerberg Initiative and developed by B-Open. By Aureliana Barghini and Alessandro Amici.
Added the date attribute to the datetime accessor, DataArray.dt.date (GH4983, PR4994). By Hauke Schulz.
Implement __getitem__ for both DatasetGroupBy and DataArrayGroupBy, inspired by pandas’ get_group(). By Deepak Cherian.
Switch the tutorial functions to use pooch (which is now an optional dependency) and add tutorial.open_rasterio() as a way to open example rasterio files (GH3986, PR4102, PR5074). By Justus Magin.
Add typing information to unary and binary arithmetic operators operating on Dataset, DataArray, Variable, DatasetGroupBy or DataArrayGroupBy (PR4904). By Richard Kleijn.
Add a combine_attrs parameter to open_mfdataset() (PR4971). By Justus Magin.
Enable passing arrays with a subset of dimensions to DataArray.clip() & Dataset.clip(); these methods now use xarray.apply_ufunc() (PR5184). By Maximilian Roos.
Disable the cfgrib backend if the eccodes library is not installed (PR5083). By Baudouin Raoult.
Added DataArray.curvefit() and Dataset.curvefit() for general curve fitting applications. (GH4300, PR4849) By Sam Levang.
Add options to control expand/collapse of sections in display of Dataset and DataArray. The function set_options() now takes keyword arguments display_expand_attrs, display_expand_coords, display_expand_data, display_expand_data_vars, all of which can be one of True to always expand, False to always collapse, or "default" to expand unless over a pre-defined limit (PR5126). By Tom White.
Significant speedups in Dataset.interp() and DataArray.interp(). (GH4739, PR4740). By Deepak Cherian.
Prevent passing concat_dim to xarray.open_mfdataset() when combine='by_coords' is specified, which should never have been possible (as xarray.combine_by_coords() has no concat_dim argument to pass to). Also removes unneeded internal reordering of datasets in xarray.open_mfdataset() when combine='by_coords' is specified. Fixes (GH5230). By Tom Nicholas.
Implement __setitem__ for xarray.core.indexing.DaskIndexingAdapter if the installed dask version supports item assignment (GH5171, PR5174). By Tammas Loughran.
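The cumulative integral described above is a running sum of trapezoid areas, as computed by scipy.integrate.cumulative_trapezoid with initial=0. A minimal pure-Python sketch of that computation for one dimension; the helper name is hypothetical and this is not xarray's implementation:

```python
def cumulative_trapezoid(y, x):
    """Running trapezoidal integral of y over x, starting at 0."""
    if len(y) != len(x):
        raise ValueError("x and y must have the same length")
    result = [0.0] if len(y) else []
    for i in range(1, len(y)):
        # Area of the trapezoid between samples i-1 and i.
        area = 0.5 * (y[i] + y[i - 1]) * (x[i] - x[i - 1])
        result.append(result[-1] + area)
    return result


running = cumulative_trapezoid([0.0, 1.0, 2.0], [0.0, 1.0, 2.0])
```

Because the first element is 0, the output has the same length as the input, which keeps it aligned with the original coordinate.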

Breaking changes#

The minimum versions of some dependencies were changed:

Package	Old	New
boto3	1.12	1.13
cftime	1.0	1.1
dask	2.11	2.15
distributed	2.11	2.15
matplotlib	3.1	3.2
numba	0.48	0.49

open_dataset() and open_dataarray() now accept only the first argument as positional; all others need to be passed as keyword arguments. This is part of the refactor to support external backends (GH4309, PR4989). By Alessandro Amici.
Functions that are identities for 0d data return the unchanged data if axis is empty. This ensures that Datasets where some variables do not have the averaged dimensions are not accidentally changed (GH4885, PR5207). By David Schwörer.
DataArray.coarsen and Dataset.coarsen no longer support passing keep_attrs via their constructor. Pass keep_attrs via the applied function, i.e. use ds.coarsen(...).mean(keep_attrs=False) instead of ds.coarsen(..., keep_attrs=False).mean(). Further, coarsen now keeps attributes by default (PR5227). By Mathias Hauser.
Switch the default of the merge() combine_attrs parameter to "override". This will keep the current behavior for merging the attrs of variables but stop dropping the attrs of the main objects (PR4902). By Justus Magin.

Deprecations#

Warn when passing concat_dim to xarray.open_mfdataset() when combine='by_coords' is specified, which should never have been possible (as xarray.combine_by_coords() has no concat_dim argument to pass to). Also removes unneeded internal reordering of datasets in xarray.open_mfdataset() when combine='by_coords' is specified. Fixes (GH5230), via (PR5231, PR5255). By Tom Nicholas.
The lock keyword argument to open_dataset() and open_dataarray() is now a backend specific option. Instead of being silently ignored, it will now give a warning if passed to a backend that doesn’t support it. From the next version it will raise an error. This is part of the refactor to support external backends (GH5073). By Tom Nicholas and Alessandro Amici.

Bug fixes#

Properly support DataArray.ffill(), DataArray.bfill(), Dataset.ffill(), Dataset.bfill() along chunked dimensions. (GH2699). By Deepak Cherian.
Fix 2d plot failure for certain combinations of dimensions when x is 1d and y is 2d (GH5097, PR5099). By John Omotani.
Ensure standard calendar times encoded with large values (i.e. greater than approximately 292 years) can be decoded correctly without silently overflowing (PR5050). This was a regression in xarray 0.17.0. By Zeb Nicholls.
Added support for numpy.bool_ attributes in roundtrips using the h5netcdf engine with invalid_netcdf=True (which casts bools to numpy.bool_) (GH4981, PR4986). By Victor Negîrneac.
Don’t allow passing axis to Dataset.reduce() methods (GH3510, PR4940). By Justus Magin.
Decode values as signed if attribute _Unsigned = "false" (GH4954). By Tobias Kölling.
Keep coordinate attributes when interpolating with an indexer that is not a Variable (GH4239, GH4839, PR5031). By Jimmy Westling.
Ensure standard calendar dates encoded with a calendar attribute with some or all uppercase letters can be decoded or encoded to or from np.datetime64[ns] dates with or without cftime installed (GH5093, PR5180). By Spencer Clark.
Warn on passing keep_attrs to resample and rolling_exp as they are ignored, pass keep_attrs to the applied function instead (PR5265). By Mathias Hauser.

Documentation#

New section on How to add a new backend in the “Internals” chapter aimed at backend developers (GH4803, PR4810). By Aureliana Barghini.
Add Dataset.polyfit() and DataArray.polyfit() under “See also” in the docstrings of Dataset.polyfit() and DataArray.polyfit() (GH5016, PR5020). By Aaron Spring.
New sphinx theme & rearrangement of the docs (PR4835). By Anderson Banihirwe.

Internal Changes#

Enable displaying mypy error codes and ignore only specific error codes using # type: ignore[error-code] (PR5096). By Mathias Hauser.
Replace uses of raises_regex with the more standard pytest.raises(Exception, match="foo") (PR5188, PR5191). By Maximilian Roos.

v0.17.0 (24 Feb 2021)#

This release brings a few important performance improvements, a wide range of usability upgrades, lots of bug fixes, and some new features. These include better cftime support, a new quiver plot, better unstack performance, more efficient memory use in rolling operations, and some python packaging improvements. We also have a few documentation improvements (and more planned!).

Many thanks to the 36 contributors to this release: Alessandro Amici, Anderson Banihirwe, Aureliana Barghini, Ayrton Bourn, Benjamin Bean, Blair Bonnett, Chun Ho Chow, DWesl, Daniel Mesejo-León, Deepak Cherian, Eric Keenan, Illviljan, Jens Hedegaard Nielsen, Jody Klymak, Julien Seguinot, Julius Busecke, Kai Mühlbauer, Leif Denby, Martin Durant, Mathias Hauser, Maximilian Roos, Michael Mann, Ray Bell, RichardScottOZ, Spencer Clark, Tim Gates, Tom Nicholas, Yunus Sevinchan, alexamici, aurghs, crusaderky, dcherian, ghislainp, keewis, rhkleijn

Breaking changes#

Xarray no longer supports Python 3.6.

The minimum version policy was changed to also apply to projects with irregular releases. As a result, the minimum versions of some dependencies have changed:

Package	Old	New
Python	3.6	3.7
setuptools	38.4	40.4
numpy	1.15	1.17
pandas	0.25	1.0
dask	2.9	2.11
distributed	2.9	2.11
bottleneck	1.2	1.3
h5netcdf	0.7	0.8
iris	2.2	2.4
netcdf4	1.4	1.5
pseudonetcdf	3.0	3.1
rasterio	1.0	1.1
scipy	1.3	1.4
seaborn	0.9	0.10
zarr	2.3	2.4

(GH4688, PR4720, PR4907, PR4942)

As a result of PR4684, the default units encoding for datetime-like values (np.datetime64[ns] or cftime.datetime) will now always be set such that int64 values can be used. In the past, no units finer than “seconds” were chosen, which sometimes meant that float64 values were required, leading to inaccurate I/O round-trips.
Variables referred to in attributes like bounds and grid_mapping can be set as coordinate variables. These attributes are moved to DataArray.encoding from DataArray.attrs. This behaviour is controlled by the decode_coords kwarg to open_dataset() and open_mfdataset(). The full list of decoded attributes is in Weather and climate data (PR2844, GH3689).
As a result of PR4911, the output from calling DataArray.sum() or DataArray.prod() on an integer array with skipna=True and a non-None value for min_count will now be a float array rather than an integer array.
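The int64-friendly default units can be pictured as choosing the coarsest unit in which every offset from the reference date is a whole number. A minimal standard-library sketch, assuming a fixed reference date and the unit list shown; it illustrates the idea rather than reproducing xarray's encoder, which works on numpy.datetime64 values and also supports nanoseconds:

```python
from datetime import datetime, timedelta

# Candidate CF units, coarsest first (assumed ordering for this sketch).
_UNITS = [
    ("days", timedelta(days=1)),
    ("hours", timedelta(hours=1)),
    ("minutes", timedelta(minutes=1)),
    ("seconds", timedelta(seconds=1)),
    ("milliseconds", timedelta(milliseconds=1)),
    ("microseconds", timedelta(microseconds=1)),
]


def infer_units(times, reference):
    """Return the coarsest '<unit> since <reference>' with integer offsets."""
    offsets = [t - reference for t in times]
    for name, step in _UNITS:
        # timedelta % timedelta is exact, so no float rounding is involved.
        if all(offset % step == timedelta(0) for offset in offsets):
            return f"{name} since {reference.isoformat(sep=' ')}"
    # Unreachable: datetime.timedelta has microsecond resolution.
    raise AssertionError("offset finer than one microsecond")


ref = datetime(2000, 1, 1)
units = infer_units([ref, datetime(2000, 1, 2, 12)], ref)
```

With a 36-hour offset, "days" would need the float value 1.5, so the sketch settles on "hours", where the offset is the integer 36.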

Deprecations#

dim argument to DataArray.integrate() is being deprecated in favour of a coord argument, for consistency with Dataset.integrate(). For now using dim issues a FutureWarning. It will be removed in version 0.19.0 (PR3993). By Tom Nicholas.
The deprecated autoclose kwarg of open_dataset() has been removed (PR4725). By Aureliana Barghini.
The return value of Dataset.update() is being deprecated to make it work more like dict.update(). It will be removed in version 0.19.0 (PR4932). By Justus Magin.

New Features#

cftime_range() and DataArray.resample() now support millisecond ("L" or "ms") and microsecond ("U" or "us") frequencies for cftime.datetime coordinates (GH4097, PR4758). By Spencer Clark.
Significantly higher unstack performance on numpy-backed arrays which contain missing values; 8x faster than previous versions in our benchmark, and now 2x faster than pandas (PR4746). By Maximilian Roos.
Add Dataset.plot.quiver() for quiver plots with Dataset variables. By Deepak Cherian.
Add "drop_conflicts" to the strategies supported by the combine_attrs kwarg (GH4749, PR4827). By Justus Magin.
Allow installing from git archives (PR4897). By Justus Magin.
DataArrayCoarsen and DatasetCoarsen now implement a reduce method, enabling coarsening operations with custom reduction functions (GH3741, PR4939). By Spencer Clark.
Most rolling operations use significantly less memory. (GH4325). By Deepak Cherian.
Add Dataset.drop_isel() and DataArray.drop_isel() (GH4658, PR4819). By Daniel Mesejo.
Xarray now leverages updates as of cftime version 1.4.1, which enable exact I/O roundtripping of cftime.datetime objects (PR4758). By Spencer Clark.
open_dataset() and open_mfdataset() now accept fsspec URLs (including globs for the latter) for engine="zarr", and so allow reading from many remote and other file systems (PR4461) By Martin Durant
DataArray.swap_dims() & Dataset.swap_dims() now accept dims in the form of kwargs as well as a dict, like most similar methods. By Maximilian Roos.
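The "drop_conflicts" strategy combines the attrs of all inputs and drops any key that appears with different values. A minimal pure-Python sketch of those semantics, assuming plain == comparison (xarray also has to handle array-valued attrs); the helper name is hypothetical:

```python
def combine_drop_conflicts(attrs_list):
    """Merge attrs dicts, dropping keys whose values disagree."""
    combined = {}
    dropped = set()
    for attrs in attrs_list:
        for key, value in attrs.items():
            if key in dropped:
                # Once a key has conflicted it stays dropped.
                continue
            if key not in combined:
                combined[key] = value
            elif combined[key] != value:
                del combined[key]
                dropped.add(key)
    return combined


result = combine_drop_conflicts(
    [{"units": "K", "source": "a"}, {"units": "K", "source": "b", "title": "t"}]
)
```

Keys present in only some inputs (here "title") are kept; only genuine conflicts ("source") are removed.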

Bug fixes#

Use specific type checks in xarray.core.variable.as_compatible_data instead of blanket access to values attribute (GH2097) By Yunus Sevinchan.
DataArray.resample() and Dataset.resample() do not trigger computations anymore if Dataset.weighted() or DataArray.weighted() are applied (GH4625, PR4668). By Julius Busecke.
merge() with combine_attrs='override' makes a copy of the attrs (GH4627).
By default, when possible, xarray will now always use values of type int64 when encoding and decoding numpy.datetime64[ns] datetimes. This ensures that maximum precision and accuracy are maintained in the round-tripping process (GH4045, PR4684). It also enables encoding and decoding standard calendar dates with time units of nanoseconds (PR4400). By Spencer Clark and Mark Harfouche.
DataArray.astype(), Dataset.astype() and Variable.astype() support the order and subok parameters again. This fixes a regression introduced in version 0.16.1 (GH4644, PR4683). By Richard Kleijn.
Remove dictionary unpacking when using .loc to avoid collision with .sel parameters (PR4695). By Anderson Banihirwe.
Fix the legend created by Dataset.plot.scatter() (GH4641, PR4723). By Justus Magin.
Fix a crash in orthogonal indexing on geographic coordinates with engine='cfgrib' (GH4733 PR4737). By Alessandro Amici.
Coordinates with dtype str or bytes now retain their dtype on many operations, e.g. reindex, align, concat, assign; previously they were cast to an object dtype (GH2658 and GH4543). By Mathias Hauser.
Limit number of data rows when printing large datasets. (GH4736, PR4750). By Jimmy Westling.
Add missing_dims parameter to transpose (GH4647, PR4767). By Daniel Mesejo.
Resolve intervals before appending other metadata to labels when plotting (GH4322, PR4794). By Justus Magin.
Fix regression when decoding a variable with a scale_factor and add_offset given as a list of length one (GH4631). By Mathias Hauser.
Expand user directory paths (e.g. ~/) in open_mfdataset() and Dataset.to_zarr() (GH4783, PR4795). By Julien Seguinot.
Raise DeprecationWarning when trying to typecast a tuple containing a DataArray. Users are now prompted to first call .data on it (GH4483). By Chun Ho Chow.
Ensure that Dataset.interp() raises ValueError when interpolating outside coordinate range and bounds_error=True (GH4854, PR4855). By Leif Denby.
Fix time encoding bug associated with using cftime versions greater than 1.4.0 with xarray (GH4870, PR4871). By Spencer Clark.
Stop DataArray.sum() and DataArray.prod() computing lazy arrays when called with a min_count parameter (GH4898, PR4911). By Blair Bonnett.
Fix bug preventing the min_count parameter to DataArray.sum() and DataArray.prod() working correctly when calculating over all axes of a float64 array (GH4898, PR4911). By Blair Bonnett.
Fix decoding of vlen strings using h5py versions greater than 3.0.0 with h5netcdf backend (GH4570, PR4893). By Kai Mühlbauer.
Allow converting Dataset or DataArray objects with a MultiIndex and at least one other dimension to a pandas object (GH3008, PR4442). By ghislainp.

Documentation#

Add information about requirements for accessor classes (GH2788, PR4657). By Justus Magin.
Start a list of external I/O integrating with xarray (GH683, PR4566). By Justus Magin.
Add concat examples and improve combining documentation (GH4620, PR4645). By Ray Bell and Justus Magin.
Explicitly mention that Dataset.update() updates in place (GH2951, PR4932). By Justus Magin.
Added docs on vectorized indexing (PR4711). By Eric Keenan.

Internal Changes#

Speed up the continuous integration tests on Azure:
- Switched to mamba and use matplotlib-base for a faster installation of all dependencies (PR4672).
- Use pytest.mark.skip instead of pytest.mark.xfail for some tests that can currently not succeed (PR4685).
- Run the tests in parallel using pytest-xdist (PR4694).
By Justus Magin and Mathias Hauser.
Use pyproject.toml instead of the setup_requires option for setuptools (PR4897). By Justus Magin.
Replace all usages of assert x.identical(y) with assert_identical(x, y) for clearer error messages (PR4752). By Maximilian Roos.
Speed up attribute style access (e.g. ds.somevar instead of ds["somevar"]) and tab completion in IPython (GH4741, PR4742). By Richard Kleijn.
Added the set_close method to Dataset and DataArray for backends to specify how to voluntarily release all resources (PR4809). By Alessandro Amici.
Update type hints to work with numpy v1.20 (PR4878). By Mathias Hauser.
Ensure warnings cannot be turned into exceptions in testing.assert_equal() and the other assert_* functions (PR4864). By Mathias Hauser.
Performance improvement when constructing DataArrays. Significantly speeds up repr for Datasets with a large number of variables. By Deepak Cherian.

v0.16.2 (30 Nov 2020)#

This release brings the ability to write to limited regions of zarr files, open zarr files with open_dataset() and open_mfdataset(), increased support for propagating attrs using the keep_attrs flag, as well as numerous bugfixes and documentation improvements.

Many thanks to the 31 contributors to this release: Aaron Spring, Akio Taniguchi, Aleksandar Jelenak, alexamici, Alexandre Poux, Anderson Banihirwe, Andrew Pauling, Ashwin Vishnu, aurghs, Brian Ward, Caleb, crusaderky, Dan Nowacki, darikg, David Brochart, David Huard, Deepak Cherian, Dion Häfner, Gerardo Rivera, Gerrit Holl, Illviljan, inakleinbottle, Jacob Tomlinson, James A. Bednar, jenssss, Joe Hamman, johnomotani, Joris Van den Bossche, Julia Kent, Julius Busecke, Kai Mühlbauer, keewis, Keisuke Fujii, Kyle Cranmer, Luke Volpatti, Mathias Hauser, Maximilian Roos, Michaël Defferrard, Michal Baumgartner, Nick R. Papior, Pascal Bourgault, Peter Hausamann, PGijsbers, Ray Bell, Romain Martinez, rpgoldman, Russell Manser, Sahid Velji, Samnan Rahee, Sander, Spencer Clark, Stephan Hoyer, Thomas Zilio, Tobias Kölling, Tom Augspurger, Wei Ji, Yash Saboo, Zeb Nicholls.

Deprecations#

weekofyear and week have been deprecated. Use DataArray.dt.isocalendar().week instead (PR4534). By Mathias Hauser, Maximilian Roos, and Spencer Clark.
DataArray.rolling and Dataset.rolling no longer support passing keep_attrs via their constructor. Pass keep_attrs via the applied function, i.e. use ds.rolling(...).mean(keep_attrs=False) instead of ds.rolling(..., keep_attrs=False).mean(). Rolling operations now keep their attributes by default (PR4510). By Mathias Hauser.

New Features#

open_dataset() and open_mfdataset() now work with engine="zarr" (GH3668, PR4003, PR4187). By Miguel Jimenez and Wei Ji Leong.
Unary & binary operations follow the keep_attrs flag (GH3490, GH4065, GH3433, GH3595, PR4195). By Deepak Cherian.
Added isocalendar() that returns a Dataset with year, week, and weekday calculated according to the ISO 8601 calendar. Requires pandas version 1.1.0 or greater (PR4534). By Mathias Hauser, Maximilian Roos, and Spencer Clark.
Dataset.to_zarr() now supports a region keyword for writing to limited regions of existing Zarr stores (PR4035). See Modifying existing Zarr stores for full details. By Stephan Hoyer.
Added typehints in align() to reflect that the same type received in objects arg will be returned (PR4522). By Michal Baumgartner.
Dataset.weighted() and DataArray.weighted() are now executing value checks lazily if weights are provided as dask arrays (GH4541, PR4559). By Julius Busecke.
Added the keep_attrs keyword to rolling_exp.mean(); it now keeps attributes by default (PR4592). By Mathias Hauser.
Added freq as a property of CFTimeIndex and included it in the CFTimeIndex repr (GH2416, PR4597). By Aaron Spring.
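ISO 8601 week numbering can differ from the calendar year around New Year, which is why isocalendar() returns its own year alongside week and weekday. The standard library's date.isocalendar() follows the same ISO 8601 rules and shows the effect:

```python
from datetime import date

# 1 January 2021 is a Friday that still belongs to ISO week 53 of 2020.
year, week, weekday = date(2021, 1, 1).isocalendar()

# With xarray, the per-element equivalent would be, e.g.:
#     da.time.dt.isocalendar()  # Dataset with year, week and weekday
```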

Bug fixes#

Fix bug where reference times without padded years (e.g. since 1-1-1) would lose their units when being passed by encode_cf_datetime (GH4422, PR4506). Such units are ambiguous about which digit represents the years (is it YMD or DMY?). Now, if such formatting is encountered, it is assumed that the first digit is the years, they are padded appropriately (to e.g. since 0001-1-1) and a warning that this assumption is being made is issued. Previously, without cftime, such times would be silently parsed incorrectly (at least based on the CF conventions) e.g. “since 1-1-1” would be parsed (via pandas and dateutil) to since 2001-1-1. By Zeb Nicholls.
Fix DataArray.plot.step(). By Deepak Cherian.
Fix bug where reading a scalar value from a NetCDF file opened with the h5netcdf backend would raise a ValueError when decode_cf=True (GH4471, PR4485). By Gerrit Holl.
Fix bug where datetime64 times are silently changed to incorrect values if they are outside the valid date range for ns precision when provided in some other units (GH4427, PR4454). By Andrew Pauling
Fix the engine key being silently overwritten when passing open_dataset() a file object whose netCDF format is incompatible with the requested engine (GH4457). Incompatible combinations of files and engines now raise an exception instead. By Alessandro Amici.
The min_count argument to DataArray.sum() and DataArray.prod() is now ignored when not applicable, i.e. when skipna=False or when skipna=None and the dtype does not have a missing value (GH4352). By Mathias Hauser.
combine_by_coords() now raises an informative error when passing coordinates with differing calendars (GH4495). By Mathias Hauser.
DataArray.rolling and Dataset.rolling now also keep the attributes and names of (wrapped) DataArray objects; previously only the global attributes were retained (GH4497, PR4510). By Mathias Hauser.
Improve performance where reading small slices from huge dimensions was slower than necessary (PR4560). By Dion Häfner.
Fix bug where dask_gufunc_kwargs was silently changed in apply_ufunc() (PR4576). By Kai Mühlbauer.
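The min_count rule above can be sketched for a 1-D list of floats: with skipna=True, NaNs are skipped and the result becomes NaN when fewer than min_count valid values remain; with skipna=False, min_count plays no role. A minimal pure-Python sketch with a hypothetical helper name, simplified to ignore skipna=None and dtype handling:

```python
import math


def sum_with_min_count(values, skipna=True, min_count=None):
    """Sum following the skipna/min_count rules described above."""
    if not skipna:
        # min_count is not applicable; any NaN simply propagates.
        return sum(values)
    valid = [v for v in values if not math.isnan(v)]
    if min_count is not None and len(valid) < min_count:
        return math.nan
    return sum(valid)


total = sum_with_min_count([1.0, math.nan, 2.0])
```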

Documentation#

Document the API not supported with duck arrays (PR4530). By Justus Magin.
Mention the possibility to pass functions to Dataset.where() or DataArray.where() in the parameter documentation (GH4223, PR4613). By Justus Magin.
Update the docstring of DataArray and Dataset (PR4532). By Jimmy Westling.
Raise a more informative error when DataArray.to_dataframe() is called on a scalar (GH4228). By Pieter Gijsbers.
Fix grammar and typos in the Contributing to xarray guide (PR4545). By Sahid Velji.
Fix grammar and typos in the Reading and writing files guide (PR4553). By Sahid Velji.
Update link to NumPy docstring standard in the Contributing to xarray guide (PR4558). By Sahid Velji.
Add docstrings to isnull and notnull, and fix the displayed signature (GH2760, PR4618). By Justus Magin.

Internal Changes#

Optional dependencies can be installed along with xarray by specifying extras as pip install "xarray[extra]" where extra can be one of io, accel, parallel, viz and complete. See docs for updated installation instructions. (GH2888, PR4480). By Ashwin Vishnu, Justus Magin and Mathias Hauser.
Removed stray spaces that stem from black removing new lines (PR4504). By Mathias Hauser.
Ensure tests are not skipped in the py38-all-but-dask test environment (GH4509). By Mathias Hauser.
Ignore select numpy warnings around missing values, where xarray handles the values appropriately (PR4536). By Maximilian Roos.
Replace the internal use of pd.Index.__or__ and pd.Index.__and__ with pd.Index.union and pd.Index.intersection as they will stop working as set operations in the future (GH4565). By Mathias Hauser.
Add GitHub action for running nightly tests against upstream dependencies (PR4583). By Anderson Banihirwe.
Ensure all figures are closed properly in plot tests (PR4600). By Yash Saboo, Nirupam K N and Mathias Hauser.

v0.16.1 (2020-09-20)#

This patch release fixes an incompatibility with a recent pandas change, which was causing an issue indexing with a datetime64. It also includes improvements to rolling, to_dataframe, cov & corr methods and bug fixes. Our documentation has a number of improvements, including fixing all doctests and confirming their accuracy on every commit.

Many thanks to the 36 contributors who contributed to this release:

Aaron Spring, Akio Taniguchi, Aleksandar Jelenak, Alexandre Poux, Caleb, Dan Nowacki, Deepak Cherian, Gerardo Rivera, Jacob Tomlinson, James A. Bednar, Joe Hamman, Julia Kent, Kai Mühlbauer, Keisuke Fujii, Mathias Hauser, Maximilian Roos, Nick R. Papior, Pascal Bourgault, Peter Hausamann, Romain Martinez, Russell Manser, Samnan Rahee, Sander, Spencer Clark, Stephan Hoyer, Thomas Zilio, Tobias Kölling, Tom Augspurger, alexamici, crusaderky, darikg, inakleinbottle, jenssss, johnomotani, keewis, and rpgoldman.

Breaking changes#

DataArray.astype() and Dataset.astype() now preserve attributes. Keep the old behavior by passing keep_attrs=False (GH2049, PR4314). By Dan Nowacki and Gabriel Joel Mitchell.

New Features#

DataArray.rolling() and Dataset.rolling() now accept more than 1 dimension (PR4219). By Keisuke Fujii.
DataArray.to_dataframe() and Dataset.to_dataframe() now accept a dim_order parameter for specifying the order of the resulting dataframe’s dimensions (GH4331, PR4333). By Thomas Zilio.
Support multiple outputs in xarray.apply_ufunc() when using dask='parallelized'. (GH1815, PR4060). By Kai Mühlbauer.
min_count can be supplied to reductions such as .sum when specifying multiple dimensions to reduce over (PR4356). By Maximilian Roos.
xarray.cov() and xarray.corr() now handle missing values (PR4351). By Maximilian Roos.
Add support for parsing datetime strings formatted following the default string representation of cftime objects, i.e. YYYY-MM-DD hh:mm:ss, in partial datetime string indexing, as well as cftime_range() (GH4337). By Spencer Clark.
Build CFTimeIndex.__repr__ explicitly as pandas.Index. Add calendar as a new property for CFTimeIndex and show calendar and length in CFTimeIndex.__repr__ (GH2416, PR4092) By Aaron Spring.
Use a wrapped array’s _repr_inline_ method to construct the collapsed repr of DataArray and Dataset objects and document the new method in Xarray Internals. (PR4248). By Justus Magin.
Allow per-variable fill values in most functions. (PR4237). By Justus Magin.
Expose use_cftime option in open_zarr() (GH2886, PR3229) By Samnan Rahee and Anderson Banihirwe.
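xarray.cov() and xarray.corr() handle missing values by using only the positions where both inputs are valid. A minimal pure-Python sketch of that pairwise masking for 1-D inputs, assuming NaN marks a missing value; the helper name is hypothetical and not xarray's implementation:

```python
import math


def corr_skipping_missing(a, b):
    """Pearson correlation over positions where both values are present."""
    pairs = [(x, y) for x, y in zip(a, b) if not (math.isnan(x) or math.isnan(y))]
    n = len(pairs)
    if n < 2:
        return math.nan
    mean_a = sum(x for x, _ in pairs) / n
    mean_b = sum(y for _, y in pairs) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in pairs)
    var_a = sum((x - mean_a) ** 2 for x, _ in pairs)
    var_b = sum((y - mean_b) ** 2 for _, y in pairs)
    if var_a == 0 or var_b == 0:
        # Correlation is undefined for a constant series.
        return math.nan
    return cov / math.sqrt(var_a * var_b)


r = corr_skipping_missing([1.0, 2.0, math.nan, 4.0], [2.0, 4.0, 6.0, 8.0])
```

The third pair is dropped from both series, so the remaining points lie exactly on y = 2x and the correlation is 1 (up to rounding).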

Bug fixes#

Fix indexing with datetime64 scalars with pandas 1.1 (GH4283). By Stephan Hoyer and Justus Magin.
Variables which are chunked using dask only along some dimensions can be chunked while storing with zarr along previously unchunked dimensions (PR4312) By Tobias Kölling.
Fixed a bug in backend caused by basic installation of Dask (GH4164, PR4318). By Sam Morley.
Fixed a few bugs with Dataset.polyfit() when encountering deficient matrix ranks (GH4190, PR4193). By Pascal Bourgault.
Fixed inconsistencies between docstring and functionality for DataArray.str.get() and DataArray.str.wrap() (GH4334). By Mathias Hauser.
Fixed overflow issue causing incorrect results in computing means of cftime.datetime arrays (GH4341). By Spencer Clark.
Fixed Dataset.coarsen(), DataArray.coarsen() dropping attributes on original object (GH4120, PR4360). By Julia Kent.
Fix the signature of the plot methods (PR4359). By Justus Magin.
Fix xarray.apply_ufunc() with vectorize=True and exclude_dims (GH3890). By Mathias Hauser.
Fix KeyError when doing linear interpolation to an n-dimensional DataArray that contains NaNs (PR4233). By Jens Svensmark.
Fix incorrect legend labels for Dataset.plot.scatter() (GH4126). By Peter Hausamann.
Fix dask.optimize on DataArray producing an invalid Dask task graph (GH3698) By Tom Augspurger
Fix pip install . when no .git directory exists; namely when the xarray source directory has been rsync’ed by PyCharm Professional for a remote deployment over SSH. By Guido Imperiale
Preserve dimension and coordinate order during xarray.concat() (GH2811, GH4072, PR4419). By Kai Mühlbauer.
Avoid relying on set objects for the ordering of the coordinates (PR4409) By Justus Magin.

Documentation#

Update the docstring of DataArray.copy() to remove incorrect mention of ‘dataset’ (GH3606) By Sander van Rijn.
Removed skipna argument from DataArray.count(), DataArray.any(), DataArray.all(). (GH755) By Sander van Rijn
Update the contributing guide to use merges instead of rebasing and state that we squash-merge. (PR4355). By Justus Magin.
Make sure the examples from the docstrings actually work (PR4408). By Justus Magin.
Updated Vectorized Indexing to a clearer example. By Maximilian Roos

Internal Changes#

Fixed all doctests and enabled their running in CI. By Justus Magin.
Relaxed the Minimum dependency versions to support:
- all versions of setuptools released in the last 42 months (but no older than 38.4)
- all versions of dask and dask.distributed released in the last 12 months (but no older than 2.9)
- all versions of other packages released in the last 12 months
All are up from 6 months (GH4295). By Guido Imperiale.
Use dask.array.apply_gufunc instead of dask.array.blockwise() in xarray.apply_ufunc() when using dask='parallelized'. (PR4060, PR4391, PR4392) By Kai Mühlbauer.
Align mypy versions to 0.782 across requirements and .pre-commit-config.yml files. (PR4390) By Maximilian Roos
Only load resource files when running inside a Jupyter Notebook (GH4294) By Guido Imperiale
Silenced most numpy warnings such as Mean of empty slice. (PR4369) By Maximilian Roos
Enable type checking for concat() (GH4238) By Mathias Hauser.
Updated plot functions for matplotlib version 3.3 and silenced warnings in the plot tests (PR4365). By Mathias Hauser.
Versions in pre-commit.yaml are now pinned, to reduce the chances of conflicting versions. (PR4388) By Maximilian Roos

v0.16.0 (2020-07-11)#

This release adds xarray.cov & xarray.corr for covariance & correlation respectively; the idxmax & idxmin methods, the polyfit method & xarray.polyval for fitting polynomials, as well as a number of documentation improvements, other features, and bug fixes. Many thanks to all 44 contributors who contributed to this release:

Akio Taniguchi, Andrew Williams, Aurélien Ponte, Benoit Bovy, Dave Cole, David Brochart, Deepak Cherian, Elliott Sales de Andrade, Etienne Combrisson, Hossein Madadi, Huite, Joe Hamman, Kai Mühlbauer, Keisuke Fujii, Maik Riechert, Marek Jacob, Mathias Hauser, Matthieu Ancellin, Maximilian Roos, Noah D Brenowitz, Oriol Abril, Pascal Bourgault, Phillip Butcher, Prajjwal Nijhara, Ray Bell, Ryan Abernathey, Ryan May, Spencer Clark, Spencer Hill, Srijan Saurav, Stephan Hoyer, Taher Chegini, Todd, Tom Nicholas, Yohai Bar Sinai, Yunus Sevinchan, arabidopsis, aurghs, clausmichele, dmey, johnomotani, keewis, raphael dussin, risebell

Breaking changes#

Minimum supported versions for the following packages have changed: dask >=2.9, distributed >=2.9. By Deepak Cherian.
groupby operations will restore coord dimension order. Pass restore_coord_dims=False to revert to previous behavior.
DataArray.transpose() will now transpose coordinates by default. Pass transpose_coords=False to revert to previous behaviour. By Maximilian Roos
Alternate draw styles for plot.step() must be passed using the drawstyle (or ds) keyword argument, instead of the linestyle (or ls) keyword argument, in line with the upstream change in Matplotlib. (PR3274) By Elliott Sales de Andrade
The old auto_combine function has now been removed in favour of the combine_by_coords() and combine_nested() functions. This also means that the default behaviour of open_mfdataset() has changed to use combine='by_coords' as the default argument value. (GH2616, PR3926) By Tom Nicholas.
The DataArray and Variable HTML reprs now expand the data section by default (GH4176) By Stephan Hoyer.
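A short sketch of the new DataArray.transpose() default, assuming a recent xarray; the coordinate name "c" is hypothetical. A 2D non-dimension coordinate now follows the data's new dimension order unless transpose_coords=False is passed:

```python
import numpy as np
import xarray as xr

da = xr.DataArray(
    np.zeros((2, 3)),
    dims=("x", "y"),
    coords={"c": (("x", "y"), np.ones((2, 3)))},  # 2D non-dimension coordinate
)

# New default: the coordinate is transposed together with the data.
t = da.transpose("y", "x")

# Previous behaviour: leave coordinates in their original order.
t_old = da.transpose("y", "x", transpose_coords=False)
```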

New Features#

DataArray.argmin() and DataArray.argmax() now support sequences of ‘dim’ arguments. If a sequence is passed, they return a dict mapping each dimension to the index of the minimum or maximum of the DataArray; this dict can be passed to DataArray.isel() to get the minimum or maximum value (PR3936). By John Omotani, thanks to Keisuke Fujii for work in PR1469.
Added xarray.cov() and xarray.corr() (GH3784, PR3550, PR4089). By Andrew Williams and Robin Beer.
Implement DataArray.idxmax(), DataArray.idxmin(), Dataset.idxmax(), Dataset.idxmin(). (GH60, PR3871) By Todd Jennings
Added DataArray.polyfit() and xarray.polyval() for fitting polynomials. (GH3349, PR3733, PR4099) By Pascal Bourgault.
Added xarray.infer_freq() for extending frequency inferring to CFTime indexes and data (PR4033). By Pascal Bourgault.
chunks='auto' is now supported in the chunks argument of Dataset.chunk(). (GH4055) By Andrew Williams
Added control over the attributes of the result in merge(), concat(), combine_by_coords() and combine_nested() via the combine_attrs keyword argument (GH3865, PR3877). By John Omotani.
Added a missing_dims argument to Dataset.isel(), DataArray.isel() and Variable.isel(). When a dimension passed to isel is not present, this allows replacing the exception with a warning, or ignoring the dimension altogether (GH3866, PR3923). By John Omotani.
Support dask handling for DataArray.idxmax(), DataArray.idxmin(), Dataset.idxmax(), Dataset.idxmin(). (PR3922, PR4135) By Kai Mühlbauer and Pascal Bourgault.
More support for unit aware arrays with pint (PR3643, PR3975, PR4163) By Justus Magin.
Support overriding existing variables in to_zarr() with mode='a' even without append_dim, as long as dimension sizes do not change. By Stephan Hoyer.
Allow plotting of boolean arrays. (PR3766) By Marek Jacob
Enable using MultiIndex levels as coordinates in 1D and 2D plots (GH3927). By Mathias Hauser.
A days_in_month accessor for xarray.CFTimeIndex, analogous to the days_in_month accessor for a pandas.DatetimeIndex, which returns the number of days in the month of each datetime in the index. Days-in-month weights for both standard and non-standard calendars can now be obtained using the DatetimeAccessor (PR3935). This feature requires cftime version 1.1.0 or greater. By Spencer Clark.
For the netCDF3 backend, added dtype coercions for unsigned integer types. (GH4014, PR4018) By Yunus Sevinchan
map_blocks() now accepts a template kwarg. This allows use cases where the result of a computation could not be inferred automatically. By Deepak Cherian
map_blocks() can now handle dask-backed xarray objects in args. (PR3818) By Deepak Cherian
Add keyword decode_timedelta to xarray.open_dataset(), xarray.open_dataarray() and xarray.decode_cf(), which allows the decoding of timedeltas to be enabled or disabled independently of time decoding (GH1621). By Aureliana Barghini.
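A sketch of three of the features above (argmin with a sequence of dims, idxmax, and missing_dims), assuming a recent xarray; the data and labels are made up:

```python
import xarray as xr

da = xr.DataArray(
    [[3, 1, 2], [0, 5, 4]],
    dims=("x", "y"),
    coords={"x": [10, 20], "y": ["a", "b", "c"]},
)

# A sequence of dims returns a dict of integer indices, one per dimension ...
idx = da.argmin(dim=["x", "y"])
# ... which can be passed directly to isel() to select the minimum.
min_value = da.isel(idx)

# idxmax returns coordinate labels instead of integer positions.
labels = da.idxmax(dim="y")

# missing_dims="ignore" skips the absent dimension "z" instead of raising.
same = da.isel(z=0, missing_dims="ignore")
```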

Enhancements#

Performance improvement of DataArray.interp() and Dataset.interp(): independent interpolations are now performed sequentially rather than in one large multidimensional space (GH2223). By Keisuke Fujii.
DataArray.interp() now supports interpolation over chunked dimensions (PR4155). By Alexandre Poux.
Major performance improvement for Dataset.from_dataframe() when the dataframe has a MultiIndex (PR4184). By Stephan Hoyer.
DataArray.reset_index() and Dataset.reset_index() now keep coordinate attributes (PR4103). By Oriol Abril.
Axes kwargs such as facecolor can now be passed to DataArray.plot() in subplot_kws. This works for both single axes plots and FacetGrid plots. By Raphael Dussin.
Array items with long string reprs are now limited to a reasonable width (PR3900) By Maximilian Roos
Large arrays whose numpy reprs would have greater than 40 lines are now limited to a reasonable length. (PR3905) By Maximilian Roos

Bug fixes#

Fix errors combining attrs in open_mfdataset() (GH4009, PR4173) By John Omotani
If groupby receives a DataArray with name=None, assign a default name (GH158) By Phil Butcher.
Support dark mode in VS Code (GH4024). By Keisuke Fujii.
Fix bug when converting multiindexed pandas objects to sparse xarray objects. (GH4019) By Deepak Cherian.
ValueError is raised when fill_value is not a scalar in full_like(). (GH3977) By Huite Bootsma.
Fix wrong order in converting a pd.Series with a MultiIndex to DataArray. (GH3951, GH4186) By Keisuke Fujii and Stephan Hoyer.
Fix renaming of coords when one or more stacked coords is not in sorted order during stack+groupby+apply operations. (GH3287, PR3906) By Spencer Hill
Fix a regression where deleting a coordinate from a copied DataArray can affect the original DataArray. (GH3899, PR3871) By Todd Jennings
Fix FacetGrid plots with a single contour. (GH3569, PR3915). By Deepak Cherian
Use a diverging colormap if levels spans 0 (GH3524). By Deepak Cherian.
Fix FacetGrid when vmin == vmax. (GH3734) By Deepak Cherian
Fix plotting when levels is a scalar and norm is provided. (GH3735) By Deepak Cherian
Fix bug where plotting line plots with 2D coordinates depended on dimension order. (GH3933) By Tom Nicholas.
Fix RasterioDeprecationWarning when using a vrt in open_rasterio. (GH3964) By Taher Chegini.
Fix AttributeError on displaying a Variable in a notebook context. (GH3972, PR3973) By Ian Castleden.
Fix bug causing DataArray.interpolate_na() to always drop attributes, and add a keep_attrs argument (GH3968). By Tom Nicholas.
Fix bug in time parsing failing to fall back to cftime. This was causing time variables with a time unit of 'msecs' to fail to parse. (PR3998) By Ryan May.
Fix weighted mean when passing boolean weights (GH4074). By Mathias Hauser.
Fix html repr in untrusted notebooks: fallback to plain text repr. (PR4053) By Benoit Bovy.
Fix DataArray.to_unstacked_dataset() for single-dimension variables. (GH4049) By Deepak Cherian
Fix open_rasterio() for WarpedVRT with specified src_crs. (PR4104) By Dave Cole.

Documentation#

Update the docstring of DataArray.assign_coords(): clarify how to add a new coordinate to an existing dimension and add an illustrative example (GH3952, PR3958). By Etienne Combrisson.
Update the docstring of Dataset.diff() and DataArray.diff() so that it documents the dim parameter as required (GH1040, PR3909). By Justus Magin.
Updated Calculating Seasonal Averages from Timeseries of Monthly Means example notebook to take advantage of the new days_in_month accessor for xarray.CFTimeIndex (PR3935). By Spencer Clark.
Updated the list of current core developers. (GH3892) By Tom Nicholas.
Add example for multi-dimensional extrapolation and note different behavior of kwargs in Dataset.interp() and DataArray.interp() for 1-d and n-d interpolation (PR3956). By Matthias Riße.
Apply black to all the code in the documentation (PR4012) By Justus Magin.
Narrative documentation now describes map_blocks(): Parallelize custom functions with apply_ufunc and map_blocks. By Deepak Cherian.
Document the .plot, .dt and .str accessors the way they are called (GH3625, PR3988). By Justus Magin.
Add documentation for the parameters and return values of DataArray.sel(). By Justus Magin.

Internal Changes#

Raise more informative error messages for chunk size conflicts when writing to zarr files. By Deepak Cherian.
Run the isort pre-commit hook only on python source files and update the flake8 version. (GH3750, PR3711) By Justus Magin.
Add blackdoc to the list of checkers for development. (PR4177) By Justus Magin.
Add a CI job that runs the tests with every optional dependency except dask. (GH3794, PR3919) By Justus Magin.
Use async / await for the asynchronous distributed tests. (GH3987, PR3989) By Justus Magin.
Various internal code clean-ups (PR4026, PR4038). By Prajjwal Nijhara.

v0.15.1 (23 Mar 2020)#

This release brings many new features such as Dataset.weighted() methods for weighted array reductions, a new jupyter repr by default, and the start of units integration with pint. There’s also the usual batch of usability improvements, documentation additions, and bug fixes.

Breaking changes#

Raise an error when assigning to the .values or .data attribute of dimension coordinates i.e. IndexVariable objects. This has been broken since v0.12.0. Please use DataArray.assign_coords() or Dataset.assign_coords() instead. (GH3470, PR3862) By Deepak Cherian

New Features#

Weighted array reductions are now supported via the new DataArray.weighted() and Dataset.weighted() methods. See Weighted array reductions. (GH422, PR2922). By Mathias Hauser.
The new jupyter notebook repr (Dataset._repr_html_ and DataArray._repr_html_) (introduced in 0.14.1) is now on by default. To disable, use xarray.set_options(display_style="text"). By Julia Signell.
Added support for pandas.DatetimeIndex-style rounding of cftime.datetime objects directly via a CFTimeIndex or via the DatetimeAccessor. By Spencer Clark
Support the new h5netcdf backend keyword phony_dims (available from h5netcdf v0.8.0) for H5NetCDFStore. By Kai Mühlbauer.
Add partial support for unit aware arrays with pint. (PR3706, PR3611) By Justus Magin.
Dataset.groupby() and DataArray.groupby() now raise a TypeError on multiple string arguments. Receiving multiple string arguments often means a user is attempting to pass multiple dimensions as separate arguments and should instead pass a single list of dimensions. (PR3802) By Maximilian Roos
map_blocks() can now apply functions that add new unindexed dimensions. By Deepak Cherian
An ellipsis (...) is now supported in the dims argument of Dataset.stack() and DataArray.stack(), meaning all unlisted dimensions, similar to its meaning in DataArray.transpose(). (PR3826) By Maximilian Roos
Dataset.where() and DataArray.where() accept a lambda as a first argument, which is then called on the input, replicating pandas’ behavior. By Maximilian Roos.
skipna is available in Dataset.quantile(), DataArray.quantile(), core.groupby.DatasetGroupBy.quantile(), core.groupby.DataArrayGroupBy.quantile() (GH3843, PR3844) By Aaron Spring.
Add a diff summary for testing.assert_allclose. (GH3617, PR3847) By Justus Magin.
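A minimal sketch of weighted reductions, callable where() conditions and ellipsis stacking, assuming a recent xarray; the weights and the stacked dimension name "flat" are hypothetical:

```python
import numpy as np
import xarray as xr

da = xr.DataArray([1.0, 2.0, 3.0], dims="x")
weights = xr.DataArray([1.0, 1.0, 2.0], dims="x")

# Weighted mean: sum(w * x) / sum(w) = (1 + 2 + 6) / 4
wmean = da.weighted(weights).mean()

# The lambda is called with `da` itself to build the condition.
masked = da.where(lambda arr: arr > 1)

# Ellipsis in stack(): "x" first, then every remaining dimension.
cube = xr.DataArray(np.zeros((2, 3, 4)), dims=("x", "y", "z"))
stacked = cube.stack(flat=("x", ...))
```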

Bug fixes#

Fix Dataset.interp() when the indexing array shares coordinates with the indexed variable (GH3252). By David Huard.
Fix recombination of groups in Dataset.groupby() and DataArray.groupby() when performing an operation that changes the size of the groups along the grouped dimension. By Eric Jansen.
Fix use of multi-index with categorical values (GH3674). By Matthieu Ancellin.
Fix alignment with join="override" when some dimensions are unindexed. (GH3681). By Deepak Cherian.
Fix Dataset.swap_dims() and DataArray.swap_dims() producing index with name reflecting the previous dimension name instead of the new one (GH3748, PR3752). By Joseph K Aicher.
Use dask_array_type instead of dask_array.Array for type checking. (GH3779, PR3787) By Justus Magin.
concat() can now handle coordinate variables only present in one of the objects to be concatenated when coords="different". By Deepak Cherian.
xarray now respects the over, under and bad colors if set on a provided colormap. (GH3590, PR3601) By johnomotani.
coarsen and rolling now respect xr.set_options(keep_attrs=True) to preserve attributes. Dataset.coarsen() accepts a keyword argument keep_attrs to change this setting. (GH3376, PR3801) By Andrew Thomas.
Delete associated indexes when deleting coordinate variables. (GH3746). By Deepak Cherian.
Fix Dataset.to_zarr() when using append_dim and group simultaneously. (GH3170). By Matthias Meyer.
Fix html repr on Dataset with non-string keys (PR3807). By Maximilian Roos.

Documentation#

Fix the documentation of DataArray by removing the deprecated mention that, when omitted, dims are inferred from a coords-dict (PR3821). By Sander van Rijn.
Improve the where() docstring. By Maximilian Roos
Update the installation instructions: only explicitly list recommended dependencies (GH3756). By Mathias Hauser.

Internal Changes#

Remove the internal import_seaborn function which handled the deprecation of the seaborn.apionly entry point (GH3747). By Mathias Hauser.
Don’t test pint integration in combination with datetime objects. (GH3778, PR3788) By Justus Magin.
Change test_open_mfdataset_list_attr to only run with dask installed (GH3777, PR3780). By Bruno Pagani.
Preserve the ability to index with method="nearest" with a CFTimeIndex with pandas versions greater than 1.0.1 (GH3751). By Spencer Clark.
Greater flexibility and improved test coverage of subtracting various types of objects from a CFTimeIndex. By Spencer Clark.
Update the Azure CI macOS image, given its pending removal. By Maximilian Roos.
Remove xfails for scipy 1.0.1 for tests that append to netCDF files (PR3805). By Mathias Hauser.
Remove conversion to pandas.Panel, given its removal in pandas in favor of xarray’s objects. By Maximilian Roos

v0.15.0 (30 Jan 2020)#

This release brings many improvements to xarray’s documentation: our examples are now binderized notebooks, and we have new example notebooks from our SciPy 2019 sprint (many thanks to our contributors!).

This release also features many API improvements, such as a new TimedeltaAccessor and support for CFTimeIndex in interpolate_na(), as well as many bug fixes.

Breaking changes#

Bumped minimum tested versions for dependencies:
- numpy 1.15
- pandas 0.25
- dask 2.2
- distributed 2.2
- scipy 1.3
Remove compat and encoding kwargs from DataArray, which have been deprecated since 0.12. (PR3650). Instead, specify the encoding kwarg when writing to disk or set the DataArray.encoding attribute directly. By Maximilian Roos.
xarray.dot(), DataArray.dot(), and the @ operator now use align="inner" (except when xarray.set_options(arithmetic_join="exact"); GH3694) by Mathias Hauser.
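A small sketch of the inner-join behaviour of xarray.dot() and the @ operator, assuming a recent xarray with the default arithmetic_join; the labels are made up:

```python
import xarray as xr

a = xr.DataArray([1, 2, 3], dims="x", coords={"x": [0, 1, 2]})
b = xr.DataArray([10, 20], dims="x", coords={"x": [1, 2]})

# Only the shared labels x=1 and x=2 contribute: 2*10 + 3*20
result = xr.dot(a, b)

# The @ operator aligns the same way.
result_op = a @ b
```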

New Features#

Implement DataArray.pad() and Dataset.pad(). (GH2605, PR3596). By Mark Boer.
DataArray.sel() and Dataset.sel() now support pandas.CategoricalIndex. (GH3669) By Keisuke Fujii.
Support using an existing, opened h5netcdf File with H5NetCDFStore. This permits creating a Dataset from an h5netcdf File that has been opened by other means (GH3618). By Kai Mühlbauer.
Implement median and nanmedian for dask arrays. This works by rechunking to a single chunk along all reduction axes. (GH2999). By Deepak Cherian.
concat() now preserves attributes from the first Variable. (GH2575, GH2060, GH1614) By Deepak Cherian.
Dataset.quantile(), DataArray.quantile() and GroupBy.quantile now work with dask Variables. By Deepak Cherian.
Added the count reduction method to both DatasetCoarsen and DataArrayCoarsen objects. (PR3500) By Deepak Cherian
Add meta kwarg to apply_ufunc(); this is passed on to dask.array.blockwise(). (PR3660) By Deepak Cherian.
Add attrs_file option in open_mfdataset() to choose the source file for global attributes in a multi-file dataset (GH2382, PR3498). By Julien Seguinot.
Dataset.swap_dims() and DataArray.swap_dims() now allow swapping to dimension names that don’t exist yet. (PR3636) By Justus Magin.
Extend DatetimeAccessor properties and support .dt accessor for timedeltas via TimedeltaAccessor (PR3612) By Anderson Banihirwe.
Improvements to interpolating along time axes (GH3641, PR3631). By David Huard.
- support CFTimeIndex in DataArray.interpolate_na(),
- define 1970-01-01 as the default offset for the interpolation index for both pandas.DatetimeIndex and CFTimeIndex,
- use microseconds in the conversion from timedelta objects to floats to avoid overflow errors.
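A sketch of three of the additions above (pad(), swap_dims() to a not-yet-existing name, and the .dt accessor on timedeltas), assuming recent xarray and pandas; the new dimension name "t" is hypothetical:

```python
import pandas as pd
import xarray as xr

da = xr.DataArray([1, 2, 3], dims="x")

# One value before and two after along "x"; an explicit fill keeps the int dtype.
padded = da.pad(x=(1, 2), constant_values=0)

# swap_dims() may now target a dimension name that does not exist yet.
renamed = da.swap_dims({"x": "t"})

# The .dt accessor also works on timedelta64 data (TimedeltaAccessor).
td = xr.DataArray(pd.to_timedelta(["1 days 02:00:00", "3 days"]), dims="x")
days = td.dt.days
seconds = td.dt.seconds
```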

Bug fixes#

Applying a user-defined function that adds new dimensions using apply_ufunc() and vectorize=True now works with dask > 2.0. (GH3574, PR3660). By Deepak Cherian.
Fix combine_by_coords() to allow for combining incomplete hypercubes of Datasets (GH3648). By Ian Bolliger.
Fix combine_by_coords() when combining cftime coordinates which span long time intervals (GH3535). By Spencer Clark.
Fix plotting with transposed 2D non-dimensional coordinates. (GH3138, PR3441) By Deepak Cherian.
plot.FacetGrid.set_titles() can now replace existing row titles of a FacetGrid plot. In addition FacetGrid gained two new attributes: col_labels and row_labels contain matplotlib.text.Text handles for both column and row labels. These can be used to manually change the labels. By Deepak Cherian.
Fix issue with Dask-backed datasets raising a KeyError on some computations involving map_blocks() (PR3598). By Tom Augspurger.
Ensure Dataset.quantile(), DataArray.quantile() issue the correct error when q is out of bounds (GH3634) by Mathias Hauser.
Fix regression in xarray 0.14.1 that prevented encoding times with certain dtype, _FillValue, and missing_value encodings (GH3624). By Spencer Clark
Raise an error when trying to use Dataset.rename_dims() to rename to an existing name (GH3438, PR3645) By Justus Magin.
Dataset.rename(), DataArray.rename() now check for conflicts with MultiIndex level names.
Dataset.merge() no longer fails when passed a DataArray instead of a Dataset. By Tom Nicholas.
Fix a regression in Dataset.drop(): allow passing any iterable when dropping variables (GH3552, PR3693) By Justus Magin.
Fixed errors emitted by mypy --strict in modules that import xarray. (GH3695) by Guido Imperiale.
Allow plotting of binned coordinates on the y axis in plot.line() and plot.step() plots (GH3571, PR3685) by Julien Seguinot.
setuptools is now marked as a dependency of xarray (PR3628) by Richard Höchenberger.

Documentation#

Switch doc examples to use nbsphinx and replace sphinx_gallery scripts with Jupyter notebooks. (PR3105, PR3106, PR3121) By Ryan Abernathey.
Added example notebook demonstrating use of xarray with Regional Ocean Modeling System (ROMS) ocean hydrodynamic model output. (PR3116) By Robert Hetland.
Added example notebook demonstrating the visualization of ERA5 GRIB data. (PR3199) By Zach Bruick and Stephan Siemen.
Added examples for DataArray.quantile(), Dataset.quantile() and GroupBy.quantile. (PR3576) By Justus Magin.
Add a new example notebook demonstrating vectorization of a 1D function using apply_ufunc(), dask and numba. By Deepak Cherian.
Added example for map_blocks(). (PR3667) By Riley X. Brady.

Internal Changes#

Make sure dask names change when rechunking by different chunk sizes. Conversely, make sure they stay the same when rechunking by the same chunk size. (GH3350) By Deepak Cherian.
2x to 5x speed boost (on small arrays) for Dataset.isel(), DataArray.isel(), and DataArray.__getitem__() when indexing by int, slice, list of int, scalar ndarray, or 1-dimensional ndarray. (PR3533) by Guido Imperiale.
Removed internal method Dataset._from_vars_and_coord_names, which was superseded by Dataset._construct_direct (PR3565). By Maximilian Roos.
Replaced versioneer with setuptools-scm. Moved contents of setup.py to setup.cfg. Removed pytest-runner from setup.py, as per deprecation notice on the pytest-runner project. (PR3714) by Guido Imperiale.
Use of isort is now enforced by CI. (PR3721) by Guido Imperiale

v0.14.1 (19 Nov 2019)#

Breaking changes#

Broke compatibility with cftime < 1.0.3. By Deepak Cherian.

Warning

cftime version 1.0.4 is broken (cftime/126); please use version 1.0.4.2 instead.
All leftover support for dates from non-standard calendars through netcdftime, the module included in versions of netCDF4 prior to 1.4 that eventually became the cftime package, has been removed in favor of relying solely on the standalone cftime package (PR3450). By Spencer Clark.

New Features#

Added the sparse option to DataArray.unstack(), Dataset.unstack(), DataArray.reindex() and Dataset.reindex() (GH3518). By Keisuke Fujii.
Added the fill_value option to DataArray.unstack() and Dataset.unstack() (GH3518, PR3541). By Keisuke Fujii.
Added the max_gap kwarg to DataArray.interpolate_na() and Dataset.interpolate_na(). This controls the maximum size of the data gap that will be filled by interpolation. By Deepak Cherian.
Added Dataset.drop_sel() & DataArray.drop_sel() for dropping labels. Dataset.drop_vars() & DataArray.drop_vars() have been added for dropping variables (including coordinates). The existing Dataset.drop() & DataArray.drop() methods remain as a backward compatible option for dropping either labels or variables, but using the more specific methods is encouraged. (PR3475) By Maximilian Roos
Added Dataset.map() & GroupBy.map & Resample.map for mapping / applying a function over each item in the collection, reflecting the widely used and least surprising name for this operation. The existing apply methods remain for backward compatibility, though using the map methods is encouraged. (PR3459) By Maximilian Roos
Dataset.transpose() and DataArray.transpose() now support an ellipsis (...) to represent all ‘other’ dimensions. For example, to move one dimension to the front, use .transpose('x', ...). (PR3421) By Maximilian Roos
Changed xr.ALL_DIMS to equal python’s Ellipsis (...), and changed internal usages to use ... directly. As before, you can use this to instruct a groupby operation to reduce over all dimensions. While we have no plans to remove xr.ALL_DIMS, we suggest using .... (PR3418) By Maximilian Roos
xarray.dot() and DataArray.dot() now support the dims=... option to sum over the union of dimensions of all input arrays (GH3423). By Mathias Hauser.
Added new Dataset._repr_html_ and DataArray._repr_html_ to improve representation of objects in Jupyter. By default this feature is turned off for now. Enable it with xarray.set_options(display_style="html"). (PR3425) by Benoit Bovy and Julia Signell.
Implement dask deterministic hashing for xarray objects. Note that xarray objects with a dask.array backend already used deterministic hashing in previous releases; this change implements it when whole xarray objects are embedded in a dask graph, e.g. when DataArray.map_blocks() is invoked. (GH3378, PR3446, PR3515) By Deepak Cherian and Guido Imperiale.
xarray now respects the DataArray.encoding["coordinates"] attribute when writing to disk. See Coordinates for more. (GH3351, PR3487) By Deepak Cherian.
Add the documented-but-missing core.groupby.DatasetGroupBy.quantile() (GH3525, PR3527). By Justus Magin.
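A sketch of the label/variable dropping methods, Dataset.map() and ellipsis transposition described above, assuming a recent xarray; the variable names are made up:

```python
import numpy as np
import xarray as xr

ds = xr.Dataset(
    {"a": ("x", [1, 2, 3]), "b": ("x", [4, 5, 6])},
    coords={"x": [10, 20, 30]},
)

# drop_sel removes index labels; drop_vars removes variables.
without_20 = ds.drop_sel(x=20)
only_a = ds.drop_vars("b")

# map applies the function to every data variable.
doubled = ds.map(lambda var: var * 2)

# The ellipsis stands for "all other dimensions", in their existing order.
cube = xr.DataArray(np.zeros((2, 3, 4)), dims=("x", "y", "z"))
moved = cube.transpose("z", ...)
```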

Bug fixes#

Ensure an index of type CFTimeIndex is not converted to a DatetimeIndex when calling Dataset.rename(), Dataset.rename_dims() and Dataset.rename_vars(). By Mathias Hauser. (GH3522).
Fix a bug in DataArray.set_index() in the case where an existing dimension becomes a level variable of a MultiIndex (PR3520). By Keisuke Fujii.
Harmonize _FillValue, missing_value during encoding and decoding steps. (PR3502) By Anderson Banihirwe.
Fix regression introduced in v0.14.0 that would cause a crash if dask is installed but cloudpickle isn’t (GH3401) by Rhys Doyle
Fix grouping over variables with NaNs. (GH2383, PR3406). By Deepak Cherian.
Make alignment and concatenation significantly more efficient by using dask names to compare dask objects prior to comparing values after computation. This change makes it more convenient to carry around large non-dimensional coordinate variables backed by dask arrays. Existing workarounds involving reset_coords(drop=True) should now be unnecessary in most cases. (GH3068, GH3311, GH3454, PR3453). By Deepak Cherian.
Add support for cftime>=1.0.4. By Anderson Banihirwe.
Rolling reduction operations no longer compute dask arrays by default. (GH3161). In addition, the allow_lazy kwarg to reduce is deprecated. By Deepak Cherian.
Fix GroupBy.reduce when reducing over multiple dimensions. (GH3402). By Deepak Cherian
Allow appending datetime and bool data variables to zarr stores. (GH3480). By Akihiro Matsukawa.
Add support for numpy >=1.18 (); bugfix mean() on datetime64 arrays on dask backend (GH3409, PR3537). By Guido Imperiale.
Add support for pandas >=0.26 (GH3440). By Deepak Cherian.
Add support for pseudonetcdf >=3.1 (PR3485). By Barron Henderson.

Documentation#

Fix leap year condition in monthly means example. By Mickaël Lalande.
Fix the documentation of DataArray.resample() and Dataset.resample(), explicitly stating that a datetime-like dimension is required. (PR3400) By Justus Magin.
Update the Terminology page to address multidimensional coordinates. (PR3410) By Jon Thielen.
Fix the documentation of Dataset.integrate() and DataArray.integrate() and add an example to Dataset.integrate(). (PR3469) By Justus Magin.

Internal Changes#

Added integration tests against pint. (PR3238, PR3447, PR3493, PR3508) by Justus Magin.

Note

At the time of writing, these tests, as well as the ability to use pint in general, require a highly experimental version of pint (install with pip install git+https://github.com/andrewgsavage/pint.git@refs/pull/6/head). Even with it, interaction with non-numpy array libraries, e.g. dask or sparse, is broken.
Use Python 3.6 idioms throughout the codebase. (PR3419) By Maximilian Roos
Run basic CI tests on Python 3.8. (PR3477) By Maximilian Roos
Enable type checking on default sentinel values (PR3472) By Maximilian Roos
Add Variable._replace for simpler replacing of a subset of attributes (PR3472) By Maximilian Roos

v0.14.0 (14 Oct 2019)#

Breaking changes#

This release introduces a rolling policy for minimum dependency versions: Minimum dependency versions.

Several minimum versions have been increased:

| Package | Old | New |
|---|---|---|
| Python | 3.5.3 | 3.6 |
| numpy | 1.12 | 1.14 |
| pandas | 0.19.2 | 0.24 |
| dask | 0.16 (tested: 2.4) | 1.2 |
| bottleneck | 1.1 (tested: 1.2) | 1.2 |
| matplotlib | 1.5 (tested: 3.1) | 3.1 |

Obsolete patch versions (x.y.Z) are not tested anymore. The oldest supported versions of all optional dependencies are now covered by automated tests (before, only the very latest versions were tested).

(GH3222, GH3293, GH3340, GH3346, GH3358). By Guido Imperiale.
Dropped the drop=False optional parameter from Variable.isel(). It was unused and doesn’t make sense for a Variable. (PR3375). By Guido Imperiale.
Remove internal usage of collections.OrderedDict. After dropping support for Python <=3.5, most uses of OrderedDict in xarray were no longer necessary. We have removed the internal use of the OrderedDict in favor of Python’s builtin dict object which is now ordered itself. This change will be most obvious when interacting with the attrs property on Dataset and DataArray objects. (GH3380, PR3389). By Joe Hamman.

New functions/methods#

Added map_blocks(), modeled after dask.array.map_blocks(). Also added Dataset.unify_chunks(), DataArray.unify_chunks() and testing.assert_chunks_equal(). (PR3276). By Deepak Cherian and Guido Imperiale.

Enhancements#

core.groupby.GroupBy enhancements. By Deepak Cherian.
- Added a repr (PR3344). Example:
```
>>> da.groupby("time.season")
DataArrayGroupBy, grouped over 'season'
4 groups with labels 'DJF', 'JJA', 'MAM', 'SON'
```
- Added a GroupBy.dims property that mirrors the dimensions of each group (GH3344).
Speed up Dataset.isel() up to 33% and DataArray.isel() up to 25% for small arrays (GH2799, PR3375). By Guido Imperiale.

Bug fixes#

Reintroduce support for weakref (broken in v0.13.0). Support has been reinstated for DataArray and Dataset objects only. Internal xarray objects remain unaddressable by weakref in order to save memory (GH3317). By Guido Imperiale.
Line plots with the x or y argument set to a 1D non-dimensional coord now plot the correct data for 2D DataArrays (GH3334). By Tom Nicholas.
Make concat() more robust when merging variables present in some datasets but not others (GH508). By Deepak Cherian.
The default behaviour of reducing across all dimensions for DataArrayGroupBy objects has now been properly removed as was done for DatasetGroupBy in 0.13.0 (GH3337). Use xarray.ALL_DIMS if you need to replicate previous behaviour. Also raise nicer error message when no groups are created (GH1764). By Deepak Cherian.
Fix error in concatenating unlabeled dimensions (PR3362). By Deepak Cherian.
Warn if the dim kwarg is passed to rolling operations. This is redundant since a dimension is specified when the DatasetRolling or DataArrayRolling object is created. (PR3362). By Deepak Cherian.
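The weakref breakage fixed above is consistent with the __slots__ change in v0.13.0: a class that defines __slots__ loses weak-reference support unless "__weakref__" is listed among its slots. The stdlib-only sketch below illustrates that mechanism with hypothetical class names; it is not xarray’s code.

```python
import weakref


class SlottedNoWeakref:
    # Hypothetical class: __slots__ without "__weakref__" disables weak references
    __slots__ = ("_data",)

    def __init__(self, data):
        self._data = data


class SlottedWithWeakref:
    # Hypothetical class: listing "__weakref__" in __slots__ re-enables them
    __slots__ = ("_data", "__weakref__")

    def __init__(self, data):
        self._data = data


def supports_weakref(obj):
    """Return True if weakref.ref() accepts obj."""
    try:
        weakref.ref(obj)
    except TypeError:
        return False
    return True


print(supports_weakref(SlottedNoWeakref(1)))    # False
print(supports_weakref(SlottedWithWeakref(1)))  # True
```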

Documentation#

Created a glossary of important xarray terms (GH2410, PR3352). By Gregory Gundersen.
Created a “How do I…” section for solutions to common questions. (PR3357). By Deepak Cherian.
Add examples for Dataset.swap_dims() and DataArray.swap_dims() (PR3331). By Justus Magin.
Add examples for align(), merge(), combine_by_coords(), full_like(), zeros_like(), ones_like(), Dataset.pipe(), Dataset.assign(), Dataset.reindex(), Dataset.fillna() (PR3328). By Anderson Banihirwe.
Fixed documentation to clean up an unwanted file created in ipython example (PR3353). By Gregory Gundersen.

v0.13.0 (17 Sep 2019)#

This release includes many exciting changes: wrapping of NEP18 compliant numpy-like arrays; new scatter() plotting method that can scatter two DataArrays in a Dataset against each other; support for converting pandas DataFrames to xarray objects that wrap pydata/sparse; and more!

Breaking changes#

This release increases the minimum required Python version from 3.5.0 to 3.5.3 (GH3089). By Guido Imperiale.
The isel_points and sel_points methods are removed, having been deprecated since v0.10.0. These are redundant with the isel / sel methods. See Vectorized Indexing for the details. By Maximilian Roos
The inplace kwarg for public methods now raises an error, having been deprecated since v0.11.0. By Maximilian Roos
concat() now requires the dim argument. Its indexers, mode and concat_over kwargs have now been removed. By Deepak Cherian
Passing a list of colors in cmap will now raise an error, having been deprecated since v0.6.1.
Most xarray objects now define __slots__. This reduces overall RAM usage by ~22% (not counting the underlying numpy buffers); on CPython 3.7/x64, a trivial DataArray has gone down from 1.9kB to 1.5kB.

Caveats:
- Pickle streams produced by older versions of xarray can’t be loaded using this release, and vice versa.
- Any user code that was accessing the __dict__ attribute of xarray objects will break. The best practice to attach custom metadata to xarray objects is to use the attrs dictionary.
- Any user code that defines custom subclasses of xarray classes must now explicitly define __slots__ itself. Subclasses that don’t add any attributes must state so by defining __slots__ = () right after the class header. Omitting __slots__ will now cause a FutureWarning to be logged, and will raise an error in a later release.
(GH3250) By Guido Imperiale.
The default dimension for Dataset.groupby(), Dataset.resample(), DataArray.groupby() and DataArray.resample() reductions is now the grouping or resampling dimension.
DataArray.to_dataset() requires name to be passed as a kwarg (previously ambiguous positional arguments were deprecated).
Reindexing with variables of a different dimension now raises an error (previously deprecated).
xarray.broadcast_array is removed (previously deprecated in favor of broadcast())
Variable.expand_dims is removed (previously deprecated in favor of Variable.set_dims())
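The __slots__ caveat for subclasses can be reproduced with plain Python classes. In this sketch Base is a hypothetical stand-in for a slotted xarray class: a subclass that declares __slots__ = () stays free of a per-instance __dict__, while one that omits __slots__ silently gains one, which is the case the new FutureWarning is meant to flag.

```python
class Base:
    # Hypothetical stand-in for a slotted parent class
    __slots__ = ("_value",)

    def __init__(self, value):
        self._value = value


class GoodSubclass(Base):
    # Adds no attributes, so it declares an empty __slots__
    __slots__ = ()


class BadSubclass(Base):
    # No __slots__: Python silently adds a per-instance __dict__
    pass


good = GoodSubclass(1)
bad = BadSubclass(1)
print(hasattr(good, "__dict__"))  # False
print(hasattr(bad, "__dict__"))   # True

# Without a __dict__, arbitrary attributes cannot be attached to the instance
try:
    good.extra = "metadata"
except AttributeError:
    print("AttributeError: store custom metadata in .attrs instead")
```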

New functions/methods#

xarray can now wrap around any NEP18 compliant numpy-like library (important: read the notes about NUMPY_EXPERIMENTAL_ARRAY_FUNCTION in NEP18). Added explicit test coverage for sparse. (GH3117, GH3202). This requires sparse>=0.8.0. By Nezar Abdennur and Guido Imperiale.
from_dataframe() and from_series() now support sparse=True for converting pandas objects into xarray objects wrapping sparse arrays. This is particularly useful with sparsely populated hierarchical indexes. (GH3206) By Stephan Hoyer.
The xarray package is now discoverable by mypy (although typing hints coverage is not complete yet). mypy type checking is now enforced by CI. Libraries that depend on xarray and use mypy can now remove from their setup.cfg the lines:
```
[mypy-xarray]
ignore_missing_imports = True
```
(GH2877, GH3088, GH3090, GH3112, GH3117, GH3207) By Guido Imperiale and Maximilian Roos.
Added DataArray.broadcast_like() and Dataset.broadcast_like(). By Deepak Cherian and David Mertz.
Dataset plotting API for visualizing dependencies between two DataArrays! Currently only Dataset.plot.scatter() is implemented. By Yohai Bar Sinai and Deepak Cherian
Added DataArray.head(), DataArray.tail() and DataArray.thin(); as well as Dataset.head(), Dataset.tail() and Dataset.thin() methods. (GH319) By Gerardo Rivera.

Enhancements#

Multiple enhancements to concat() and open_mfdataset(). By Deepak Cherian
- Added compat='override'. When merging, this option picks the variable from the first dataset and skips all comparisons.
- Added join='override'. When aligning, this only checks that index sizes are equal among objects and skips checking indexes for equality.
- concat() and open_mfdataset() now support the join kwarg. It is passed down to align().
- concat() now calls merge() on variables that are not concatenated (i.e. variables without concat_dim when data_vars or coords are "minimal"). concat() passes its new compat kwarg down to merge(). (GH2064)
Users can avoid a common bottleneck when using open_mfdataset() on a large number of files with variables that are known to be aligned and some of which need not be concatenated. Slow equality comparisons can now be avoided, for example:
```
import xarray as xr
# files: list of paths to netCDF files whose variables are known to be aligned
data = xr.open_mfdataset(files, concat_dim='time', data_vars='minimal',
                         coords='minimal', compat='override', join='override')
```
In to_zarr(), passing mode is not mandatory if append_dim is set, as it will automatically be set to 'a' internally. By David Brochart.
Added the ability to initialize an empty or full DataArray with a single value. (GH277) By Gerardo Rivera.
to_netcdf() now supports the invalid_netcdf kwarg when used with engine="h5netcdf". It is passed to h5netcdf.File. By Ulrich Herter.
xarray.Dataset.drop now supports keyword arguments; dropping index labels by using both dim and labels or using a DataArrayCoordinates object are deprecated (GH2910). By Gregory Gundersen.
Added examples of Dataset.set_index() and DataArray.set_index(), as well as more specific error messages when the user passes invalid arguments (GH3176). By Gregory Gundersen.
Dataset.filter_by_attrs() now filters the coordinates as well as the variables. By Spencer Jones.
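To illustrate how join='override' differs from equality-based alignment, here is a stdlib sketch on plain lists (override_join is a hypothetical helper, not part of xarray): only the index lengths are compared, and the first object’s labels are reused everywhere, so tiny floating-point differences between files no longer cause a mismatch.

```python
def override_join(indexes):
    """Mimic join='override' on plain lists: sizes must match, labels come from the first."""
    first = list(indexes[0])
    for other in indexes[1:]:
        if len(other) != len(first):
            raise ValueError(
                f"cannot override: index sizes differ ({len(first)} vs {len(other)})"
            )
    # Every object gets a copy of the first index; labels are never compared
    return [list(first) for _ in indexes]


# Round-off makes these labels unequal, so equality-based alignment would not match them
lat_a = [0.0, 0.1, 0.2]
lat_b = [0.0, 0.1000000001, 0.2]
print(override_join([lat_a, lat_b]))  # [[0.0, 0.1, 0.2], [0.0, 0.1, 0.2]]
```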

Bug fixes#

Improve “missing dimensions” error message for apply_ufunc() (GH2078). By Rick Russotto.
assign_coords() now supports dictionary arguments (GH3231). By Gregory Gundersen.
Fix regression introduced in v0.12.2 where copy(deep=True) would convert unicode indices to dtype=object (GH3094). By Guido Imperiale.
Improved error handling and documentation for .expand_dims() read-only view.
Fix tests for big-endian systems (GH3125). By Graham Inggs.
XFAIL several tests which are expected to fail on ARM systems due to a datetime issue in NumPy (GH2334). By Graham Inggs.
Fix KeyError that arises when using the .sel method with float values whose dtype differs from the float dtype of the coordinates (GH3137). By Hasan Ahmad.
Fixed bug in combine_by_coords() causing a ValueError if the input had an unused dimension with coordinates which were not monotonic (GH3150). By Tom Nicholas.
Fixed crash when applying distributed.Client.compute() to a DataArray (GH3171). By Guido Imperiale.
Better error message when using groupby on an empty DataArray (GH3037). By Hasan Ahmad.
Fix error that arises when using open_mfdataset on a series of netcdf files having differing values for a variable attribute of type list. (GH3034) By Hasan Ahmad.
Prevent argmax() and argmin() from calling dask compute (GH3237). By Ulrich Herter.
Plots in 2 dimensions (pcolormesh, contour) now allow specifying levels as a numpy array (GH3284). By Mathias Hauser.
Fixed bug in DataArray.quantile() failing to keep attributes when keep_attrs was True (GH3304). By David Huard.

Documentation#

Created a PR checklist as a quick reference for tasks before creating a new PR or pushing new commits. By Gregory Gundersen.
Fixed documentation to clean up unwanted files created in ipython examples (GH3227). By Gregory Gundersen.

v0.12.3 (10 July 2019)#

New functions/methods#

New methods Dataset.to_stacked_array() and DataArray.to_unstacked_dataset() for reshaping Datasets of variables with different dimensions (GH1317). This is useful for feeding data from xarray into machine learning models, as described in Stacking different variables together. By Noah Brenowitz.

Enhancements#

Support for renaming Dataset variables and dimensions independently with rename_vars() and rename_dims() (GH3026). By Julia Kent.
Add scales, offsets, units and descriptions attributes to DataArray returned by open_rasterio(). (GH3013) By Erle Carrara.

Bug fixes#

Resolved deprecation warnings from newer versions of matplotlib and dask.
Compatibility fixes for the upcoming pandas 0.25 and NumPy 1.17 releases. By Stephan Hoyer.
Fix summaries for multiindex coordinates (GH3079). By Jonas Hörsch.
Fix HDF5 error that could arise when reading multiple groups from a file at once (GH2954). By Stephan Hoyer.

v0.12.2 (29 June 2019)#

New functions/methods#

Two new functions, combine_nested() and combine_by_coords(), allow for combining datasets along any number of dimensions, instead of the one-dimensional list of datasets supported by concat().

The new combine_nested will accept the datasets as a nested list-of-lists, and combine by applying a series of concat and merge operations. The new combine_by_coords instead uses the dimension coordinates of datasets to order them.

open_mfdataset() can use either combine_nested or combine_by_coords to combine datasets along multiple dimensions, by specifying the argument combine='nested' or combine='by_coords'.

The older function auto_combine has been deprecated, because its functionality has been subsumed by the new functions. To avoid FutureWarnings, switch to using combine_nested or combine_by_coords (or set the combine argument in open_mfdataset). (GH2159) By Tom Nicholas.
Dataset.rolling_exp() and DataArray.rolling_exp() added, similar to pandas’ pd.DataFrame.ewm method. Calling .mean on the resulting object will return an exponentially weighted moving average. By Maximilian Roos.
New DataArray.str for string-related manipulations, based on pandas.Series.str. By 0x0L.
Added strftime method to .dt accessor, making it simpler to hand a datetime DataArray to other code expecting formatted dates and times. (GH2090). strftime() is also now available on CFTimeIndex. By Alan Brammer and Ryan May.
GroupBy.quantile is now a method of GroupBy objects (GH3018). By David Huard.
Argument and return types are added to most methods on DataArray and Dataset, allowing static type checking both within xarray and external libraries. Type checking with mypy is enabled in CI (though not required yet). By Guido Imperiale and Maximilian Roos.
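An exponentially weighted moving average like the one returned by rolling_exp(...).mean() can be sketched in pure Python. This follows the "adjusted" form that pandas’ ewm(...).mean() uses by default, parameterized by span with alpha = 2 / (span + 1); whether xarray’s result matches it exactly for every option is an assumption here, and ewm_mean is a hypothetical name.

```python
def ewm_mean(values, span):
    """Adjusted exponentially weighted moving average.

    The observation i steps back gets weight (1 - alpha) ** i,
    with alpha = 2 / (span + 1); each output is the weighted mean so far.
    """
    alpha = 2.0 / (span + 1.0)
    decay = 1.0 - alpha
    result = []
    weighted_sum = 0.0
    weight_total = 0.0
    for x in values:
        # Decay everything seen so far, then add the new value with weight 1
        weighted_sum = weighted_sum * decay + x
        weight_total = weight_total * decay + 1.0
        result.append(weighted_sum / weight_total)
    return result


print([round(v, 4) for v in ewm_mean([1.0, 2.0, 3.0], span=3)])
# [1.0, 1.6667, 2.4286]
```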

Enhancements to existing functionality#

Add keepdims argument for reduce operations (GH2170) By Scott Wales.
Enable @ operator for DataArray. This is equivalent to DataArray.dot(). By Maximilian Roos.
Add fill_value argument for reindex, align, and merge operations to enable custom fill values. (GH2876) By Zach Griffith.
DataArray.transpose() now accepts a keyword argument transpose_coords which enables transposition of coordinates in the same way as Dataset.transpose(). DataArray.groupby(), DataArray.groupby_bins(), and DataArray.resample() now accept a keyword argument restore_coord_dims which keeps the order of the dimensions of multi-dimensional coordinates intact (GH1856). By Peter Hausamann.
Clean up Python 2 compatibility in code (GH2950) By Guido Imperiale.
Better warning message when supplying invalid objects to xr.merge (GH2948). By Mathias Hauser.
Add errors keyword argument to Dataset.drop and Dataset.drop_dims() that allows ignoring errors if a passed label or dimension is not in the dataset (GH2994). By Andrew Ross.
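The effect of the new fill_value argument on reindexing can be sketched with plain lists (reindex here is a hypothetical stdlib helper, not xarray’s method): labels missing from the original index receive fill_value, which defaults to NaN in this sketch.

```python
import math


def reindex(labels, values, new_labels, fill_value=math.nan):
    """Place values onto new_labels; labels not found get fill_value."""
    lookup = dict(zip(labels, values))
    return [lookup.get(label, fill_value) for label in new_labels]


print(reindex(["a", "b"], [1, 2], ["a", "b", "c"]))               # [1, 2, nan]
print(reindex(["a", "b"], [1, 2], ["a", "b", "c"], fill_value=0))  # [1, 2, 0]
```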

Bug fixes#

Rolling operations on xarray objects containing dask arrays could silently compute the incorrect result or use large amounts of memory (GH2940). By Stephan Hoyer.
Don’t set encoding attributes on bounds variables when writing to netCDF. (GH2921) By Deepak Cherian.
NetCDF4 output: variables with unlimited dimensions must be chunked (not contiguous) on output. (GH1849) By James McCreight.
Indexing with an empty list creates an object with a zero-length axis (GH2882) By Mayeul d’Avezac.
Return correct count for scalar datetime64 arrays (GH2770) By Dan Nowacki.
Fixed max, min exception when applied to a MultiIndex (GH2923) By Ian Castleden
A deep copy deep-copies the coords (GH1463) By Martin Pletcher.
Increased support for missing_value (GH2871) By Deepak Cherian.
Removed usages of pytest.config, which is deprecated (GH2988) By Maximilian Roos.
Fixed performance issues with cftime installed (GH3000) By 0x0L.
Replace incorrect usages of message in pytest assertions with match (GH3011) By Maximilian Roos.
Add explicit pytest markers, now required by pytest (GH3032). By Maximilian Roos.
Test suite fixes for newer versions of pytest (GH3011, GH3032). By Maximilian Roos and Stephan Hoyer.

v0.12.1 (4 April 2019)#

Enhancements#

Allow expand_dims method to support inserting/broadcasting dimensions with size > 1. (GH2710) By Martin Pletcher.

Bug fixes#

Dataset.copy(deep=True) now creates a deep copy of the attrs (GH2835). By Andras Gefferth.
Fix incorrect indexes resulting from various Dataset operations (e.g., swap_dims, isel, reindex, []) (GH2842, GH2856). By Stephan Hoyer.

v0.12.0 (15 March 2019)#

Highlights include:

Removed support for Python 2. This is the first version of xarray that is Python 3 only!
New coarsen() and integrate() methods. See Coarsen large arrays and Computation using Coordinates for details.
Many improvements to cftime support. See below for details.

Deprecations#

The compat argument to Dataset and the encoding argument to DataArray are deprecated and will be removed in a future release. (GH1188) By Maximilian Roos.

Other enhancements#

Added ability to open netcdf4/hdf5 file-like objects with open_dataset. Requires (h5netcdf>0.7 and h5py>2.9.0). (GH2781) By Scott Henderson
Add data=False option to to_dict() methods. (GH2656) By Ryan Abernathey
DataArray.coarsen() and Dataset.coarsen() are newly added. See Coarsen large arrays for details. (GH2525) By Keisuke Fujii.
Upsampling an array via interpolation with resample is now dask-compatible, as long as the array is not chunked along the resampling dimension. By Spencer Clark.
xarray.testing.assert_equal() and xarray.testing.assert_identical() now provide a more detailed report showing what exactly differs between the two objects (dimensions / coordinates / variables / attributes) (GH1507). By Benoit Bovy.
Add tolerance option to resample() methods bfill, pad, nearest. (GH2695) By Hauke Schulz.
DataArray.integrate() and Dataset.integrate() are newly added. See Computation using Coordinates for details. (GH1332) By Keisuke Fujii.
Added drop_dims() (GH1949). By Kevin Squire.
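The tolerance option for nearest-neighbour resampling can be illustrated with a small stdlib sketch. nearest_with_tolerance is a hypothetical helper working on sorted numeric times; it shows the semantics (no value is taken from a neighbour farther away than tolerance), not xarray’s implementation.

```python
import bisect
import math


def nearest_with_tolerance(source_times, source_values, target_times, tolerance):
    """Nearest-neighbour lookup that yields NaN beyond `tolerance`.

    source_times must be non-empty and sorted in ascending order.
    """
    result = []
    for t in target_times:
        i = bisect.bisect_left(source_times, t)
        # The nearest source time is one of the neighbours around the insertion point
        candidates = [j for j in (i - 1, i) if 0 <= j < len(source_times)]
        best = min(candidates, key=lambda j: abs(source_times[j] - t))
        if abs(source_times[best] - t) <= tolerance:
            result.append(source_values[best])
        else:
            result.append(math.nan)
    return result


# Samples at hours 0 and 6, upsampled to a 2-hourly grid with a 1-hour tolerance
print(nearest_with_tolerance([0, 6], [10.0, 20.0], [0, 2, 4, 6], tolerance=1))
# [10.0, nan, nan, 20.0]
```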

Bug fixes#

Silenced warnings that appear when using pandas 0.24. By Stephan Hoyer
Interpolating via resample now internally specifies bounds_error=False as an argument to scipy.interpolate.interp1d, allowing for interpolation from higher frequencies to lower frequencies. Datapoints outside the bounds of the original time coordinate are now filled with NaN (GH2197). By Spencer Clark.
Line plots with the x argument set to a non-dimensional coord now plot the correct data for 1D DataArrays. (GH2725). By Tom Nicholas.
Subtracting a scalar cftime.datetime object from a CFTimeIndex now results in a pandas.TimedeltaIndex instead of raising a TypeError (GH2671). By Spencer Clark.
backend_kwargs are no longer ignored when using open_dataset with the pynio engine (GH2380). By Jonathan Joyce.
Fix open_rasterio creating a WKT CRS instead of PROJ.4 with rasterio 1.0.14+ (GH2715). By David Hoese.
Masking data arrays with xarray.DataArray.where() now returns an array with the name of the original masked array (GH2748 and GH2457). By Yohai Bar-Sinai.
Fixed error when trying to reduce a DataArray using a function which does not require an axis argument. (GH2768) By Tom Nicholas.
Concatenating a sequence of DataArrays with varying names now sets the name of the output array to None, instead of the name of the first input array. If all the names are the same, the output array keeps that shared name. (GH2775). By Tom Nicholas.
Per the CF conventions section on calendars, specifying 'standard' as the calendar type in cftime_range() now correctly refers to the 'gregorian' calendar instead of the 'proleptic_gregorian' calendar (GH2761).
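The naming rule for concatenated DataArrays described above fits in a few lines. concatenated_name is a hypothetical helper operating on a list of input names, not an xarray function.

```python
def concatenated_name(names):
    """Keep the shared name if all inputs agree; otherwise return None."""
    return names[0] if len(set(names)) == 1 else None


print(concatenated_name(["temp", "temp"]))    # temp
print(concatenated_name(["temp", "precip"]))  # None
```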

v0.11.3 (26 January 2019)#

Bug fixes#

Saving files with times encoded with reference dates with timezones (e.g. ‘2000-01-01T00:00:00-05:00’) no longer raises an error (GH2649). By Spencer Clark.
Fixed performance regression with open_mfdataset (GH2662). By Tom Nicholas.
Fixed supplying an explicit dimension in the concat_dim argument to open_mfdataset (GH2647). By Ben Root.

v0.11.2 (2 January 2019)#

Removes inadvertently introduced setup dependency on pytest-runner (GH2641). Otherwise, this release is exactly equivalent to 0.11.1.

Warning

This is the last xarray release that will support Python 2.7. Future releases will be Python 3 only, but older versions of xarray will always be available for Python 2.7 users. For more details, see:

v0.11.1 (29 December 2018)#

This minor release includes a number of enhancements and bug fixes, and two (slightly) breaking changes.

Breaking changes#

Minimum rasterio version increased from 0.36 to 1.0 (for open_rasterio).
Time bounds variables are now also decoded according to CF conventions (GH2565). The previous behavior was to decode them only if they had specific time attributes; now these attributes are copied automatically from the corresponding time coordinate. This might break downstream code that was relying on these variables not being decoded. By Fabien Maussion.

Enhancements#

Ability to read and write consolidated metadata in zarr stores (GH2558). By Ryan Abernathey.
CFTimeIndex uses slicing for string indexing when possible (like pandas.DatetimeIndex), which avoids unnecessary copies. By Stephan Hoyer
Enable passing rasterio.io.DatasetReader or rasterio.vrt.WarpedVRT to open_rasterio instead of a file path string. This allows for in-memory reprojection (GH2588). By Scott Henderson.
Like pandas.DatetimeIndex, CFTimeIndex now supports “dayofyear” and “dayofweek” accessors (GH2597). Note this requires a version of cftime greater than 1.0.2. By Spencer Clark.
The option 'warn_for_unclosed_files' (False by default) has been added to allow users to enable a warning when files opened by xarray are deallocated but were not explicitly closed. This is mostly useful for debugging; we recommend enabling it in your test suites if you use xarray for IO. By Stephan Hoyer
Support Dask HighLevelGraphs. By Matthew Rocklin.
DataArray.resample() and Dataset.resample() now support the loffset kwarg just like pandas. By Deepak Cherian
Datasets are now guaranteed to have a 'source' encoding, so the source file name is always stored (GH2550). By Tom Nicholas.
The apply methods for DatasetGroupBy, DataArrayGroupBy, DatasetResample and DataArrayResample now support passing positional arguments to the applied function as a tuple to the args argument. By Matti Eskelinen.
0d slices of ndarrays are now obtained directly through indexing, rather than extracting and wrapping a scalar, avoiding unnecessary copying. By Daniel Wennberg.
Added support for fill_value with DataArray.shift() and Dataset.shift(). By Maximilian Roos

Bug fixes#

Ensure files are automatically closed, if possible, when no longer referenced by a Python variable (GH2560). By Stephan Hoyer
Fixed possible race conditions when reading/writing to disk in parallel (GH2595). By Stephan Hoyer
Fix h5netcdf saving scalars with filters or chunks (GH2563). By Martin Raspaud.
Fix parsing of _Unsigned attribute set by OPENDAP servers. (GH2583). By Deepak Cherian
Fix failure in time encoding when exporting to netCDF with versions of pandas less than 0.21.1 (GH2623). By Spencer Clark.
Fix MultiIndex selection to update label and level (GH2619). By Keisuke Fujii.

v0.11.0 (7 November 2018)#

Breaking changes#

Finished deprecations (changed behavior with this release):
- Dataset.T has been removed as a shortcut for Dataset.transpose(). Call Dataset.transpose() directly instead.
- Iterating over a Dataset now includes only data variables, not coordinates. Similarly, calling len and bool on a Dataset now includes only data variables.
- DataArray.__contains__ (used by Python’s in operator) now checks array data, not coordinates.
- The old resample syntax from before xarray 0.10, e.g., data.resample('1D', dim='time', how='mean'), is no longer supported and will raise an error in most cases. You need to use the new resample syntax instead, e.g., data.resample(time='1D').mean() or data.resample({'time': '1D'}).mean().
New deprecations (behavior will be changed in xarray 0.12):
- The behavior of reducing the result of DataArray.groupby() or DataArray.resample() without a dimension argument will change in the next release; a FutureWarning is now raised. By Keisuke Fujii.
- The inplace kwarg of a number of DataArray and Dataset methods is being deprecated and will be removed in the next release. By Deepak Cherian.
Refactored storage backends:
- Xarray’s storage backends now automatically open and close files when necessary, rather than requiring opening a file with autoclose=True. A global least-recently-used cache is used to store open files; the default limit of 128 open files should suffice in most cases, but can be adjusted if necessary with xarray.set_options(file_cache_maxsize=...). The autoclose argument to open_dataset and related functions has been deprecated and is now a no-op.
  
  This change, along with an internal refactor of xarray’s storage backends, should significantly improve performance when reading and writing netCDF files with Dask, especially when working with many files or using Dask Distributed. By Stephan Hoyer
Support for non-standard calendars used in climate science:
- Xarray will now always use cftime.datetime objects, rather than by default trying to coerce them into np.datetime64[ns] objects. A CFTimeIndex will be used for indexing along time coordinates in these cases.
- A new method to_datetimeindex() has been added to aid in converting from a CFTimeIndex to a pandas.DatetimeIndex for the remaining use-cases where using a CFTimeIndex is still a limitation (e.g. for resample or plotting).
- Setting the enable_cftimeindex option is now a no-op and emits a FutureWarning.
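The least-recently-used file cache described under the storage backend refactor can be sketched with collections.OrderedDict. LRUFileCache and its opener callback are hypothetical names, io.StringIO stands in for real file handles, and the default maxsize of 128 simply mirrors the documented default limit; this is not xarray’s actual implementation.

```python
import io
from collections import OrderedDict


class LRUFileCache:
    """Keep at most `maxsize` files open, closing the least recently used one."""

    def __init__(self, opener, maxsize=128):
        self._opener = opener  # callable: key -> file-like object with .close()
        self._maxsize = maxsize
        self._files = OrderedDict()

    def acquire(self, key):
        if key in self._files:
            # Cache hit: mark this file as the most recently used
            self._files.move_to_end(key)
            return self._files[key]
        # Cache miss: open the file and remember it
        handle = self._opener(key)
        self._files[key] = handle
        if len(self._files) > self._maxsize:
            # Evict and close the least recently used file
            _, oldest = self._files.popitem(last=False)
            oldest.close()
        return handle

    def __len__(self):
        return len(self._files)


opened = {}


def fake_open(name):
    # io.StringIO stands in for a real file handle
    opened[name] = io.StringIO(name)
    return opened[name]


cache = LRUFileCache(fake_open, maxsize=2)
cache.acquire("a.nc")
cache.acquire("b.nc")
cache.acquire("a.nc")  # "a.nc" becomes the most recently used file
cache.acquire("c.nc")  # over the limit: "b.nc" is closed and evicted
print(opened["b.nc"].closed, opened["a.nc"].closed)  # True False
```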

Enhancements#

xarray.DataArray.plot.line() can now accept multidimensional coordinate variables as input. hue must be a dimension name in this case. (GH2407) By Deepak Cherian.
Added support for Python 3.7. (GH2271). By Joe Hamman.
Added support for plotting data with pandas.Interval coordinates, such as those created by groupby_bins(). By Maximilian Maahn.
Added shift() for shifting the values of a CFTimeIndex by a specified frequency. (GH2244). By Spencer Clark.
Added support for using cftime.datetime coordinates with DataArray.differentiate(), Dataset.differentiate(), DataArray.interp(), and Dataset.interp(). By Spencer Clark
There is now a global option to either always keep or always discard dataset and dataarray attrs upon operations. The option is set with xarray.set_options(keep_attrs=True), and the default is to use the old behaviour. By Tom Nicholas.
Added a new backend for the GRIB file format based on ECMWF cfgrib python driver and ecCodes C-library. (GH2475) By Alessandro Amici, sponsored by ECMWF.
Resample now supports a dictionary mapping from dimension to frequency as its first argument, e.g., data.resample({'time': '1D'}).mean(). This is consistent with other xarray functions that accept either dictionaries or keyword arguments. By Stephan Hoyer.
The preferred way to access tutorial data is now to load it lazily with xarray.tutorial.open_dataset(). xarray.tutorial.load_dataset() calls Dataset.load() prior to returning (and is now deprecated). This was changed in order to facilitate using tutorial datasets with dask. By Joe Hamman.
DataArray can now use xr.set_options(keep_attrs=True) and retain attributes in binary operations, such as (+, -, *, /). Default behaviour is unchanged (attributes are dropped). By Michael Blaschek

Bug fixes#

FacetGrid now properly uses the cbar_kwargs keyword argument. (GH1504, GH1717) By Deepak Cherian.
Addition and subtraction operators used with a CFTimeIndex now preserve the index’s type. (GH2244). By Spencer Clark.
We now properly handle arrays of datetime.datetime and datetime.timedelta provided as coordinates. (GH2512) By Deepak Cherian.
xarray.DataArray.roll correctly handles multidimensional arrays. (GH2445) By Keisuke Fujii.
xarray.plot() now properly accepts a norm argument and does not override the norm’s vmin and vmax. (GH2381) By Deepak Cherian.
xarray.DataArray.std() now correctly accepts ddof keyword argument. (GH2240) By Keisuke Fujii.
Restore matplotlib’s default of plotting dashed negative contours when a single color is passed to DataArray.contour() e.g. colors='k'. By Deepak Cherian.
Fix a bug that caused some indexing operations on arrays opened with open_rasterio to error (GH2454). By Stephan Hoyer.
Subtracting one CFTimeIndex from another now returns a pandas.TimedeltaIndex, analogous to the behavior for DatetimeIndexes (GH2484). By Spencer Clark.
Adding a TimedeltaIndex to, or subtracting a TimedeltaIndex from a CFTimeIndex is now allowed (GH2484). By Spencer Clark.
Avoid use of Dask’s deprecated get= parameter in tests. By Matthew Rocklin.
An OverflowError is now accurately raised and caught during the encoding process if a reference date is used that is so distant that the dates must be encoded using cftime rather than NumPy (GH2272). By Spencer Clark.
Chunked datasets can now roundtrip to Zarr storage continually with to_zarr and open_zarr (GH2300). By Lily Wang.

v0.10.9 (21 September 2018)#

This minor release contains a number of backwards compatible enhancements.

Announcements of note:

Xarray is now a NumFOCUS fiscally sponsored project! Read the announcement for more details.
We have a new Development roadmap that outlines our future development plans.
Dataset.apply now properly documents the way func is called. By Matti Eskelinen.

Enhancements#

DataArray.differentiate() and Dataset.differentiate() are newly added. (GH1332) By Keisuke Fujii.
Default colormap for sequential and divergent data can now be set via set_options() (GH2394) By Julius Busecke.
min_count option is newly supported in DataArray.sum(), DataArray.prod(), Dataset.sum(), and Dataset.prod(). (GH2230) By Keisuke Fujii.
plot() now accepts the kwargs xscale, yscale, xlim, ylim, xticks, yticks just like pandas. Also xincrease=False, yincrease=False now use matplotlib’s axis inverting methods instead of setting limits. By Deepak Cherian. (GH2224)
DataArray coordinates and Dataset coordinates and data variables are now displayed as a b ... y z rather than a b c d .... (GH1186) By Seth P.
A new CFTimeIndex-enabled cftime_range() function for use in generating dates from standard or non-standard calendars. By Spencer Clark.
When interpolating over a datetime64 axis, you can now provide a datetime string instead of a datetime64 object. E.g. da.interp(time='1991-02-01') (GH2284) By Deepak Cherian.
A clear error message is now displayed if a set or dict is passed in place of an array (GH2331) By Maximilian Roos.
Applying unstack to a large DataArray or Dataset is now much faster if the MultiIndex has not been modified after stacking the indices. (GH1560) By Maximilian Maahn.
You can now control whether or not to offset the coordinates when using the roll method. The current default behavior, in which coordinates are rolled, raises a deprecation warning unless the keyword argument is set explicitly. (GH1875) By Andrew Huang.
You can now call unstack without arguments to unstack every MultiIndex in a DataArray or Dataset. By Julia Signell.
Added the ability to pass a data kwarg to copy to create a new object with the same metadata as the original object but using new values. By Julia Signell.
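The meaning of min_count can be sketched for a NaN-skipping sum: if fewer than min_count non-NaN values are present, the result is NaN rather than a sum over whatever remains. nansum_min_count is a hypothetical stdlib helper, not xarray’s reduction code.

```python
import math


def nansum_min_count(values, min_count=None):
    """Sum the non-NaN values; return NaN if fewer than min_count are valid."""
    valid = [v for v in values if not math.isnan(v)]
    if min_count is not None and len(valid) < min_count:
        return math.nan
    return math.fsum(valid)


data = [1.0, math.nan, math.nan]
print(nansum_min_count(data))               # 1.0
print(nansum_min_count(data, min_count=2))  # nan
print(nansum_min_count([math.nan]))         # 0.0 (empty sum, no min_count)
```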

Bug fixes#

xarray.plot.imshow() correctly uses the origin argument. (GH2379) By Deepak Cherian.
Fixed DataArray.to_iris() failure while creating DimCoord by falling back to creating AuxCoord. Fixed dependency on var_name attribute being set. (GH2201) By Thomas Voigt.
Fixed a bug in zarr backend which prevented use with datasets with invalid chunk size encoding after reading from an existing store (GH2278). By Joe Hamman.
Tests can be run in parallel with pytest-xdist. By Tony Tung.
Followed up on the renaming in dask from dask.ghost to dask.overlap. By Keisuke Fujii.
Now raises a ValueError when there is a conflict between dimension names and level names of MultiIndex. (GH2299) By Keisuke Fujii.
Now apply_ufunc() raises a ValueError when the size of input_core_dims is inconsistent with the number of arguments. (GH2341) By Keisuke Fujii.
Fixed Dataset.filter_by_attrs() behavior not matching netCDF4.Dataset.get_variables_by_attributes(). When more than one key=value is passed into Dataset.filter_by_attrs() it will now return a Dataset with variables which pass all the filters. (GH2315) By Andrew Barna.
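The fixed filter_by_attrs() semantics, where every key=value filter must match, can be sketched on a plain dict that maps variable names to attribute dicts. The helpers below are hypothetical; accepting a callable as a filter value is an assumption made for illustration.

```python
def _matches(attrs, key, pattern):
    # A callable pattern receives the attribute value (None if missing);
    # any other pattern is compared with ==
    value = attrs.get(key)
    return pattern(value) if callable(pattern) else value == pattern


def filter_by_attrs(variables, **filters):
    """Return the names of variables whose attrs satisfy *all* filters."""
    return [
        name
        for name, attrs in variables.items()
        if all(_matches(attrs, key, pattern) for key, pattern in filters.items())
    ]


variables = {
    "temp": {"standard_name": "air_temperature", "units": "K"},
    "temp_c": {"standard_name": "air_temperature", "units": "degC"},
    "precip": {"standard_name": "precipitation_flux", "units": "kg m-2 s-1"},
}
print(filter_by_attrs(variables, standard_name="air_temperature", units="K"))
# ['temp']
print(filter_by_attrs(variables, units=lambda u: u is not None and u.startswith("deg")))
# ['temp_c']
```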

v0.10.8 (18 July 2018)#

Breaking changes#

Xarray no longer supports Python 3.4. Additionally, the minimum supported versions of the following dependencies have been updated and/or clarified:
- pandas: 0.18 -> 0.19
- NumPy: 1.11 -> 1.12
- Dask: 0.9 -> 0.16
- Matplotlib: unspecified -> 1.5
(GH2204). By Joe Hamman.

Enhancements#

DataArray.interp_like() and Dataset.interp_like() methods are newly added. (GH2218) By Keisuke Fujii.
Added support for curvilinear and unstructured generic grids to to_cdms2() and from_cdms2() (GH2262). By Stephane Raynaud.

Bug fixes#

Fixed a bug in zarr backend which prevented use with datasets with incomplete chunks in multiple dimensions (GH2225). By Joe Hamman.
Fixed a bug in to_netcdf() which prevented writing datasets when the arrays had different chunk sizes (GH2254). By Mike Neish.
Fixed masking during the conversion to cdms2 objects by to_cdms2() (GH2262). By Stephane Raynaud.
Fixed a bug in 2D plots which incorrectly raised an error when 2D coordinates weren’t monotonic (GH2250). By Fabien Maussion.
Fixed warning raised in to_netcdf() due to deprecation of effective_get in dask (GH2238). By Joe Hamman.

v0.10.7 (7 June 2018)#

Enhancements#

Plot labels now make use of metadata that follow CF conventions (GH2135). By Deepak Cherian and Ryan Abernathey.
Line plots now support faceting with row and col arguments (GH2107). By Yohai Bar Sinai.
DataArray.interp() and Dataset.interp() methods are newly added. See Interpolating data for details. (GH2079) By Keisuke Fujii.

Bug fixes#

Fixed a bug in rasterio backend which prevented use with distributed. The rasterio backend now returns pickleable objects (GH2021). By Joe Hamman.

v0.10.6 (31 May 2018)#

This minor release includes a number of bug fixes and backwards compatible enhancements.

Enhancements#

New PseudoNetCDF backend for many Atmospheric data formats including GEOS-Chem, CAMx, NOAA arlpacked bit and many others. See io.PseudoNetCDF for more details. By Barron Henderson.
The Dataset constructor now aligns DataArray arguments in data_vars to indexes set explicitly in coords, where previously an error would be raised. (GH674) By Maximilian Roos.
sel(), isel() & reindex(), (and their Dataset counterparts) now support supplying a dict as a first argument, as an alternative to the existing approach of supplying kwargs. This allows for more robust behavior of dimension names which conflict with other keyword names, or are not strings. By Maximilian Roos.
rename() now supports supplying **kwargs, as an alternative to the existing approach of supplying a dict as the first argument. By Maximilian Roos.
cumsum() and cumprod() now support aggregation over multiple dimensions at the same time. This is the default behavior when dimensions are not specified (previously this raised an error). By Stephan Hoyer
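A short sketch of multi-dimension cumulative sums, assuming a recent xarray release (the data is made up). With no dimension given, the cumulative sum is applied along each dimension in turn:

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.array([[1, 2], [3, 4]]), dims=("x", "y"))

# No dimension given: accumulate along "x", then along "y".
all_dims = da.cumsum()

# The same result with the dimensions listed explicitly.
both = da.cumsum(dim=("x", "y"))
```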
DataArray.dot() and dot() are partly supported with older dask<0.17.4. (related to GH2203) By Keisuke Fujii.
Xarray now uses Versioneer to manage its version strings. (GH1300). By Joe Hamman.

Bug fixes#

Fixed a regression in 0.10.4, where explicitly specifying dtype='S1' or dtype=str in encoding with to_netcdf() raised an error (GH2149). By Stephan Hoyer.
apply_ufunc() now directly validates output variables (GH1931). By Stephan Hoyer.
Fixed a bug where to_netcdf(..., unlimited_dims='bar') yielded NetCDF files with spurious 0-length dimensions (i.e. b, a, and r) (GH2134). By Joe Hamman.
Removed spurious warnings with Dataset.update(Dataset) (GH2161) and array.equals(array) when array contains NaT (GH2162). By Stephan Hoyer.
Aggregations with Dataset.reduce() (including mean, sum, etc.) no longer drop unrelated coordinates (GH1470). Also fixed a bug where non-scalar data-variables that did not include the aggregation dimension were improperly skipped. By Stephan Hoyer.
Fix stack() with non-unique coordinates on pandas 0.23 (GH2160). By Stephan Hoyer.
Selecting data indexed by a length-1 CFTimeIndex with a slice of strings now behaves as it does when using a length-1 DatetimeIndex (i.e. it no longer falsely returns an empty array when the slice includes the value in the index) (GH2165). By Spencer Clark.
Fix DataArray.groupby().reduce() mutating coordinates on the input array when grouping over dimension coordinates with duplicated entries (GH2153). By Stephan Hoyer.
Fix Dataset.to_netcdf() failing to create groups with engine="h5netcdf" (GH2177). By Stephan Hoyer.

v0.10.4 (16 May 2018)#

This minor release includes a number of bug-fixes and backwards compatible enhancements. A highlight is CFTimeIndex, which offers support for non-standard calendars used in climate modeling.

Documentation#

New FAQ entry, Xarray related projects. By Deepak Cherian.
Assigning values with indexing now includes examples on how to select and assign values to a DataArray with .loc. By Chiara Lepore.

Enhancements#

Add an option for using a CFTimeIndex for indexing times with non-standard calendars and/or outside the Timestamp-valid range; this index enables a subset of the functionality of a standard pandas.DatetimeIndex. See Non-standard calendars and dates outside the precision range for full details. (GH789, GH1084, GH1252) By Spencer Clark with help from Stephan Hoyer.
Allow for serialization of cftime.datetime objects (GH789, GH1084, GH2008, GH1252) using the standalone cftime library. By Spencer Clark.
Support writing lists of strings as netCDF attributes (GH2044). By Dan Nowacki.
to_netcdf() with engine='h5netcdf' now accepts h5py encoding settings compression and compression_opts, along with the NetCDF4-Python style settings gzip=True and complevel. This allows using any compression plugin installed in hdf5, e.g. LZF (GH1536). By Guido Imperiale.
dot() on dask-backed data will now call dask.array.einsum(). This greatly boosts speed and allows chunking on the core dims. The function now requires dask >= 0.17.3 to work on dask-backed data (GH2074). By Guido Imperiale.
plot.line() learned new kwargs: xincrease, yincrease that change the direction of the respective axes. By Deepak Cherian.
Added the parallel option to open_mfdataset(). This option uses dask.delayed to parallelize the open and preprocessing steps within open_mfdataset. This is expected to provide performance improvements when opening many files, particularly when used in conjunction with dask’s multiprocessing or distributed schedulers (GH1981). By Joe Hamman.
New compute option in to_netcdf(), to_zarr(), and save_mfdataset() to allow for the lazy computation of netCDF and zarr stores. This feature is currently only supported by the netCDF4 and zarr backends. (GH1784). By Joe Hamman.

Bug fixes#

ValueError is raised when coordinates with the wrong size are assigned to a DataArray. (GH2112) By Keisuke Fujii.
Fixed a bug in rolling() with bottleneck. Also, fixed a bug in rolling an integer dask array. (GH2113) By Keisuke Fujii.
Fixed a bug where keep_attrs=True flag was neglected if apply_ufunc() was used with Variable. (GH2114) By Keisuke Fujii.
When assigning a DataArray to Dataset, any conflicted non-dimensional coordinates of the DataArray are now dropped. (GH2068) By Keisuke Fujii.
Better error handling in open_mfdataset (GH2077). By Stephan Hoyer.
plot.line() does not call autofmt_xdate() anymore. Instead it changes the rotation and horizontal alignment of labels without removing the x-axes of any other subplots in the figure (if any). By Deepak Cherian.
Colorbar limits are now determined by excluding ±Infs too. By Deepak Cherian. By Joe Hamman.
Fixed to_iris to maintain lazy dask array after conversion (GH2046). By Alex Hilson and Stephan Hoyer.

v0.10.3 (13 April 2018)#

This minor release includes a number of bug-fixes and backwards compatible enhancements.

Enhancements#

DataArray.isin() and Dataset.isin() methods, which test whether each value in the array is contained in the supplied list, returning a boolean array. See Selecting values with isin for full details. Similar to the np.isin function. By Maximilian Roos.
Improved the speed of constructing DataArrayRolling objects (GH1993). By Keisuke Fujii.
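A minimal sketch of isin() on a DataArray, assuming a recent xarray release (the values are made up):

```python
import xarray as xr

da = xr.DataArray([1, 2, 3, 4], dims="x")

# Elementwise membership test against a list; the result keeps the "x" dimension.
mask = da.isin([1, 3])
```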
Handle variables with different values for missing_value and _FillValue by masking values for both attributes; previously this resulted in a ValueError. (GH2016) By Ryan May.
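A sketch of the masking behavior on an in-memory dataset, assuming a recent xarray release where xr.decode_cf() masks both attributes and emits a warning about multiple fill values (silenced here). The variable name and sentinel values are invented for illustration:

```python
import warnings

import numpy as np
import xarray as xr

# Hypothetical raw variable using two different sentinels for missing data.
raw = xr.Dataset(
    {
        "v": (
            "x",
            np.array([-999, 1, -1, 2], dtype="int16"),
            {"missing_value": np.int16(-999), "_FillValue": np.int16(-1)},
        )
    }
)

with warnings.catch_warnings():
    # xarray warns that the variable defines more than one fill value.
    warnings.simplefilter("ignore")
    decoded = xr.decode_cf(raw)
```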

Bug fixes#

Fixed decode_cf function to operate lazily on dask arrays (GH1372). By Ryan Abernathey.
Fixed labeled indexing with slice bounds given by xarray objects with datetime64 or timedelta64 dtypes (GH1240). By Stephan Hoyer.
Attempting to convert an xarray.Dataset into a numpy array now raises an informative error message. By Stephan Hoyer.
Fixed a bug in decode_cf_datetime where int32 arrays weren’t parsed correctly (GH2002). By Fabien Maussion.
When calling xr.auto_combine() or xr.open_mfdataset() with a concat_dim, the resulting dataset will have that one-element dimension (it was silently dropped, previously) (GH1988). By Ben Root.

v0.10.2 (13 March 2018)#

This minor release includes a number of bug-fixes and enhancements, along with one possibly backwards incompatible change.

Backwards incompatible changes#

The addition of __array_ufunc__ for xarray objects (see below) means that NumPy ufunc methods (e.g., np.add.reduce) that previously worked on xarray.DataArray objects by converting them into NumPy arrays will now raise NotImplementedError instead. In all cases, the work-around is simple: convert your objects explicitly into NumPy arrays before calling the ufunc (e.g., with .values).
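A small sketch of the failure and the work-around, assuming a recent xarray release (the array is made up):

```python
import numpy as np
import xarray as xr

da = xr.DataArray([1.0, 2.0, 3.0], dims="x")

# Ufunc methods other than a plain call (e.g. .reduce) are not supported on xarray objects.
try:
    np.add.reduce(da)
    raised = False
except NotImplementedError:
    raised = True

# Work-around: convert to a NumPy array explicitly first.
total = np.add.reduce(da.values)
```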

Enhancements#

Added xarray.dot(), equivalent to numpy.einsum(). Also, DataArray.dot() now supports a dims option, which specifies the dimensions to sum over (GH1951). By Keisuke Fujii.
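A minimal sketch comparing xr.dot() with the equivalent np.einsum() call, assuming a recent xarray release where, by default, xr.dot() sums over every dimension the inputs share (the arrays are made up):

```python
import numpy as np
import xarray as xr

a = xr.DataArray(np.arange(6).reshape(2, 3), dims=("x", "y"))
b = xr.DataArray(np.arange(3), dims="y")

# The shared dimension "y" is contracted, like np.einsum("xy,y->x", ...).
c = xr.dot(a, b)
expected = np.einsum("xy,y->x", a.values, b.values)
```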
Support for writing xarray datasets to netCDF files (netcdf4 backend only) when using the dask.distributed scheduler (GH1464). By Joe Hamman.
Support lazy vectorized indexing. After this change, flexible indexing such as orthogonal/vectorized indexing becomes possible for all backend arrays. Lazy transpose is now also supported. (GH1897) By Keisuke Fujii.
Implemented NumPy’s __array_ufunc__ protocol for all xarray objects (GH1617). This enables using NumPy ufuncs directly on xarray.Dataset objects with recent versions of NumPy (v1.13 and newer):
```
import numpy as np
import xarray as xr

ds = xr.Dataset({"a": 1})
np.sin(ds)
```
This obviates the need for the xarray.ufuncs module, which will be deprecated in the future when xarray drops support for older versions of NumPy. By Stephan Hoyer.
Improve rolling() logic. DataArrayRolling() object now supports construct() method that returns a view of the DataArray / Dataset object with the rolling-window dimension added to the last axis. This enables more flexible operation, such as strided rolling, windowed rolling, ND-rolling, short-time FFT and convolution. (GH1831, GH1142, GH819) By Keisuke Fujii.
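A sketch of construct() for plain and strided windows, assuming a recent xarray release (the data is made up). Positions before the start of "x" are padded with NaN, so reducing over the window dimension with NaNs skipped reproduces a rolling mean with partial windows:

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.arange(5.0), dims="x")

# Each position along "x" gets a trailing length-3 window on the new "window" dimension.
windows = da.rolling(x=3).construct("window")

# Strided rolling: keep only every second window.
strided = da.rolling(x=3).construct("window", stride=2)

# Reduce over the window dimension; NaN padding is skipped by default for floats.
rolling_mean = windows.mean("window")
```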
line() learned to make plots with data on x-axis if so specified. (GH575) By Deepak Cherian.

Bug fixes#

Raise an informative error message when using apply_ufunc with numpy v1.11 (GH1956). By Stephan Hoyer.
Fix the precision drop after indexing datetime64 arrays (GH1932). By Keisuke Fujii.
Silenced irrelevant warnings issued by open_rasterio (GH1964). By Stephan Hoyer.
Fix kwarg colors clashing with auto-inferred cmap (GH1461) By Deepak Cherian.
Fix imshow() error when passed an RGB array with size one in a spatial dimension. By Zac Hatfield-Dodds.

v0.10.1 (25 February 2018)#

This minor release includes a number of bug-fixes and backwards compatible enhancements.

Documentation#

Added a new guide on Contributing to xarray (GH640) By Joe Hamman.
Added apply_ufunc example to Toy weather data (GH1844). By Liam Brannigan.
New entry Why don’t aggregations return Python scalars? in the Frequently Asked Questions (GH1726). By 0x0L.

Enhancements#

New functions and methods:

Added DataArray.to_iris() and DataArray.from_iris() for converting data arrays to and from Iris Cubes with the same data and coordinates (GH621 and GH37). By Neil Parley and Duncan Watson-Parris.
Experimental support for using Zarr as storage layer for xarray (GH1223). By Ryan Abernathey and Joe Hamman.
New rank() on arrays and datasets. Requires bottleneck (GH1731). By 0x0L.
.dt accessor can now ceil, floor and round timestamps to a specified frequency. By Deepak Cherian.
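A small sketch of the three rounding modes at daily frequency, assuming a recent xarray and pandas (the timestamps are made up; "D" is used because it is a stable pandas frequency alias):

```python
import pandas as pd
import xarray as xr

times = xr.DataArray(
    pd.to_datetime(["2000-01-01 06:00", "2000-01-01 18:00", "2000-01-02 13:30"]),
    dims="time",
)

floored = times.dt.floor("D")  # down to midnight of the same day
ceiled = times.dt.ceil("D")    # up to the next midnight
rounded = times.dt.round("D")  # to the nearest midnight
```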

Plotting enhancements:

xarray.plot.imshow() now handles RGB and RGBA images. Saturation can be adjusted with vmin and vmax, or with robust=True. By Zac Hatfield-Dodds.
contourf() learned to contour 2D variables that have both a 1D coordinate (e.g. time) and a 2D coordinate (e.g. depth as a function of time) (GH1737). By Deepak Cherian.
plot() rotates x-axis ticks if x-axis is time. By Deepak Cherian.
line() can draw multiple lines if provided with a 2D variable. By Deepak Cherian.

Other enhancements:

Reduce methods such as DataArray.sum() now handle object-type arrays.

```
import numpy as np
import xarray as xr

da = xr.DataArray(np.array([True, False, np.nan], dtype=object), dims="x")
da.sum()
```

(GH1866) By Keisuke Fujii.

Reduce methods such as DataArray.sum() now accept a dtype argument (GH1838). By Keisuke Fujii.
Added nodatavals attribute to DataArray when using open_rasterio(). (GH1736). By Alan Snow.
Use pandas.Grouper class in xarray resample methods rather than the deprecated pandas.TimeGrouper class (GH1766). By Joe Hamman.
Experimental support for parsing ENVI metadata to coordinates and attributes in xarray.open_rasterio(). By Matti Eskelinen.
Reduce memory usage when decoding a variable with a scale_factor, by converting 8-bit and 16-bit integers to float32 instead of float64 (PR1840), and keeping float16 and float32 as float32 (GH1842). Correspondingly, encoded variables may also be saved with a smaller dtype. By Zac Hatfield-Dodds.
Reindexing/alignment with dask arrays is now orders of magnitude faster when inserting missing values (GH1847). By Stephan Hoyer.
Fix axis keyword ignored when applying np.squeeze to DataArray (GH1487). By Florian Pinault.
netcdf4-python has moved its time handling in the netcdftime module to a standalone package (netcdftime). As such, xarray now considers netcdftime an optional dependency. One benefit of this change is that it allows for encoding/decoding of datetimes with non-standard calendars without the netcdf4-python dependency (GH1084). By Joe Hamman.

Bug fixes#

Rolling aggregation with the center=True option now gives the same result as pandas, including the last element (GH1046). By Keisuke Fujii.
Support indexing with a 0d-np.ndarray (GH1921). By Keisuke Fujii.
Added warning in api.py of a netCDF4 bug that occurs when the filepath has 88 characters (GH1745). By Liam Brannigan.
Fixed encoding of multi-dimensional coordinates in to_netcdf() (GH1763). By Mike Neish.
Fixed chunking with non-file-based rasterio datasets (GH1816) and refactored rasterio test suite. By Ryan Abernathey
Bug fix in open_dataset(engine='pydap') (GH1775). By Keisuke Fujii.
Bug fix in vectorized assignment (GH1743, GH1744). Now item assignment to DataArray.__setitem__() checks coordinates of target, destination and keys. If there are any conflicts among these coordinates, IndexError will be raised. By Keisuke Fujii.
Properly point DataArray.__dask_scheduler__ to dask.threaded.get. By Matthew Rocklin.
Bug fixes in DataArray.plot.imshow(): all-NaN arrays and arrays with size one in some dimension can now be plotted, which is good for exploring satellite imagery (GH1780). By Zac Hatfield-Dodds.
Fixed UnboundLocalError when opening netCDF file (GH1781). By Stephan Hoyer.
The variables, attrs, and dimensions properties have been deprecated as part of a bug fix addressing an issue where backends were unintentionally loading the datastore's data and attributes repeatedly during writes (GH1798). By Joe Hamman.
Compatibility fixes to plotting module for NumPy 1.14 and pandas 0.22 (GH1813). By Joe Hamman.
Bug fix in encoding coordinates with {'_FillValue': None} in netCDF metadata (GH1865). By Chris Roth.
Fix indexing with lists for arrays loaded from netCDF files with engine='h5netcdf' (GH1864). By Stephan Hoyer.
Corrected a bug with incorrect coordinates for non-georeferenced geotiff files (GH1686). Internally, we now use the rasterio coordinate transform tool instead of doing the computations ourselves. A parse_coordinates kwarg has been added to open_rasterio() (set to True by default). By Fabien Maussion.
The colors of discrete colormaps are now the same regardless of whether seaborn is installed (GH1896). By Fabien Maussion.
Fixed dtype promotion rules in where() and concat() to match pandas (GH1847). A combination of strings/numbers or unicode/bytes now promote to object dtype, instead of strings or unicode. By Stephan Hoyer.
Fixed bug where isnull() was loading data stored as dask arrays (GH1937). By Joe Hamman.

v0.10.0 (20 November 2017)#

This is a major release that includes bug fixes, new features and a few backwards incompatible changes. Highlights include:

Indexing now supports broadcasting over dimensions, similar to NumPy’s vectorized indexing (but better!).
resample() has a new groupby-like API like pandas.
apply_ufunc() facilitates wrapping and parallelizing functions written for NumPy arrays.
Performance improvements, particularly for dask and open_mfdataset().

Breaking changes#

xarray now supports a form of vectorized indexing with broadcasting, where the result of indexing depends on dimensions of indexers, e.g., array.sel(x=ind) with ind.dims == ('y',). Alignment between coordinates on indexed and indexing objects is also now enforced. Due to these changes, existing uses of xarray objects to index other xarray objects will break in some cases.

The new indexing API is much more powerful, supporting outer, diagonal and vectorized indexing in a single interface. The isel_points and sel_points methods are deprecated, since they are now redundant with the isel / sel methods. See Vectorized Indexing for the details (GH1444, GH1436). By Keisuke Fujii and Stephan Hoyer.
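A minimal sketch contrasting vectorized and orthogonal indexing, assuming a recent xarray release (the array and the "points" dimension name are made up). Indexers that share a dimension are broadcast against each other and pick out (x, y) pairs, while plain lists select an outer product:

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.arange(12).reshape(3, 4), dims=("x", "y"))

# Vectorized: both indexers live on "points", so this picks (0, 1) and (2, 3).
ix = xr.DataArray([0, 2], dims="points")
iy = xr.DataArray([1, 3], dims="points")
picked = da.isel(x=ix, y=iy)

# Orthogonal: plain lists select rows {0, 2} crossed with columns {1, 3}.
outer = da.isel(x=[0, 2], y=[1, 3])
```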
A new resampling interface to match pandas’ groupby-like API was added to Dataset.resample() and DataArray.resample() (GH1272). Timeseries resampling is fully supported for data with arbitrary dimensions as is both downsampling and upsampling (including linear, quadratic, cubic, and spline interpolation).

Old syntax:
```
ds.resample("24H", dim="time", how="max")
```
New syntax:
```
ds.resample(time="24H").max()
```
Note that both versions are currently supported, but using the old syntax will produce a warning encouraging users to adopt the new syntax. By Daniel Rothenberg.
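A self-contained sketch of the new groupby-like syntax, assuming a recent xarray release; it uses a "2D" frequency on made-up daily data rather than the "24H" of the snippet above, since newer pandas versions deprecate the uppercase "H" alias:

```python
import numpy as np
import pandas as pd
import xarray as xr

times = pd.date_range("2000-01-01", periods=4, freq="D")
ds = xr.Dataset({"t2m": ("time", np.array([1.0, 5.0, 2.0, 3.0]))}, coords={"time": times})

# Bin into consecutive 2-day periods and take the maximum of each bin.
two_day_max = ds.resample(time="2D").max()
```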
Calling repr() or printing xarray objects at the command line or in a Jupyter Notebook will no longer automatically compute dask variables or load data on arrays lazily loaded from disk (GH1522). By Guido Imperiale.
Supplying coords as a dictionary to the DataArray constructor without also supplying an explicit dims argument is no longer supported. This behavior was deprecated in version 0.9 but will now raise an error (GH727).
Several existing features have been deprecated and will change to new behavior in xarray v0.11. If you use any of them with xarray v0.10, you should see a FutureWarning that describes how to update your code:
- Dataset.T has been deprecated as an alias for Dataset.transpose() (GH1232). In the next major version of xarray, it will provide shortcut lookup for variables or attributes with name 'T'.
- DataArray.__contains__ (e.g., key in data_array) currently checks for membership in DataArray.coords. In the next major version of xarray, it will check membership in the array data found in DataArray.values instead (GH1267).
- Direct iteration over and counting a Dataset (e.g., [k for k in ds], ds.keys(), ds.values(), len(ds) and if ds) currently includes all variables, both data and coordinates. For improved usability and consistency with pandas, in the next major version of xarray these will change to only include data variables (GH884). Use ds.variables, ds.data_vars or ds.coords as alternatives.
Changes to minimum versions of dependencies:
- Old numpy < 1.11 and pandas < 0.18 are no longer supported (GH1512). By Keisuke Fujii.
- The minimum supported version of bottleneck has increased to 1.1 (GH1279). By Joe Hamman.

Enhancements#

New functions/methods

New helper function apply_ufunc() for wrapping functions written to work on NumPy arrays to support labels on xarray objects (GH770). apply_ufunc also supports automatic parallelization for many functions with dask. See Wrapping custom computation and Parallelize custom functions with apply_ufunc and map_blocks for details. By Stephan Hoyer.
Added new method Dataset.to_dask_dataframe(), which converts a dataset into a dask dataframe. This allows lazy loading of data from a dataset containing dask arrays (GH1462). By James Munroe.
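A minimal sketch of wrapping a NumPy reduction with apply_ufunc(), assuming a recent xarray release. The rms helper and the data are hypothetical; apply_ufunc moves core dimensions to the end, which is why the helper reduces over axis -1:

```python
import numpy as np
import xarray as xr


def rms(arr, axis):
    """Root-mean-square of a plain NumPy array along `axis` (hypothetical helper)."""
    return np.sqrt(np.mean(arr**2, axis=axis))


da = xr.DataArray([[3.0, 3.0], [4.0, 4.0]], dims=("x", "time"))

# "time" is declared as a core dimension, moved last, and consumed by the function.
result = xr.apply_ufunc(rms, da, input_core_dims=[["time"]], kwargs={"axis": -1})
```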

New function where() for conditionally switching between values in xarray objects, like numpy.where():

```
import xarray as xr

arr = xr.DataArray([[1, 2, 3], [4, 5, 6]], dims=("x", "y"))

xr.where(arr % 2 == 0, "even", "odd")
```
```
<xarray.DataArray (x: 2, y: 3)>
array([['odd', 'even', 'odd'],
       ['even', 'odd', 'even']],
      dtype='<U4')
Dimensions without coordinates: x, y
```

Equivalently, the where() method also now supports the other argument, for filling with a value other than NaN (GH576). By Stephan Hoyer.
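A short sketch of the method form with and without other, assuming a recent xarray release (the values are made up):

```python
import xarray as xr

da = xr.DataArray([1, 2, 3, 4], dims="x")

masked = da.where(da > 2)           # values failing the condition become NaN
filled = da.where(da > 2, other=0)  # ... or the supplied `other` value instead
```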

Added show_versions() function to aid in debugging (GH1485). By Joe Hamman.

Performance improvements

concat() was computing variables that aren’t in memory (e.g. dask-based) multiple times; open_mfdataset() was loading them multiple times from disk. Now, both functions will instead load them at most once and, if they do, store them in memory in the concatenated array/dataset (GH1521). By Guido Imperiale.
Speed-up (x 100) of xarray.conventions.decode_cf_datetime. By Christian Chwala.

IO related improvements

Unicode strings (str on Python 3) are now round-tripped successfully even when written as character arrays (e.g., as netCDF3 files or when using engine='scipy') (GH1638). This is controlled by the _Encoding attribute convention, which is also understood directly by the netCDF4-Python interface. See String encoding for full details. By Stephan Hoyer.
Support for data_vars and coords keywords from concat() added to open_mfdataset() (GH438). Using these keyword arguments can significantly reduce memory usage and increase speed. By Oleksandr Huziy.

Support for pathlib.Path objects added to open_dataset(), open_mfdataset(), xarray.to_netcdf, and save_mfdataset() (GH799):

```
from pathlib import Path  # In Python 2, use pathlib2!

import xarray as xr

data_dir = Path("data/")

one_file = data_dir / "data_for_month_01.nc"

xr.open_dataset(one_file)
```

By Willi Rath.

You can now explicitly disable any default _FillValue (NaN for floating point values) by passing the encoding {'_FillValue': None} (GH1598). By Stephan Hoyer.
More attributes available in attrs dictionary when raster files are opened with open_rasterio(). By Greg Brener.
Support for NetCDF files using an _Unsigned attribute to indicate that a signed integer data type should be interpreted as unsigned bytes (GH1444). By Eric Bruning.
Support using an existing, opened netCDF4 Dataset with NetCDF4DataStore. This permits creating a Dataset from a netCDF4 Dataset that has been opened using other means (GH1459). By Ryan May.
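A sketch of the _Unsigned convention applied to an in-memory dataset with xr.decode_cf(), assuming a recent xarray release; no netCDF file is involved and the variable name is invented:

```python
import numpy as np
import xarray as xr

# Stored as signed bytes, but flagged as really holding unsigned values.
raw = xr.Dataset({"b": ("x", np.array([-1, 1], dtype="int8"), {"_Unsigned": "true"})})

# Decoding reinterprets the int8 bit patterns as uint8, so -1 becomes 255.
decoded = xr.decode_cf(raw)
```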
Changed PydapDataStore to take a Pydap dataset. This permits opening Opendap datasets that require authentication, by instantiating a Pydap dataset with a session object. Also added xarray.backends.PydapDataStore.open() which takes a url and session object (GH1068). By Philip Graae.
Support reading and writing unlimited dimensions with h5netcdf (GH1636). By Joe Hamman.

Other improvements

Added _ipython_key_completions_ to xarray objects, to enable autocompletion for dictionary-like access in IPython, e.g., ds['tem + tab -> ds['temperature'] (GH1628). By Keisuke Fujii.
Support passing keyword arguments to load, compute, and persist methods. Any keyword arguments supplied to these methods are passed on to the corresponding dask function (GH1523). By Joe Hamman.
Encoding attributes are now preserved when xarray objects are concatenated. The encoding is copied from the first object (GH1297). By Joe Hamman and Gerrit Holl.
Support applying rolling window operations using bottleneck’s moving window functions on data stored as dask arrays (GH1279). By Joe Hamman.
Experimental support for the Dask collection interface (GH1674). By Matthew Rocklin.

Bug fixes#

Suppress RuntimeWarning issued by numpy for “invalid value comparisons” (e.g. NaN). Xarray now behaves similarly to pandas in its treatment of binary and unary operations on objects with NaNs (GH1657). By Joe Hamman.
Unsigned int support for reduce methods with skipna=True (GH1562). By Keisuke Fujii.
Fixes to ensure xarray works properly with pandas 0.21:
- Fix isnull() method (GH1549).
- to_series() and to_dataframe() should not return a pandas.MultiIndex for 1D data (GH1548).
- Fix plotting with datetime64 axis labels (GH1661).
By Stephan Hoyer.
open_rasterio() method now shifts the rasterio coordinates so that they are centered in each pixel (GH1468). By Greg Brener.
rename() method now doesn’t throw errors if some Variable is renamed to the same name as another Variable as long as that other Variable is also renamed (GH1477). This method now does throw when two Variables would end up with the same name after the rename (since one of them would get overwritten in this case). By Prakhar Goel.
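A minimal sketch of both cases, assuming a recent xarray release (the variables are made up):

```python
import xarray as xr

ds = xr.Dataset({"a": ("x", [1, 2]), "b": ("x", [3, 4])})

# Allowed: "a" takes the name "b" because the existing "b" is renamed too.
shifted = ds.rename({"a": "b", "b": "c"})

# Rejected: "a" -> "b" while "b" keeps its name would overwrite a variable.
try:
    ds.rename({"a": "b"})
    collided = False
except ValueError:
    collided = True
```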
Fix xarray.testing.assert_allclose() to actually use atol and rtol arguments when called on DataArray objects (GH1488). By Stephan Hoyer.
xarray quantile methods now properly raise a TypeError when applied to objects with data stored as dask arrays (GH1529). By Joe Hamman.
Fix positional indexing to allow the use of unsigned integers (GH1405). By Joe Hamman and Gerrit Holl.
Creating a Dataset now raises MergeError if a coordinate shares a name with a dimension but is comprised of arbitrary dimensions (GH1120). By Joe Hamman.
open_rasterio() method now skips rasterio’s crs attribute if its value is None (GH1520). By Leevi Annala.
Fix xarray.DataArray.to_netcdf() to return bytes when no path is provided (GH1410). By Joe Hamman.
Fix xarray.save_mfdataset() to properly raise an informative error when objects other than Dataset are provided (GH1555). By Joe Hamman.
xarray.Dataset.copy() would not preserve the encoding property (GH1586). By Guido Imperiale.
xarray.concat() would eagerly load dask variables into memory if the first argument was a numpy variable (GH1588). By Guido Imperiale.
Fix bug in to_netcdf() when writing in append mode (GH1215). By Joe Hamman.
Fix netCDF4 backend to properly roundtrip the shuffle encoding option (GH1606). By Joe Hamman.
Fix bug when using pytest class decorators to skip certain unittests. The previous behavior unintentionally caused additional tests to be skipped (GH1531). By Joe Hamman.
Fix pynio backend for upcoming release of pynio with Python 3 support (GH1611). By Ben Hillman.
Fix seaborn import warning for Seaborn versions 0.8 and newer when the apionly module was deprecated. (GH1633). By Joe Hamman.
Fix COMPAT: MultiIndex checking is fragile (GH1833). By Florian Pinault.
Fix rasterio backend for Rasterio versions 1.0alpha10 and newer. (GH1641). By Chris Holden.

Bug fixes after rc1#

Suppress warning in IPython autocompletion, related to the deprecation of .T attributes (GH1675). By Keisuke Fujii.
Fix a bug in lazily-indexing netCDF array. (GH1688) By Keisuke Fujii.
(Internal bug) MemoryCachedArray now supports orthogonal indexing. Also made some internal cleanups around array wrappers (GH1429). By Keisuke Fujii.
(Internal bug) MemoryCachedArray now always wraps np.ndarray by NumpyIndexingAdapter. (GH1694) By Keisuke Fujii.
Fix importing xarray when running Python with -OO (GH1706). By Stephan Hoyer.
Saving a netCDF file with coordinates whose names contain spaces now raises an appropriate warning (GH1689). By Stephan Hoyer.
Fix two bugs that were preventing dask arrays from being specified as coordinates in the DataArray constructor (GH1684). By Joe Hamman.
Fixed apply_ufunc with dask='parallelized' for scalar arguments (GH1697). By Stephan Hoyer.
Fix “Chunksize cannot exceed dimension size” error when writing netCDF4 files loaded from disk (GH1225). By Stephan Hoyer.
Validate the shape of coordinates with names matching dimensions in the DataArray constructor (GH1709). By Stephan Hoyer.
Raise NotImplementedError when attempting to save a MultiIndex to a netCDF file (GH1547). By Stephan Hoyer.
Remove netCDF dependency from rasterio backend tests. By Matti Eskelinen

Bug fixes after rc2#

Fixed unexpected behavior in Dataset.set_index() and DataArray.set_index() introduced by pandas 0.21.0. Setting a new index with a single variable resulted in 1-level pandas.MultiIndex instead of a simple pandas.Index (GH1722). By Benoit Bovy.
Fixed unexpected memory loading of backend arrays after print. (GH1720). By Keisuke Fujii.

v0.9.6 (8 June 2017)#

This release includes a number of backwards compatible enhancements and bug fixes.

Enhancements#

New sortby() method to Dataset and DataArray that enable sorting along dimensions (GH967). See the docs for examples. By Chun-Wei Yuan and Kyle Heuton.
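A minimal sketch of sortby() on a coordinate, assuming a recent xarray release (the values are made up):

```python
import xarray as xr

da = xr.DataArray([30, 10, 20], dims="x", coords={"x": [3, 1, 2]})

# Reorder along "x" so that the coordinate values increase; data moves with it.
ordered = da.sortby("x")
```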
Add .dt accessor to DataArrays for computing datetime-like properties for the values they contain, similar to pandas.Series (GH358). By Daniel Rothenberg.
Renamed internal dask arrays created by open_dataset to match new dask conventions (GH1343). By Ryan Abernathey.
as_variable() is now part of the public API (GH1303). By Benoit Bovy.
align() now supports join='exact', which raises an error instead of aligning when indexes to be aligned are not equal. By Stephan Hoyer.
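A sketch contrasting join='exact' with an inner join, assuming a recent xarray release where the error raised is a ValueError (or a subclass of it). The arrays are made up:

```python
import xarray as xr

a = xr.DataArray([1, 2], dims="x", coords={"x": [0, 1]})
b = xr.DataArray([3, 4], dims="x", coords={"x": [1, 2]})

# The "x" indexes differ, so join="exact" refuses to reindex and raises.
try:
    xr.align(a, b, join="exact")
    aligned = True
except ValueError:
    aligned = False

# An inner join instead keeps only the shared label x=1.
a2, b2 = xr.align(a, b, join="inner")
```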
New function open_rasterio() for opening raster files with the rasterio library. See the docs for details. By Joe Hamman, Nic Wayand and Fabien Maussion

Bug fixes#

Fix error from repeated indexing of datasets loaded from disk (GH1374). By Stephan Hoyer.
Fix a bug where .isel_points wrongly assigns unselected coordinate to data_vars. By Keisuke Fujii.
Tutorial datasets are now checked against a reference MD5 sum to confirm successful download (GH1392). By Matthew Gidden.
DataArray.chunk() now accepts dask specific kwargs like Dataset.chunk() does. By Fabien Maussion.
Support for engine='pydap' with recent releases of Pydap (3.2.2+), including on Python 3 (GH1174).

Documentation#

A new gallery makes it possible to add interactive examples to the documentation. By Fabien Maussion.

Testing#

Fix test suite failure caused by changes to pandas.cut function (GH1386). By Ryan Abernathey.
Enhanced tests suite by use of @network decorator, which is controlled via --run-network-tests command line argument to py.test (GH1393). By Matthew Gidden.

v0.9.5 (17 April, 2017)#

Remove an inadvertently introduced print statement.

v0.9.3 (16 April, 2017)#

This minor release includes bug-fixes and backwards compatible enhancements.

Enhancements#

New persist() method to Datasets and DataArrays to enable persisting data in distributed memory when using Dask (GH1344). By Matthew Rocklin.
New expand_dims() method for DataArray and Dataset (GH1326). By Keisuke Fujii.

Bug fixes#

Fix .where() with drop=True when arguments do not have indexes (GH1350). This bug, introduced in v0.9, resulted in xarray producing incorrect results in some cases. By Stephan Hoyer.
Fixed writing to file-like objects with to_netcdf() (GH1320). By Stephan Hoyer.
Fixed explicitly setting engine='scipy' with to_netcdf when not providing a path (GH1321). By Stephan Hoyer.
Fixed open_dataarray not properly passing its parameters to open_dataset (GH1359). By Stephan Hoyer.
Ensure the test suite works when run from an installed version of xarray (GH1336). Use @pytest.mark.slow instead of a custom flag to mark slow tests. By Stephan Hoyer.

v0.9.2 (2 April 2017)#

This minor release includes bug-fixes and backwards compatible enhancements.

Enhancements#

.rolling() on Dataset is now supported (GH859). By Keisuke Fujii.
When bottleneck version 1.1 or later is installed, use bottleneck for rolling var, argmin, argmax, and rank computations. Also, rolling median now accepts a min_periods argument (GH1276). By Joe Hamman.
When .plot() is called on a 2D DataArray and only one dimension is specified with x= or y=, the other dimension is now guessed (GH1291). By Vincent Noel.
Added new method assign_attrs() to DataArray and Dataset, a chained-method compatible implementation of the dict.update method on attrs (GH1281). By Henry S. Harrison.
Added new autoclose=True argument to open_mfdataset() to explicitly close opened files when not in use to prevent occurrence of an OS Error related to too many open files (GH1198). Note, the default is autoclose=False, which is consistent with previous xarray behavior. By Phillip J. Wolfram.
The repr() of Dataset and DataArray attributes uses a similar format to coordinates and variables, with vertically aligned entries truncated to fit on a single line (GH1319). Hopefully this will stop people writing data.attrs = {} and discarding metadata in notebooks for the sake of cleaner output. The full metadata is still available as data.attrs. By Zac Hatfield-Dodds.
Enhanced tests suite by use of @slow and @flaky decorators, which are controlled via --run-flaky and --skip-slow command line arguments to py.test (GH1336). By Stephan Hoyer and Phillip J. Wolfram.
New aggregation on rolling objects, count(), which provides a rolling count of valid values (GH1138).
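A small sketch of rolling count(), assuming a recent xarray release (the data is made up). min_periods=1 is set explicitly because, with the default min_periods equal to the window size, any window containing a NaN would itself come back as NaN:

```python
import numpy as np
import xarray as xr

da = xr.DataArray([1.0, np.nan, 3.0, 4.0], dims="x")

# Number of non-NaN values in each trailing window of length 2.
counts = da.rolling(x=2, min_periods=1).count()
```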

Bug fixes#

Rolling operations now preserve the original dimension order (GH1125). By Keisuke Fujii.
Fixed sel with method='nearest' on Python 2.7 and 64-bit Windows (GH1140). By Stephan Hoyer.
Fixed where with drop=True for empty masks (GH1341). By Stephan Hoyer and Phillip J. Wolfram.

v0.9.1 (30 January 2017)#

Renamed the “Unindexed dimensions” section in the Dataset and DataArray repr (added in v0.9.0) to “Dimensions without coordinates” (GH1199).

v0.9.0 (25 January 2017)#

This major release includes five months' worth of enhancements and bug fixes from 24 contributors, including some significant changes that are not fully backwards compatible. Highlights include:

Coordinates are now optional in the xarray data model, even for dimensions.
Changes to caching, lazy loading and pickling to improve xarray’s experience for parallel computing.
Improvements for accessing and manipulating pandas.MultiIndex levels.
Many new methods and functions, including quantile(), cumsum(), cumprod(), combine_first(), set_index(), reset_index(), reorder_levels(), full_like(), zeros_like(), ones_like(), open_dataarray(), compute(), Dataset.info(), testing.assert_equal(), testing.assert_identical(), and testing.assert_allclose().

Breaking changes#

Index coordinates for each dimension are now optional, and no longer created by default (GH1017). You can identify such dimensions without coordinates by their appearance in the list of “Dimensions without coordinates” in the Dataset or DataArray repr:
xr.Dataset({"foo": (("x", "y"), [[1, 2]])})
<xarray.Dataset>
Dimensions:  (x: 1, y: 2)
Dimensions without coordinates: x, y
Data variables:
    foo      (x, y) int64 1 2
This has a number of implications:
- align() and reindex() can now raise an error if dimension labels are missing and dimensions have different sizes.
- Because pandas does not support missing indexes, methods such as to_dataframe/from_dataframe and stack/unstack no longer roundtrip faithfully on all inputs. Use reset_index() to remove undesired indexes.
- Dataset.__delitem__ and drop() no longer delete/drop variables that have dimensions matching a deleted/dropped variable.
- DataArray.coords.__delitem__ is now allowed on variables matching dimension names.
- .sel and .loc now handle indexing along a dimension without coordinate labels by doing integer-based indexing. See Missing coordinate labels for an example.
- indexes is no longer guaranteed to include all dimension names as keys. The new method get_index() has been added to get an index for any dimension, falling back to a default RangeIndex if necessary.
The default behavior of merge is now compat='no_conflicts', so some merges will now succeed in cases that previously raised xarray.MergeError. Set compat='broadcast_equals' to restore the previous default. See Merging with ‘no_conflicts’ for more details.
Reading values no longer always caches values in a NumPy array (GH1128). Caching of .values on variables read from netCDF files on disk is still the default when open_dataset() is called with cache=True. By Guido Imperiale and Stephan Hoyer.
Pickling a Dataset or DataArray linked to a file on disk no longer caches its values into memory before pickling (GH1128). Instead, pickle stores file paths and restores objects by reopening file references. This enables preliminary, experimental use of xarray for opening files with dask.distributed. By Stephan Hoyer.
Coordinates used to index a dimension are now loaded eagerly into pandas.Index objects, instead of loading the values lazily. By Guido Imperiale.
Automatic levels for 2d plots are now guaranteed to land on vmin and vmax when these kwargs are explicitly provided (GH1191). The automated level selection logic also slightly changed. By Fabien Maussion.
DataArray.rename() behavior changed to strictly change the DataArray.name if called with string argument, or strictly change coordinate names if called with dict-like argument. By Markus Gonser.
By default, to_netcdf() adds a _FillValue = NaN attribute to float types. By Frederic Laliberte.
repr on DataArray objects uses a shortened display for NumPy array data that is less likely to overflow onto multiple pages (GH1207). By Stephan Hoyer.
xarray no longer supports python 3.3, versions of dask prior to v0.9.0, or versions of bottleneck prior to v1.0.
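A short sketch of the first breaking change, run against the current xarray API (assumed to be installed). It uses the same Dataset as the repr above. No index coordinates are created, get_index() still supplies a default range index, and .sel on an unlabelled dimension falls back to integer positions.

```python
import xarray as xr

ds = xr.Dataset({"foo": (("x", "y"), [[1, 2]])})

# No index coordinates are created for "x" or "y"
print(list(ds.coords))   # []
print(list(ds.indexes))  # []

# get_index() still returns an index, falling back to a default range index
y_index = ds.get_index("y")
print(list(y_index))     # [0, 1]

# .sel on a dimension without labels falls back to integer positions
print(ds["foo"].sel(y=1).values.tolist())  # [2]
```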

Deprecations#

Renamed the Coordinate class from xarray’s low level API to IndexVariable. Variable.to_variable and Variable.to_coord have been renamed to to_base_variable() and to_index_variable().
Deprecated supplying coords as a dictionary to the DataArray constructor without also supplying an explicit dims argument. The old behavior encouraged relying on the iteration order of dictionaries, which is a bad practice (GH727).
Removed a number of methods deprecated since v0.7.0 or earlier: load_data, vars, drop_vars, dump, dumps and the variables keyword argument to Dataset.
Removed the dummy module that enabled import xray.

Enhancements#

Added new method combine_first() to DataArray and Dataset, based on the pandas method of the same name (see Combine). By Chun-Wei Yuan.
Added the ability to change the default automatic alignment (arithmetic_join="inner") for binary operations via set_options() (see Automatic alignment). By Chun-Wei Yuan.
Add checking of attr names and values when saving to netCDF, raising useful error messages if they are invalid. (GH911). By Robin Wilson.
Added ability to save DataArray objects directly to netCDF files using to_netcdf(), and to load directly from netCDF files using open_dataarray() (GH915). These remove the need to convert a DataArray to a Dataset before saving as a netCDF file, and deals with names to ensure a perfect ‘roundtrip’ capability. By Robin Wilson.
Multi-index levels are now accessible as “virtual” coordinate variables, e.g., ds['time'] can pull out the 'time' level of a multi-index (see Coordinates). sel also accepts providing multi-index levels as keyword arguments, e.g., ds.sel(time='2000-01') (see Multi-level indexing). By Benoit Bovy.
Added set_index, reset_index and reorder_levels methods to easily create and manipulate (multi-)indexes (see Set and reset index). By Benoit Bovy.
Added the compat option 'no_conflicts' to merge, allowing the combination of xarray objects with disjoint (GH742) or overlapping (GH835) coordinates as long as all present data agrees. By Johnnie Gray. See Merging with ‘no_conflicts’ for more details.
It is now possible to set concat_dim=None explicitly in open_mfdataset() to disable inferring a dimension along which to concatenate. By Stephan Hoyer.
Added methods DataArray.compute(), Dataset.compute(), and Variable.compute() as a non-mutating alternative to load(). By Guido Imperiale.
Adds DataArray and Dataset methods cumsum() and cumprod(). By Phillip J. Wolfram.
New properties Dataset.sizes and DataArray.sizes for providing consistent access to dimension length on both Dataset and DataArray (GH921). By Stephan Hoyer.
New keyword argument drop=True for sel(), isel() and squeeze() for dropping scalar coordinates that arise from indexing (GH242). By Stephan Hoyer.
New top-level functions full_like(), zeros_like(), and ones_like(). By Guido Imperiale.
Overriding a preexisting attribute with register_dataset_accessor() or register_dataarray_accessor() now issues a warning instead of raising an error (GH1082). By Stephan Hoyer.
Options for axes sharing between subplots are exposed to FacetGrid and plot(), so axes sharing can be disabled for polar plots. By Bas Hoonhout.
New utility functions assert_equal(), assert_identical(), and assert_allclose() for asserting relationships between xarray objects, designed for use in a pytest test suite.
figsize, size and aspect plot arguments are now supported for all plots (GH897). See Controlling the figure size for more details. By Stephan Hoyer and Fabien Maussion.
New info() method to summarize Dataset variables and attributes. The method prints to a buffer (e.g. stdout) with output similar to what the command line utility ncdump -h produces (GH1150). By Joe Hamman.
Added the ability to write unlimited netCDF dimensions with the scipy and netcdf4 backends via the new xray.Dataset.encoding attribute or via the unlimited_dims argument to xray.Dataset.to_netcdf. By Joe Hamman.
New quantile() method to calculate quantiles from DataArray objects (GH1187). By Joe Hamman.
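The sketch below illustrates two of the combining tools above, combine_first() and merge with compat='no_conflicts', using the current xarray API (assumed to be installed). Both compat and join are passed explicitly rather than relying on defaults. The variable name "v" and the values are illustrative.

```python
import numpy as np
import xarray as xr

a = xr.DataArray([1.0, np.nan], coords={"x": [0, 1]}, dims="x", name="v")
b = xr.DataArray([10.0, 20.0, 30.0], coords={"x": [0, 1, 2]}, dims="x", name="v")

# combine_first: outer-join the labels, prefer values from `a`, fill gaps from `b`
combined = a.combine_first(b)
print(combined.values.tolist())  # [1.0, 20.0, 30.0]

# compat="no_conflicts": overlapping values must agree wherever both are non-null
c = xr.DataArray([np.nan, 2.0], coords={"x": [0, 1]}, dims="x", name="v")
merged = xr.merge([a, c], compat="no_conflicts", join="outer")
print(merged["v"].values.tolist())  # [1.0, 2.0]

# A genuine disagreement (1.0 vs 5.0 at x=0) raises MergeError
d = xr.DataArray([5.0, np.nan], coords={"x": [0, 1]}, dims="x", name="v")
conflict_raised = False
try:
    xr.merge([a, d], compat="no_conflicts", join="outer")
except xr.MergeError:
    conflict_raised = True
print(conflict_raised)  # True
```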

Bug fixes#

groupby_bins now restores empty bins by default (GH1019). By Ryan Abernathey.
Fix issues for dates outside the valid range of pandas timestamps (GH975). By Mathias Hauser.
Fixed unstacking producing a flipped array after stacking decreasing coordinate values (GH980). By Stephan Hoyer.
Setting dtype via the encoding parameter of to_netcdf failed if the encoded dtype was the same as the dtype of the original array (GH873). By Stephan Hoyer.
Fix issues with variables where both attributes _FillValue and missing_value are set to NaN (GH997). By Marco Zühlke.
.where() and .fillna() now preserve attributes (GH1009). By Fabien Maussion.
Applying broadcast() to an xarray object based on the dask backend won’t accidentally convert the array from dask to numpy anymore (GH978). By Guido Imperiale.
Dataset.concat() now preserves variables order (GH1027). By Fabien Maussion.
Fixed an issue with pcolormesh (GH781). A new infer_intervals keyword gives control on whether the cell intervals should be computed or not. By Fabien Maussion.
Grouping over a dimension with non-unique values with groupby now gives correct groups. By Stephan Hoyer.
Fixed accessing coordinate variables with non-string names from .coords. By Stephan Hoyer.
rename() now simultaneously renames the array and any coordinate with the same name, when supplied via a dict (GH1116). By Yves Delley.
Fixed sub-optimal performance in certain operations with object arrays (GH1121). By Yves Delley.
Fix .groupby(group) when group has datetime dtype (GH1132). By Jonas Sølvsteen.
Fixed a bug with facetgrid (the norm keyword was ignored, GH1159). By Fabien Maussion.
Resolved a concurrency bug that could cause Python to crash when simultaneously reading and writing netCDF4 files with dask (GH1172). By Stephan Hoyer.
Fix to make .copy() actually copy dask arrays, which will be relevant for future releases of dask in which dask arrays will be mutable (GH1180). By Stephan Hoyer.
Fix opening NetCDF files with multi-dimensional time variables (GH1229). By Stephan Hoyer.

Performance improvements#

xarray.Dataset.isel_points and xarray.Dataset.sel_points now use vectorised indexing in numpy and dask (GH1161), which can result in several orders of magnitude speedup. By Jonathan Chambers.

v0.8.2 (18 August 2016)#

This release includes a number of bug fixes and minor enhancements.

Breaking changes#

broadcast() and concat() now auto-align inputs, using join='outer'. Previously, these functions raised ValueError for non-aligned inputs. By Guido Imperiale.
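A minimal sketch of the outer-join alignment performed by broadcast(), assuming the current xarray API. Labels present in only one input survive, and the missing positions are filled with NaN, which also promotes the integer data to float.

```python
import xarray as xr

a = xr.DataArray([1, 2], coords={"x": [0, 1]}, dims="x")
b = xr.DataArray([10, 20], coords={"x": [1, 2]}, dims="x")

# Labels are outer-joined before broadcasting; missing positions become NaN
a2, b2 = xr.broadcast(a, b)
print(a2["x"].values.tolist())  # [0, 1, 2]
print(a2.values.tolist())       # [1.0, 2.0, nan]
print(b2.values.tolist())       # [nan, 10.0, 20.0]
```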

Enhancements#

New documentation on Transitioning from pandas.Panel to xarray. By Maximilian Roos.
New Dataset and DataArray methods to_dict() and from_dict() to allow easy conversion between dictionaries and xarray objects (GH432). See dictionary IO for more details. By Julia Signell.
Added exclude and indexes optional parameters to align(), and exclude optional parameter to broadcast(). By Guido Imperiale.
Better error message when assigning variables without dimensions (GH971). By Stephan Hoyer.
Better error message when reindex/align fails due to duplicate index values (GH956). By Stephan Hoyer.
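A short round-trip sketch for to_dict() and from_dict(), assuming the current xarray API. The name "v" and the "units" attribute are illustrative.

```python
import xarray as xr

da = xr.DataArray(
    [1, 2, 3], coords={"x": [10, 20, 30]}, dims="x", name="v", attrs={"units": "m"}
)

# to_dict() produces plain Python containers (dims, attrs, data, coords, name)
d = da.to_dict()
print(d["data"], d["name"])  # [1, 2, 3] v

# from_dict() rebuilds an equivalent DataArray
restored = xr.DataArray.from_dict(d)
xr.testing.assert_identical(da, restored)
```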

Bug fixes#

Ensure xarray works with h5netcdf v0.3.0 for arrays with dtype=str (GH953). By Stephan Hoyer.
Dataset.__dir__() (i.e. the method python calls to get autocomplete options) failed if one of the dataset’s keys was not a string (GH852). By Maximilian Roos.
Dataset constructor can now take arbitrary objects as values (GH647). By Maximilian Roos.
Clarified copy argument for reindex() and align(), which now consistently always return new xarray objects (GH927).
Fix open_mfdataset with engine='pynio' (GH936). By Stephan Hoyer.
Fixed groupby_bins sorting bin labels as strings (GH952). By Stephan Hoyer.
Fix bug introduced by v0.8.0 that broke assignment to datasets when both the left and right side have the same non-unique index values (GH956).

v0.8.1 (5 August 2016)#

Bug fixes#

Fix bug in v0.8.0 that broke assignment to Datasets with non-unique indexes (GH943). By Stephan Hoyer.

v0.8.0 (2 August 2016)#

This release includes four months of new features and bug fixes, including several breaking changes.

Breaking changes#

Dropped support for Python 2.6 (GH855).
Indexing on a multi-index now drops levels, which is consistent with pandas. It also changes the name of the dimension / coordinate when the multi-index is reduced to a single index (GH802).
Contour plots no longer add a colorbar by default (GH866). Filled contour plots are unchanged.
DataArray.values and .data now always return a NumPy array-like object, even for 0-dimensional arrays with object dtype (GH867). Previously, .values returned native Python objects in such cases. To convert the values of scalar arrays to Python objects, use the .item() method.
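A small sketch of the last point, assuming the current xarray API. For a 0-dimensional object array, .values is still a (0-d) NumPy array, and .item() gives back the plain Python object.

```python
import numpy as np
import xarray as xr

scalar = xr.DataArray(np.array(5, dtype=object))

values = scalar.values
print(type(values), values.ndim)  # <class 'numpy.ndarray'> 0

# .item() converts the single element to a native Python object
print(scalar.item())  # 5
```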

Enhancements#

Groupby operations now support grouping over multidimensional variables. A new method called groupby_bins() has also been added to allow users to specify bins for grouping. The new features are described in Multidimensional Grouping and Working with Multidimensional Coordinates. By Ryan Abernathey.
DataArray and Dataset method where() now supports a drop=True option that clips coordinate elements that are fully masked. By Phillip J. Wolfram.
New top level merge() function allows for combining variables from any number of Dataset and/or DataArray variables. See Merge for more details. By Stephan Hoyer.
DataArray.resample() and Dataset.resample() now support the keep_attrs=False option that determines whether variable and dataset attributes are retained in the resampled object. By Jeremy McGibbon.
Better multi-index support in DataArray.sel(), DataArray.loc(), Dataset.sel() and Dataset.loc(), which now behave more closely to pandas and which also accept dictionaries for indexing based on given level names and labels (see Multi-level indexing). By Benoit Bovy.
New (experimental) decorators register_dataset_accessor() and register_dataarray_accessor() for registering custom xarray extensions without subclassing. They are described in the new documentation page on Xarray Internals. By Stephan Hoyer.
Round trip boolean datatypes. Previously, writing boolean datatypes to netCDF formats would raise an error since netCDF does not have a bool datatype. This feature reads/writes a dtype attribute to boolean variables in netCDF files. By Joe Hamman.
2D plotting methods now have two new keywords (cbar_ax and cbar_kwargs), allowing more control on the colorbar (GH872). By Fabien Maussion.
New Dataset method Dataset.filter_by_attrs(), akin to netCDF4.Dataset.get_variables_by_attributes, to easily filter data variables using their attributes. By Filipe Fernandes.
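A sketch of two of the enhancements above, where(..., drop=True) and Dataset.filter_by_attrs(), assuming the current xarray API. The variable names and attribute values are illustrative.

```python
import numpy as np
import xarray as xr

da = xr.DataArray(
    np.arange(9).reshape(3, 3),
    coords={"x": [0, 1, 2], "y": [0, 1, 2]},
    dims=("x", "y"),
)

# drop=True removes labels whose entries are masked everywhere (row x=0 is all <= 4)
clipped = da.where(da > 4, drop=True)
print(clipped["x"].values.tolist())  # [1, 2]
print(clipped.shape)                  # (2, 3)

ds = xr.Dataset(
    {
        "temp": ("x", [280.0, 281.0], {"units": "K"}),
        "pres": ("x", [1000.0, 990.0], {"units": "hPa"}),
    }
)
# Keep only the data variables whose attributes match the keyword arguments
print(list(ds.filter_by_attrs(units="K").data_vars))  # ['temp']
```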

Bug fixes#

Attributes were being retained by default for some resampling operations when they should not. With the keep_attrs=False option, they will no longer be retained by default. This may be backwards-incompatible with some scripts, but the attributes may be kept by adding the keep_attrs=True option. By Jeremy McGibbon.
Concatenating xarray objects along an axis with a MultiIndex or PeriodIndex preserves the nature of the index (GH875). By Stephan Hoyer.
Fixed bug in arithmetic operations on DataArray objects whose dimensions are numpy structured arrays or recarrays GH861, GH837. By Maciek Swat.
decode_cf_timedelta now accepts arrays with ndim > 1 (GH842). This fixes issue GH665. By Filipe Fernandes.
Fix a bug where xarray.ufuncs that take two arguments would incorrectly use numpy functions instead of dask.array functions (GH876). By Stephan Hoyer.
Support for pickling functions from xarray.ufuncs (GH901). By Stephan Hoyer.
Variable.copy(deep=True) no longer converts MultiIndex into a base Index (GH769). By Benoit Bovy.
Fixes for groupby on dimensions with a multi-index (GH867). By Stephan Hoyer.
Fix printing datasets with unicode attributes on Python 2 (GH892). By Stephan Hoyer.
Fixed incorrect test for dask version (GH891). By Stephan Hoyer.
Fixed dim argument for isel_points/sel_points when a pandas.Index is passed. By Stephan Hoyer.
contour() now plots the correct number of contours (GH866). By Fabien Maussion.

v0.7.2 (13 March 2016)#

This release includes two new, entirely backwards compatible features and several bug fixes.

Enhancements#

New DataArray method DataArray.dot() for calculating the dot product of two DataArrays along shared dimensions. By Dean Pospisil.

Rolling window operations on DataArray objects are now supported via a new DataArray.rolling() method. For example:

import xarray as xr
import numpy as np

arr = xr.DataArray(np.arange(0, 7.5, 0.5).reshape(3, 5), dims=("x", "y"))
arr

<xarray.DataArray (x: 3, y: 5)>
array([[ 0. ,  0.5,  1. ,  1.5,  2. ],
       [ 2.5,  3. ,  3.5,  4. ,  4.5],
       [ 5. ,  5.5,  6. ,  6.5,  7. ]])
Coordinates:
  * x        (x) int64 0 1 2
  * y        (y) int64 0 1 2 3 4

arr.rolling(y=3, min_periods=2).mean()

<xarray.DataArray (x: 3, y: 5)>
array([[  nan,  0.25,  0.5 ,  1.  ,  1.5 ],
       [  nan,  2.75,  3.  ,  3.5 ,  4.  ],
       [  nan,  5.25,  5.5 ,  6.  ,  6.5 ]])
Coordinates:
  * x        (x) int64 0 1 2
  * y        (y) int64 0 1 2 3 4

See Rolling window operations for more details. By Joe Hamman.
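A minimal sketch of DataArray.dot() from the first item above, assuming the current xarray API. The shared dimension "y" is summed over and the remaining dimension "x" is kept.

```python
import numpy as np
import xarray as xr

a = xr.DataArray(np.arange(6).reshape(2, 3), dims=("x", "y"))  # [[0, 1, 2], [3, 4, 5]]
b = xr.DataArray([1, 1, 1], dims="y")

# The shared dimension "y" is summed over; "x" remains
result = a.dot(b)
print(result.dims, result.values.tolist())  # ('x',) [3, 12]
```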

Bug fixes#

Fixed an issue where plots using pcolormesh and Cartopy axes were being distorted by the inference of the axis interval breaks. This change chooses not to modify the coordinate variables when the axes have the attribute projection, allowing Cartopy to handle the extent of pcolormesh plots (GH781). By Joe Hamman.
2D plots now better handle additional coordinates which are not DataArray dimensions (GH788). By Fabien Maussion.

v0.7.1 (16 February 2016)#

This is a bug fix release that includes two small, backwards compatible enhancements. We recommend that all users upgrade.

Enhancements#

Numerical operations now return empty objects when there are no overlapping labels, rather than raising ValueError (GH739).
Series is now supported as valid input to the Dataset constructor (GH740).
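A sketch of the first enhancement using the current xarray API (assumed to be installed). It also uses the later set_options(arithmetic_join=...) switch to contrast the default inner join with an outer join.

```python
import xarray as xr

a = xr.DataArray([1, 2], coords={"x": [0, 1]}, dims="x")
b = xr.DataArray([3, 4], coords={"x": [2, 3]}, dims="x")

# Default inner join: no shared labels, so the result is empty rather than an error
empty = a + b
print(empty.sizes["x"])  # 0

# Opting into an outer join keeps every label; non-overlapping positions are NaN
with xr.set_options(arithmetic_join="outer"):
    outer = a + b
print(outer["x"].values.tolist())  # [0, 1, 2, 3]
print(bool(outer.isnull().all()))  # True
```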

Bug fixes#

Restore checks for shape consistency between data and coordinates in the DataArray constructor (GH758).
Single dimension variables no longer transpose as part of a broader .transpose. This behavior was causing pandas.PeriodIndex dimensions to lose their type (GH749).
Dataset labels remain as their native type on .to_dataset. Previously they were coerced to strings (GH745).
Fixed a bug where replacing a DataArray index coordinate would improperly align the coordinate (GH725).
DataArray.reindex_like now maintains the dtype of complex numbers when reindexing leads to NaN values (GH738).
Dataset.rename and DataArray.rename support the old and new names being the same (GH724).
Fix from_dataframe() for DataFrames with Categorical column and a MultiIndex index (GH737).
Fixes to ensure xarray works properly after the upcoming pandas v0.18 and NumPy v1.11 releases.

Acknowledgments#

The following individuals contributed to this release:

Edward Richards
Maximilian Roos
Rafael Guedes
Spencer Hill
Stephan Hoyer

v0.7.0 (21 January 2016)#

This major release includes redesign of DataArray internals, as well as new methods for reshaping, rolling and shifting data. It includes preliminary support for pandas.MultiIndex, as well as a number of other features and bug fixes, several of which offer improved compatibility with pandas.

New name#

The project formerly known as “xray” is now “xarray”, pronounced “x-array”! This avoids a namespace conflict with the entire field of x-ray science. Renaming our project seemed like the right thing to do, especially because some scientists who work with actual x-rays are interested in using this project in their work. Thanks for your understanding and patience in this transition. You can now find our documentation and code repository at new URLs:

To ease the transition, we have simultaneously released v0.7.0 of both xray and xarray on the Python Package Index. These packages are identical. For now, import xray still works, except it issues a deprecation warning. This will be the last xray release. Going forward, we recommend switching your import statements to import xarray as xr.

Breaking changes#

The internal data model used by xray.DataArray has been rewritten to fix several outstanding issues (GH367, GH634, this stackoverflow report). Internally, DataArray is now implemented in terms of ._variable and ._coords attributes instead of holding variables in a Dataset object.

This refactor ensures that if a DataArray has the same name as one of its coordinates, the array and the coordinate no longer share the same data.

In practice, this means that creating a DataArray with the same name as one of its dimensions no longer automatically uses that array to label the corresponding coordinate. You will now need to provide coordinate labels explicitly. Here’s the old behavior:
xray.DataArray([4, 5, 6], dims="x", name="x")
<xray.DataArray 'x' (x: 3)>
array([4, 5, 6])
Coordinates:
  * x        (x) int64 4 5 6
and the new behavior (compare the values of the x coordinate):
xray.DataArray([4, 5, 6], dims="x", name="x")
<xray.DataArray 'x' (x: 3)>
array([4, 5, 6])
Coordinates:
  * x        (x) int64 0 1 2
It is no longer possible to convert a DataArray to a Dataset with xray.DataArray.to_dataset if it is unnamed. This will now raise ValueError. If the array is unnamed, you need to supply the name argument.
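A short sketch of both points using the current xarray package (imported as xr rather than xray). Current xarray goes further than the "new behavior" repr above: it creates no default x coordinate at all. Coordinate labels must be passed explicitly, and to_dataset() on an unnamed array needs a name. The variable name "foo" is illustrative.

```python
import xarray as xr

arr = xr.DataArray([4, 5, 6], dims="x", name="x")
# The array is not reused as its own coordinate labels
print("x" in arr.coords)  # False

labelled = xr.DataArray([4, 5, 6], dims="x", coords={"x": [0, 1, 2]})

needs_name = False
try:
    labelled.to_dataset()  # unnamed array, no name supplied
except ValueError:
    needs_name = True
print(needs_name)  # True

ds = labelled.to_dataset(name="foo")
print(list(ds.data_vars))  # ['foo']
```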

Enhancements#

Basic support for MultiIndex coordinates on xray objects, including indexing, stack() and unstack():

df = pd.DataFrame({"foo": range(3), "x": ["a", "b", "b"], "y": [0, 0, 1]})

s = df.set_index(["x", "y"])["foo"]

arr = xray.DataArray(s, dims="z")

arr

<xray.DataArray 'foo' (z: 3)>
array([0, 1, 2])
Coordinates:
  * z        (z) object ('a', 0) ('b', 0) ('b', 1)

arr.indexes["z"]

MultiIndex(levels=[[u'a', u'b'], [0, 1]],
           labels=[[0, 1, 1], [0, 0, 1]],
           names=[u'x', u'y'])

arr.unstack("z")

<xray.DataArray 'foo' (x: 2, y: 2)>
array([[  0.,  nan],
       [  1.,   2.]])
Coordinates:
  * x        (x) object 'a' 'b'
  * y        (y) int64 0 1

arr.unstack("z").stack(z=("x", "y"))

<xray.DataArray 'foo' (z: 4)>
array([  0.,  nan,   1.,   2.])
Coordinates:
  * z        (z) object ('a', 0) ('a', 1) ('b', 0) ('b', 1)

See Stack and unstack for more details.

Warning

xray’s MultiIndex support is still experimental, and we have a long to-do list of desired additions (GH719), including better display of multi-index levels when printing a Dataset, and support for saving datasets with a MultiIndex to a netCDF file. User contributions in this area would be greatly appreciated.

Support for reading GRIB, HDF4 and other file formats via PyNIO.
Better error message when a variable is supplied with the same name as one of its dimensions.
Plotting: more control on colormap parameters (GH642). vmin and vmax will not be silently ignored anymore. Setting center=False prevents automatic selection of a divergent colormap.
New xray.Dataset.shift and xray.Dataset.roll methods for shifting/rotating datasets or arrays along a dimension:
```
array = xray.DataArray([5, 6, 7, 8], dims="x")
array.shift(x=2)
array.roll(x=2)
```
Notice that shift moves data independently of coordinates, but roll moves both data and coordinates.
Assigning a pandas object directly as a Dataset variable is now permitted. Its index names correspond to the dims of the Dataset, and its data is aligned.
Passing a pandas.DataFrame or pandas.Panel to a Dataset constructor is now permitted.

New function xray.broadcast for explicitly broadcasting DataArray and Dataset objects against each other. For example:

a = xray.DataArray([1, 2, 3], dims="x")
b = xray.DataArray([5, 6], dims="y")
a
b
a2, b2 = xray.broadcast(a, b)
a2
b2
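A sketch contrasting shift and roll with the current xarray API. In current xarray, roll() leaves coordinates in place unless roll_coords=True is passed, so the sketch passes it explicitly to reproduce the behavior described above. The coordinate labels are illustrative.

```python
import xarray as xr

arr = xr.DataArray([5, 6, 7, 8], coords={"x": [0, 1, 2, 3]}, dims="x")

# shift moves the data only; vacated positions are filled with NaN
shifted = arr.shift(x=2)
print(shifted.values.tolist())       # [nan, nan, 5.0, 6.0]
print(shifted["x"].values.tolist())  # [0, 1, 2, 3]

# roll rotates the data and, with roll_coords=True, the coordinates too
rolled = arr.roll(x=2, roll_coords=True)
print(rolled.values.tolist())        # [7, 8, 5, 6]
print(rolled["x"].values.tolist())   # [2, 3, 0, 1]
```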

Bug fixes#

Fixes for several issues found on DataArray objects with the same name as one of their coordinates (see Breaking changes for more details).
DataArray.to_masked_array always returns a masked array with the mask being an array (not a scalar value) (GH684).
Allows for (imperfect) repr of Coords when underlying index is PeriodIndex (GH645).
Attempting to assign a Dataset or DataArray variable/attribute using attribute-style syntax (e.g., ds.foo = 42) now raises an error rather than silently failing (GH656, GH714).
You can now pass pandas objects with non-numpy dtypes (e.g., categorical or datetime64 with a timezone) into xray without an error (GH716).
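A minimal sketch of the attribute-assignment fix, assuming the current xarray API. Attribute-style assignment raises AttributeError, and item-style assignment is the supported way to replace a variable.

```python
import xarray as xr

ds = xr.Dataset({"foo": ("x", [1, 2])})

rejected = False
try:
    ds.foo = 42  # attribute-style assignment is rejected
except AttributeError:
    rejected = True
print(rejected)  # True

ds["foo"] = ("x", [42, 42])  # item-style assignment works
print(ds["foo"].values.tolist())  # [42, 42]
```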

Acknowledgments#

The following individuals contributed to this release:

Antony Lee
Fabien Maussion
Joe Hamman
Maximilian Roos
Stephan Hoyer
Takeshi Kanmae
femtotrader

v0.6.1 (21 October 2015)#

This release contains a number of bug and compatibility fixes, as well as enhancements to plotting, indexing and writing files to disk.

Note that the minimum required version of dask for use with xray is now version 0.6.

API Changes#

The handling of colormaps and discrete color lists for 2D plots in xray.DataArray.plot was changed to provide more compatibility with matplotlib’s contour and contourf functions (GH538). Now discrete lists of colors should be specified using colors keyword, rather than cmap.

Enhancements#

Faceted plotting through xray.plot.FacetGrid and the xray.plot.plot method. See Faceting for more details and examples.

xray.Dataset.sel and xray.Dataset.reindex now support the tolerance argument for controlling nearest-neighbor selection (GH629):

array = xray.DataArray([1, 2, 3], dims="x")

array.reindex(x=[0.9, 1.5], method="nearest", tolerance=0.2)

<xray.DataArray (x: 2)>
array([  2.,  nan])
Coordinates:
  * x        (x) float64 0.9 1.5

This feature requires pandas v0.17 or newer.

New encoding argument in xray.Dataset.to_netcdf for writing netCDF files with compression, as described in the new documentation section on Writing encoded data.
Added xray.Dataset.real and xray.Dataset.imag attributes to Dataset and DataArray (GH553).
More informative error message with xray.Dataset.from_dataframe if the frame has duplicate columns.
xray now uses deterministic names for dask arrays it creates or opens from disk. This allows xray users to take advantage of dask’s nascent support for caching intermediate computation results. See GH555 for an example.

Bug fixes#

Forwards compatibility with the latest pandas release (v0.17.0). We were using some internal pandas routines for datetime conversion, which unfortunately have now changed upstream (GH569).
Aggregation functions now correctly skip NaN for data for complex128 dtype (GH554).
Fixed indexing 0d arrays with unicode dtype (GH568).
xray.DataArray.name and Dataset keys must be a string or None to be written to netCDF (GH533).
xray.DataArray.where now uses dask instead of numpy if either the array or other is a dask array. Previously, if other was a numpy array the method was evaluated eagerly.
Global attributes are now handled more consistently when loading remote datasets using engine='pydap' (GH574).
It is now possible to assign to the .data attribute of DataArray objects.
coordinates attribute is now kept in the encoding dictionary after decoding (GH610).
Compatibility with numpy 1.10 (GH617).

Acknowledgments#

The following individuals contributed to this release:

Ryan Abernathey
Pete Cable
Clark Fitzgerald
Joe Hamman
Stephan Hoyer
Scott Sinclair

v0.6.0 (21 August 2015)#

This release includes numerous bug fixes and enhancements. Highlights include the introduction of a plotting module and the new Dataset and DataArray methods xray.Dataset.isel_points, xray.Dataset.sel_points, xray.Dataset.where and xray.Dataset.diff. There are no breaking changes from v0.5.2.

Enhancements#

Plotting methods have been implemented on DataArray objects xray.DataArray.plot through integration with matplotlib (GH185). For an introduction, see Plotting.
Variables in netCDF files with multiple missing values are now decoded as NaN after issuing a warning if open_dataset is called with mask_and_scale=True.
We clarified our rules for when the result from an xray operation is a copy vs. a view (see Copies vs. Views for more details).
Dataset variables are now written to netCDF files in order of appearance when using the netcdf4 backend (GH479).

Added xray.Dataset.isel_points and xray.Dataset.sel_points to support pointwise indexing of Datasets and DataArrays (GH475).

da = xray.DataArray(
    np.arange(56).reshape((7, 8)),
    coords={"x": list("abcdefg"), "y": 10 * np.arange(8)},
    dims=["x", "y"],
)

da

<xray.DataArray (x: 7, y: 8)>
array([[ 0,  1,  2,  3,  4,  5,  6,  7],
       [ 8,  9, 10, 11, 12, 13, 14, 15],
       [16, 17, 18, 19, 20, 21, 22, 23],
       [24, 25, 26, 27, 28, 29, 30, 31],
       [32, 33, 34, 35, 36, 37, 38, 39],
       [40, 41, 42, 43, 44, 45, 46, 47],
       [48, 49, 50, 51, 52, 53, 54, 55]])
Coordinates:
  * y        (y) int64 0 10 20 30 40 50 60 70
  * x        (x) |S1 'a' 'b' 'c' 'd' 'e' 'f' 'g'

# we can index by position along each dimension
da.isel_points(x=[0, 1, 6], y=[0, 1, 0], dim="points")

<xray.DataArray (points: 3)>
array([ 0,  9, 48])
Coordinates:
    y        (points) int64 0 10 0
    x        (points) |S1 'a' 'b' 'g'
  * points   (points) int64 0 1 2

# or equivalently by label
da.sel_points(x=["a", "b", "g"], y=[0, 10, 0], dim="points")

<xray.DataArray (points: 3)>
array([ 0,  9, 48])
Coordinates:
    y        (points) int64 0 10 0
    x        (points) |S1 'a' 'b' 'g'
  * points   (points) int64 0 1 2

New xray.Dataset.where method for masking xray objects according to some criteria. This works particularly well with multi-dimensional data:

ds = xray.Dataset(coords={"x": range(100), "y": range(100)})
ds["distance"] = np.sqrt(ds.x**2 + ds.y**2)
ds.distance.where(ds.distance < 100).plot()

Added new methods xray.DataArray.diff and xray.Dataset.diff for finite difference calculations along a given axis.

New xray.DataArray.to_masked_array convenience method for returning a numpy.ma.MaskedArray.

da = xray.DataArray(np.random.random_sample(size=(5, 4)))
da.where(da < 0.5)
da.where(da < 0.5).to_masked_array(copy=True)

Added new flag “drop_variables” to xray.open_dataset for excluding variables from being parsed. This may be useful to drop variables with problems or inconsistent values.
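isel_points and sel_points are no longer available in current xarray. The sketch below expresses the same pointwise selection with vectorized indexing: indexers that share a new "points" dimension are paired up element by element. It also exercises diff() and to_masked_array(). It assumes the current xarray API and reuses the example array from above.

```python
import numpy as np
import xarray as xr

da = xr.DataArray(
    np.arange(56).reshape((7, 8)),
    coords={"x": list("abcdefg"), "y": 10 * np.arange(8)},
    dims=["x", "y"],
)

# Pointwise selection by position and by label along a shared "points" dimension
by_position = da.isel(
    x=xr.DataArray([0, 1, 6], dims="points"),
    y=xr.DataArray([0, 1, 0], dims="points"),
)
by_label = da.sel(
    x=xr.DataArray(["a", "b", "g"], dims="points"),
    y=xr.DataArray([0, 10, 0], dims="points"),
)
print(by_position.values.tolist(), by_label.values.tolist())  # [0, 9, 48] [0, 9, 48]

row = da.isel(x=0)  # values 0..7
print(row.diff("y").values.tolist())  # [1, 1, 1, 1, 1, 1, 1]

# Masked positions (NaN after where) become masked entries
masked = row.where(row > 3).to_masked_array()
print(masked.mask.tolist())  # [True, True, True, True, False, False, False, False]
```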

Bug fixes#

Fixed aggregation functions (e.g., sum and mean) on big-endian arrays when bottleneck is installed (GH489).
Dataset aggregation functions dropped variables with unsigned integer dtype (GH505).
.any() and .all() were not lazy when used on xray objects containing dask arrays.
Fixed an error when attempting to save datetime64 variables to netCDF files when the first element is NaT (GH528).
Fix pickle on DataArray objects (GH515).
Fixed unnecessary coercion of float64 to float32 when using netcdf3 and netcdf4_classic formats (GH526).

v0.5.2 (16 July 2015)#

This release contains bug fixes, several additional options for opening and saving netCDF files, and a backwards incompatible rewrite of the advanced options for xray.concat.

Backwards incompatible changes#

The optional arguments concat_over and mode in xray.concat have been removed and replaced by data_vars and coords. The new arguments are both more easily understood and more robustly implemented, and allowed us to fix a bug where concat accidentally loaded data into memory. If you set values for these optional arguments manually, you will need to update your code. The default behavior should be unchanged.

Enhancements#

xray.open_mfdataset now supports a preprocess argument for preprocessing datasets prior to concatenation. This is useful if datasets cannot be otherwise merged automatically, e.g., if the original datasets have conflicting index coordinates (GH443).
xray.open_dataset and xray.open_mfdataset now use a global thread lock by default for reading from netCDF files with dask. This avoids possible segmentation faults for reading from netCDF4 files when HDF5 is not configured properly for concurrent access (GH444).
Added support for serializing arrays of complex numbers with engine='h5netcdf'.
The new xray.save_mfdataset function allows for saving multiple datasets to disk simultaneously. This is useful when processing large datasets with dask.array. For example, to save a dataset too big to fit into memory to one file per year, we could write:
years, datasets = zip(*ds.groupby("time.year"))
paths = ["%s.nc" % y for y in years]
xray.save_mfdataset(datasets, paths)

Bug fixes#

Fixed min, max, argmin and argmax for arrays with string or unicode types (GH453).
xray.open_dataset and xray.open_mfdataset support supplying chunks as a single integer.
Fixed a bug in serializing scalar datetime variables to netCDF.
Fixed a bug that could occur in serialization of 0-dimensional integer arrays.
Fixed a bug where concatenating DataArrays was not always lazy (GH464).
When reading datasets with h5netcdf, bytes attributes are decoded to strings. This allows conventions decoding to work properly on Python 3 (GH451).

v0.5.1 (15 June 2015)#

This minor release fixes a few bugs and an inconsistency with pandas. It also adds the pipe method, copied from pandas.

Enhancements#

Added xray.Dataset.pipe, replicating the new pandas method in version 0.16.2. See Transforming datasets for more details.
xray.Dataset.assign and xray.Dataset.assign_coords now assign new variables in sorted (alphabetical) order, mirroring the behavior in pandas. Previously, the order was arbitrary.

Bug fixes#

Fixed xray.concat failing in an edge case involving identical coordinate variables (GH425).
We now decode variables loaded from netCDF3 files with the scipy engine using native endianness (GH416). This resolves an issue when aggregating these arrays with bottleneck installed.

v0.5 (1 June 2015)#

Highlights#

The headline feature in this release is experimental support for out-of-core computing (data that doesn’t fit into memory) with Parallel Computing with Dask. This includes a new top-level function xray.open_mfdataset that makes it easy to open a collection of netCDF files (using dask) as a single xray.Dataset object. For more on dask, read the blog post introducing xray + dask and the new documentation section Parallel Computing with Dask.

Dask makes it possible to harness parallelism and manipulate gigantic datasets with xray. It is currently an optional dependency, but it may become required in the future.

Backwards incompatible changes#

The logic used for choosing which variables are concatenated with xray.concat has changed. Previously, by default any variables which were equal across a dimension were not concatenated. This led to some surprising behavior, where the behavior of groupby and concat operations could depend on runtime values (GH268). For example:

ds = xray.Dataset({"x": 0})

xray.concat([ds, ds], dim="y")

<xray.Dataset>
Dimensions:  ()
Coordinates:
    *empty*
Data variables:
    x        int64 0

Now, the default always concatenates data variables:

In [1]: ds = xray.Dataset({"x": 0})

In [2]: xray.concat([ds, ds], dim="y")
Out[2]:
<xarray.Dataset> Size: 16B
Dimensions:  (y: 2)
Dimensions without coordinates: y
Data variables:
    x        (y) int64 16B 0 0


To obtain the old behavior, supply the argument concat_over=[].
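To see why the old rule made results depend on runtime values, here is a toy model in which each dataset is a plain dict of scalar variables. Both functions are hypothetical illustrations of the two rules described above, not xray's concat code.

```python
def concat_old(datasets):
    """Pre-v0.5 rule (sketch): variables equal in every dataset are not stacked."""
    out = {}
    for name in datasets[0]:
        values = [ds[name] for ds in datasets]
        if all(v == values[0] for v in values):
            out[name] = values[0]  # kept as a single, unstacked value
        else:
            out[name] = values  # stacked along the new dimension
    return out


def concat_new(datasets):
    """v0.5 default (sketch): data variables are always stacked."""
    return {name: [ds[name] for ds in datasets] for name in datasets[0]}
```

Under the old rule, `concat_old([{"x": 0}, {"x": 0}])` returns the unstacked `{"x": 0}` while `concat_old([{"x": 0}, {"x": 1}])` returns `{"x": [0, 1]}`, so the output shape depends on the data. The new rule always stacks.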

Enhancements#

New xray.Dataset.to_dataarray and enhanced xray.DataArray.to_dataset methods make it easy to switch back and forth between arrays and datasets:

ds = xray.Dataset(
    {"a": 1, "b": ("x", [1, 2, 3])},
    coords={"c": 42},
    attrs={"Conventions": "None"},
)
ds.to_dataarray()
ds.to_dataarray().to_dataset(dim="variable")

New xray.Dataset.fillna method to fill missing values, modeled off the pandas method of the same name:
```
array = xray.DataArray([np.nan, 1, np.nan, 3], dims="x")
array.fillna(0)
```
fillna works on both Dataset and DataArray objects, and uses index-based alignment and broadcasting like standard binary operations. It can also be applied by group, as illustrated in Fill missing values with climatology.
New xray.Dataset.assign and xray.Dataset.assign_coords methods patterned off the new DataFrame.assign method in pandas:
```
ds = xray.Dataset({"y": ("x", [1, 2, 3])})
ds.assign(z=lambda ds: ds.y**2)
ds.assign_coords(z=("x", ["a", "b", "c"]))
```
These methods return a new Dataset (or DataArray) with updated data or coordinate variables.

xray.Dataset.sel now supports the method parameter, which works like the parameter of the same name on xray.Dataset.reindex. It provides a simple interface for doing nearest-neighbor interpolation:

ds.sel(x=1.1, method="nearest")

<xray.Dataset>
Dimensions:  ()
Coordinates:
    x        int64 1
Data variables:
    y        int64 2

ds.sel(x=[1.1, 2.1], method="pad")

<xray.Dataset>
Dimensions:  (x: 2)
Coordinates:
  * x        (x) int64 1 2
Data variables:
    y        (x) int64 2 3

See Nearest neighbor lookups for more details.

You can now control the underlying backend used for accessing remote datasets (via OPeNDAP) by specifying engine='netcdf4' or engine='pydap'.
xray now provides experimental support for reading and writing netCDF4 files directly via h5py with the h5netcdf package, avoiding the netCDF4-Python package. You will need to install h5netcdf and specify engine='h5netcdf' to try this feature.
Accessing data from remote datasets now has retrying logic (with exponential backoff) that should make it robust to occasional bad responses from DAP servers.
You can control the width of the Dataset repr with xray.set_options. It can be used either as a context manager, in which case the default is restored outside the context:
```
ds = xray.Dataset({"x": np.arange(1000)})
with xray.set_options(display_width=40):
    print(ds)
```
Or to set a global option:
```
xray.set_options(display_width=80)
```
The default value for the display_width option is 80.
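The method="nearest" and method="pad" lookups used with sel above can be sketched on a sorted list of labels with the standard bisect module. This assumes the x coordinate is [0, 1, 2] (as in the printed output) and that labels are sorted ascending; `lookup` is a hypothetical helper, not xray's indexing code.

```python
from bisect import bisect_right


def lookup(index, value, method):
    """Return the position in a sorted index matching `value` (sketch)."""
    # Position of the last label <= value, or -1 if there is none.
    pos = bisect_right(index, value) - 1
    if method == "pad":
        # Forward fill: use the previous label; nothing to pad from before the start.
        if pos < 0:
            raise KeyError(value)
        return pos
    if method == "nearest":
        candidates = [p for p in (pos, pos + 1) if 0 <= p < len(index)]
        # Pick whichever neighbouring label is closest to the requested value.
        return min(candidates, key=lambda p: abs(index[p] - value))
    raise ValueError(f"unknown method: {method!r}")


x = [0, 1, 2]  # assumed index labels
y = [1, 2, 3]  # data variable aligned with x
```

Here `y[lookup(x, 1.1, "nearest")]` gives 2, and padding [1.1, 2.1] selects the y values [2, 3], matching the outputs shown above.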

Deprecations#

The method load_data() has been renamed to the more succinct xray.Dataset.load.

v0.4.1 (18 March 2015)#

This release contains bug fixes and several new features. All changes should be fully backwards compatible.

Enhancements#

New documentation sections on Time series data and Reading multi-file datasets.
xray.Dataset.resample lets you resample a dataset or data array to a new temporal resolution. The syntax is the same as pandas, except you need to supply the time dimension explicitly:
```
time = pd.date_range("2000-01-01", freq="6H", periods=10)
array = xray.DataArray(np.arange(10), [("time", time)])
array.resample("1D", dim="time")
```
You can specify how to do the resampling with the how argument and other options such as closed and label let you control labeling:
```
array.resample("1D", dim="time", how="sum", label="right")
```
If the desired temporal resolution is higher than the original data (upsampling), xray will insert missing values:
```
array.resample("3H", "time")
```
first and last methods on groupby objects let you take the first or last examples from each group along the grouped axis:
```
array.groupby("time.day").first()
```
These methods combine well with resample:
```
array.resample("1D", dim="time", how="first")
```
xray.Dataset.swap_dims allows for easily swapping one dimension out for another:
```
ds = xray.Dataset({"x": range(3), "y": ("x", list("abc"))})
ds.swap_dims({"x": "y"})
```
This was possible in earlier versions of xray, but required some contortions.
xray.open_dataset and xray.Dataset.to_netcdf now accept an engine argument to explicitly select which underlying library (netcdf4 or scipy) is used for reading/writing a netCDF file.
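The downsampling done by resample above amounts to binning timestamps by period and reducing each bin. Below is a minimal pure-Python sketch for daily bins. It assumes left-labelled bins and ignores the closed and label options; `resample_daily` is a hypothetical helper, not xray's resample.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def resample_daily(times, values, how="mean"):
    """Bin values by calendar day and reduce each bin (downsampling sketch)."""
    bins = defaultdict(list)
    for t, v in zip(times, values):
        bins[t.date()].append(v)  # each bin is labelled by its day (left label)
    reducers = {
        "mean": lambda vs: sum(vs) / len(vs),
        "sum": sum,
        "first": lambda vs: vs[0],
        "last": lambda vs: vs[-1],
    }
    reducer = reducers[how]
    # Return bins in chronological order.
    return {day: reducer(vs) for day, vs in sorted(bins.items())}


# Same sampling as the example above: 10 values every 6 hours from 2000-01-01.
start = datetime(2000, 1, 1)
times = [start + timedelta(hours=6 * i) for i in range(10)]
values = list(range(10))
daily_mean = resample_daily(times, values)
```

For these ten samples the daily means are 1.5, 5.5 and 8.5, and how="first" gives 0, 4 and 8.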

Bug fixes#

Fixed a bug where netCDF data variables read from disk with engine='scipy' could still be associated with the file on disk, even after closing the file (GH341). This manifested itself in warnings about mmapped arrays and segmentation faults (if the data was accessed).
Silenced spurious warnings about all-NaN slices when using nan-aware aggregation methods (GH344).
Dataset aggregations with keep_attrs=True now preserve attributes on data variables, not just the dataset itself.
Tests for xray now pass when run on Windows (GH360).
Fixed a regression in v0.4 where saving to netCDF could fail with the error ValueError: could not automatically determine time units.

v0.4 (2 March, 2015)#

This is one of the biggest releases yet for xray: it includes some major changes that may break existing code, along with the usual collection of minor enhancements and bug fixes. On the plus side, this release includes all hitherto planned breaking changes, so the upgrade path for xray should be smoother going forward.

Breaking changes#

We now automatically align index labels in arithmetic, dataset construction, merging and updating. This means the need for manually invoking methods like xray.align and xray.Dataset.reindex_like should be vastly reduced.

For arithmetic, we align based on the intersection of labels:
```
lhs = xray.DataArray([1, 2, 3], [("x", [0, 1, 2])])
rhs = xray.DataArray([2, 3, 4], [("x", [1, 2, 3])])
lhs + rhs
```
For dataset construction and merging, we align based on the union of labels:
```
xray.Dataset({"foo": lhs, "bar": rhs})
```
For update and __setitem__, we align based on the original object:
```
lhs.coords["rhs"] = rhs
lhs
```
Aggregations like mean or median now skip missing values by default:
```
xray.DataArray([1, 2, np.nan, 3]).mean()
```
You can turn this behavior off by supplying the keyword argument skipna=False.

These operations are lightning fast thanks to integration with bottleneck, which is a new optional dependency for xray (numpy is used if bottleneck is not installed).
Scalar coordinates no longer conflict with constant arrays with the same value (e.g., in arithmetic, merging datasets and concat), even if they have different shapes (GH243). For example, the coordinate c here persists through arithmetic, even though it has different shapes on each DataArray:
```
a = xray.DataArray([1, 2], coords={"c": 0}, dims="x")
b = xray.DataArray([1, 2], coords={"c": ("x", [0, 0])}, dims="x")
(a + b).coords
```
This functionality can be controlled through the compat option, which has also been added to the xray.Dataset constructor.
Datetime shortcuts such as 'time.month' now return a DataArray with the name 'month', not 'time.month' (GH345). This makes it easier to index the resulting arrays when they are used with groupby:
```
time = xray.DataArray(
    pd.date_range("2000-01-01", periods=365), dims="time", name="time"
)
counts = time.groupby("time.month").count()
counts.sel(month=2)
```
Previously, you would need to use something like counts.sel(**{'time.month': 2}), which is much more awkward.

The season datetime shortcut now returns an array of string labels such as 'DJF':

In[92]: ds = xray.Dataset({"t": pd.date_range("2000-01-01", periods=12, freq="M")})

In[93]: ds["t.season"]
Out[93]:
<xarray.DataArray 'season' (t: 12)>
array(['DJF', 'DJF', 'MAM', ..., 'SON', 'SON', 'DJF'], dtype='<U3')
Coordinates:
  * t        (t) datetime64[ns] 2000-01-31 2000-02-29 ... 2000-11-30 2000-12-31

Previously, it returned numbered seasons 1 through 4.

We have updated our use of the terms “coordinates” and “variables”. What were known in previous versions of xray as “coordinates” and “variables” are now referred to throughout the documentation as “coordinate variables” and “data variables”. This brings xray into closer alignment with CF Conventions. The only visible change besides the documentation is that Dataset.vars has been renamed Dataset.data_vars.
You will need to update your code if you have been ignoring deprecation warnings: methods and attributes that were deprecated in xray v0.3 or earlier (e.g., dimensions, attributes) have gone away.
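The alignment rules from the first breaking change above (intersection of labels for arithmetic, union for dataset construction) can be modelled with dicts keyed by label. `align` is a hypothetical helper written for illustration, with NaN marking labels that are missing on one side; it is not xray's align.

```python
import math


def align(lhs, rhs, join):
    """Align two label -> value mappings on an "inner" or "outer" set of labels."""
    if join == "inner":
        labels = sorted(lhs.keys() & rhs.keys())
    elif join == "outer":
        labels = sorted(lhs.keys() | rhs.keys())
    else:
        raise ValueError(f"unknown join: {join!r}")
    # Labels missing from one side become NaN (only possible for "outer").
    left = [lhs.get(k, math.nan) for k in labels]
    right = [rhs.get(k, math.nan) for k in labels]
    return labels, left, right


# The lhs + rhs example above, keyed by the "x" labels.
lhs = {0: 1, 1: 2, 2: 3}
rhs = {1: 2, 2: 3, 3: 4}

labels, a, b = align(lhs, rhs, "inner")
total = {k: p + q for k, p, q in zip(labels, a, b)}
```

The inner join keeps labels [1, 2] and the sum is {1: 4, 2: 6}; an "outer" join keeps [0, 1, 2, 3] and fills the gaps with NaN. Aligning on the original object, as update does, would correspond to iterating over `lhs` alone.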

Enhancements#

Support for xray.Dataset.reindex with a fill method. This provides a useful shortcut for upsampling:
```
data = xray.DataArray([1, 2, 3], [("x", range(3))])
data.reindex(x=[0.5, 1, 1.5, 2, 2.5], method="pad")
```
This will be especially useful once pandas 0.16 is released, at which point xray will immediately support reindexing with method='nearest'.
You can now use functions that return generic ndarrays with DataArray.groupby.apply and Dataset.apply (GH327 and GH329). Thanks Jeff Gerard!
Consolidated the functionality of dumps (writing a dataset to a netCDF3 bytestring) into xray.Dataset.to_netcdf (GH333).
xray.Dataset.to_netcdf now supports writing to groups in netCDF4 files (GH333). It also finally has a full docstring – you should read it!
xray.open_dataset and xray.Dataset.to_netcdf now work on netCDF3 files when netcdf4-python is not installed as long as scipy is available (GH333).

The new xray.Dataset.drop and xray.DataArray.drop methods make it easy to drop explicitly listed variables or index labels:

# drop variables
ds = xray.Dataset({"x": 0, "y": 1})
ds.drop("x")

# drop index labels
arr = xray.DataArray([1, 2, 3], coords=[("x", list("abc"))])
arr.drop(["a", "c"], dim="x")

xray.Dataset.broadcast_equals has been added to correspond to the new compat option.
Long attributes are now truncated at 500 characters when printing a dataset (GH338). This should make things more convenient for working with datasets interactively.
Added a new documentation example, Calculating Seasonal Averages from Time Series of Monthly Means. Thanks Joe Hamman!
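Truncating long attributes when printing can be sketched as below. The 500-character limit comes from the entry above; how xray actually formatted and truncated each line is not described here, so `summarize_attr` and its output format are assumptions.

```python
MAX_ATTR_WIDTH = 500  # limit stated in the changelog entry


def summarize_attr(key, value, max_width=MAX_ATTR_WIDTH):
    """Format one attribute line for printing, truncating long ones (sketch)."""
    line = f"    {key}: {value}"
    if len(line) > max_width:
        # Cut the line so that, including the "..." marker, it is max_width long.
        line = line[: max_width - 3] + "..."
    return line
```

Short attributes such as `summarize_attr("units", "K")` print unchanged, while a 1000-character history attribute is cut to a 500-character line ending in "...".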

Bug fixes#

Several bug fixes related to decoding time units from netCDF files (GH316, GH330). Thanks Stefan Pfenninger!
xray no longer requires decode_coords=False when reading datasets with unparsable coordinate attributes (GH308).
Fixed DataArray.loc indexing with ... (GH318).
Fixed an edge case that resulted in an error when reindexing multi-dimensional variables (GH315).
Fixed slicing with negative step sizes (GH312).
Fixed invalid conversion of string arrays to numeric dtype (GH305).
Fixed repr() on dataset objects with non-standard dates (GH347).

Deprecations#

dump and dumps have been deprecated in favor of xray.Dataset.to_netcdf.
drop_vars has been deprecated in favor of xray.Dataset.drop.

Future plans#

The biggest feature I’m excited about working toward in the immediate future is supporting out-of-core operations in xray using Dask, a part of the Blaze project. For a preview of using Dask with weather data, read this blog post by Matthew Rocklin. See GH328 for more details.

v0.3.2 (23 December, 2014)#

This release focused on bug-fixes, speedups and resolving some niggling inconsistencies.

There are a few cases where the behavior of xray differs from the previous version. However, I expect that in almost all cases your code will continue to run unmodified.

Warning

xray now requires pandas v0.15.0 or later. This was necessary for supporting TimedeltaIndex without too many painful hacks.

Backwards incompatible changes#

Arrays of datetime.datetime objects are now automatically cast to datetime64[ns] arrays when stored in an xray object, using machinery borrowed from pandas:
```
from datetime import datetime

xray.Dataset({"t": [datetime(2000, 1, 1)]})
```
xray now has support (including serialization to netCDF) for TimedeltaIndex. datetime.timedelta objects are accordingly cast to timedelta64[ns] objects when appropriate.
Masked arrays are now properly coerced to use NaN as a sentinel value (GH259).

Enhancements#

Due to popular demand, we have added experimental attribute style access as a shortcut for dataset variables, coordinates and attributes:
```
ds = xray.Dataset({"tmin": ([], 25, {"units": "celsius"})})
ds.tmin.units
```
Tab-completion for these variables should work in editors such as IPython. However, setting variables or attributes in this fashion is not yet supported because there are some unresolved ambiguities (GH300).
You can now use a dictionary for indexing with labeled dimensions. This provides a safe way to do assignment with labeled dimensions:
```
array = xray.DataArray(np.zeros(5), dims=["x"])
array[dict(x=slice(3))] = 1
array
```
Non-index coordinates can now be faithfully written to and restored from netCDF files. This is done according to CF conventions when possible by using the coordinates attribute on a data variable. When not possible, xray defines a global coordinates attribute.
Preliminary support for converting xray.DataArray objects to and from CDAT cdms2 variables.
We sped up any operation that involves creating a new Dataset or DataArray (e.g., indexing, aggregation, arithmetic) by 30 to 50%. The full speedup requires cyordereddict to be installed.
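Dictionary indexing works by mapping each named dimension to its position. Below is a small sketch of that translation, using a nested list as a stand-in for a 2-D array with dims ("y", "x"). `positional_key` is a hypothetical helper, not xray's indexing code, and the assignment loop only handles slice indexers.

```python
def positional_key(dims, indexers):
    """Translate {dim: indexer} into a tuple ordered like `dims` (sketch)."""
    unknown = set(indexers) - set(dims)
    if unknown:
        raise KeyError(f"dimensions not found: {sorted(unknown)}")
    # Dimensions that are not mentioned are selected in full.
    return tuple(indexers.get(dim, slice(None)) for dim in dims)


# A nested list standing in for a 2-D array with dims ("y", "x").
data = [[0] * 5 for _ in range(2)]
key = positional_key(("y", "x"), {"x": slice(3)})

# Assign 1 to every selected element; this loop assumes slice indexers.
for row in range(len(data))[key[0]]:
    for col in range(len(data[row]))[key[1]]:
        data[row][col] = 1
```

Because the dict names the dimension, the caller never needs to know whether "x" is the first or second axis.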

Bug fixes#

Fix for to_dataframe() with 0d string/object coordinates (GH287)
Fix for to_netcdf with 0d string variable (GH284)
Fix writing datetime64 arrays to netcdf if NaT is present (GH270)
Fix align silently upcasting data arrays when NaNs are inserted (GH264)

Future plans#

I am contemplating switching to the terms “coordinate variables” and “data variables” instead of the (currently used) “coordinates” and “variables”, following their use in CF Conventions (GH293). This would mostly have implications for the documentation, but I would also change the Dataset attribute vars to data.
I am no longer certain that automatic label alignment for arithmetic would be a good idea for xray – it is a feature from pandas that I have not missed (GH186).
The main API breakage that I do anticipate in the next release is finally making all aggregation operations skip missing values by default (GH130). I’m pretty sick of writing ds.reduce(np.nanmean, 'time').
The next version of xray (0.4) will remove deprecated features and aliases whose use currently raises a warning.

If you have opinions about any of these anticipated changes, I would love to hear them – please add a note to any of the referenced GitHub issues.

v0.3.1 (22 October, 2014)#

This is mostly a bug-fix release to make xray compatible with the latest release of pandas (v0.15).

We added several features to better support working with missing values and exporting xray objects to pandas. We also reorganized the internal API for serializing and deserializing datasets, but this change should be almost entirely transparent to users.

Other than breaking the experimental DataStore API, there should be no backwards incompatible changes.

New features#

Added xray.Dataset.count and xray.Dataset.dropna methods, copied from pandas, for working with missing values (GH247, GH58).
Added xray.DataArray.to_pandas for converting a data array into the pandas object with the same dimensionality (1D to Series, 2D to DataFrame, etc.) (GH255).
Support for reading gzipped netCDF3 files (GH239).
Reduced memory usage when writing netCDF files (GH251).
‘missing_value’ is now supported as an alias for the ‘_FillValue’ attribute on netCDF variables (GH245).
Trivial indexes, equivalent to range(n) where n is the length of the dimension, are no longer written to disk (GH245).

Bug fixes#

Compatibility fixes for pandas v0.15 (GH262).
Fixes for display and indexing of NaT (not-a-time) (GH238, GH240)
Fix slicing by label when an argument is a data array (GH250).
Test data is now shipped with the source distribution (GH253).
Ensure order does not matter when doing arithmetic with scalar data arrays (GH254).
Order of dimensions preserved with DataArray.to_dataframe (GH260).

v0.3 (21 September 2014)#

New features#

Revamped coordinates: “coordinates” now refer to all arrays that are not used to index a dimension. Coordinates are intended to allow for keeping track of arrays of metadata that describe the grid on which the points in “variable” arrays lie. They are preserved (when unambiguous) even through mathematical operations.
Dataset math: xray.Dataset objects now support all arithmetic operations directly. Dataset-array operations map across all dataset variables; dataset-dataset operations act on each pair of variables with the same name.
GroupBy math: This provides a convenient shortcut for normalizing by the average value of a group.
The dataset __repr__ method has been entirely overhauled; dataset objects now show their values when printed.
You can now index a dataset with a list of variables to return a new dataset: ds[['foo', 'bar']].
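GroupBy math of the kind described above (normalizing by a group's average) boils down to computing one mean per group and broadcasting it back to each member. Below is a pure-Python sketch with a hypothetical `subtract_group_mean` helper, not xray's GroupBy implementation.

```python
from collections import defaultdict


def subtract_group_mean(values, groups):
    """Return each value minus the mean of its group (GroupBy math sketch)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for v, g in zip(values, groups):
        sums[g] += v
        counts[g] += 1
    means = {g: sums[g] / counts[g] for g in sums}
    # Broadcast each group's mean back onto the members of that group.
    return [v - means[g] for v, g in zip(values, groups)]


# Values labelled by season; each season's mean is removed.
anomalies = subtract_group_mean([1.0, 3.0, 10.0, 14.0], ["DJF", "DJF", "JJA", "JJA"])
```

The DJF mean is 2 and the JJA mean is 12, so the anomalies come out as [-1.0, 1.0, -2.0, 2.0].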

Backwards incompatible changes#

Dataset.__eq__ and Dataset.__ne__ are now element-wise operations instead of comparing all values to obtain a single boolean. Use the method xray.Dataset.equals instead.

Deprecations#

Dataset.noncoords is deprecated: use Dataset.vars instead.
Dataset.select_vars deprecated: index a Dataset with a list of variable names instead.
DataArray.select_vars and DataArray.drop_vars deprecated: use xray.DataArray.reset_coords instead.

v0.2 (14 August 2014)#

This is a major release that includes some new features and quite a few bug fixes. Here are the highlights:

There is now a direct constructor for DataArray objects, which makes it possible to create a DataArray without using a Dataset. This is highlighted in the refreshed tutorial.
You can perform aggregation operations like mean directly on xray.Dataset objects, thanks to Joe Hamman. These aggregation methods also work on grouped datasets.
xray now works on Python 2.6, thanks to Anna Kuznetsova.
A number of methods and attributes were given more sensible (usually shorter) names: labeled -> sel, indexed -> isel, select -> select_vars, unselect -> drop_vars, dimensions -> dims, coordinates -> coords, attributes -> attrs.
New xray.Dataset.load_data and xray.Dataset.close methods for datasets give lower-level control over data loaded from disk.

v0.1.1 (20 May 2014)#

xray 0.1.1 is a bug-fix release that includes changes that should be almost entirely backwards compatible with v0.1:

Python 3 support (GH53)
Required numpy version relaxed to 1.7 (GH129)
Return numpy.datetime64 arrays for non-standard calendars (GH126)
Support for opening datasets associated with NetCDF4 groups (GH127)
Bug-fixes for concatenating datetime arrays (GH134)

Special thanks to new contributors Thomas Kluyver, Joe Hamman and Alistair Miles.

v0.1 (2 May 2014)#

Initial release.

Package	Old minimum version	New minimum version
Python	3.5.3	3.6
numpy	1.12	1.14
pandas	0.19.2	0.24
dask	0.16 (tested: 2.4)	1.2
bottleneck	1.1 (tested: 1.2)	1.2
matplotlib	1.5 (tested: 3.1)	3.1