What’s New¶
v0.9.0 (25 January 2017)¶
This major release includes five months’ worth of enhancements and bug fixes from 24 contributors, including some significant changes that are not fully backwards compatible. Highlights include:
- Coordinates are now optional in the xarray data model, even for dimensions.
- Changes to caching, lazy loading and pickling to improve xarray’s experience for parallel computing.
- Improvements for accessing and manipulating pandas.MultiIndex levels.
- Many new methods and functions, including quantile(), cumsum(), cumprod(), combine_first(), set_index(), reset_index(), reorder_levels(), full_like(), zeros_like(), ones_like(), open_dataarray(), compute(), Dataset.info(), testing.assert_equal(), testing.assert_identical(), and testing.assert_allclose().
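A few of the additions named in the highlights can be tried together. The following is only a minimal sketch: the array contents and variable names are made up for illustration, and it assumes an xarray release that still provides these functions under the same names.

```python
import xarray as xr

# Illustrative one-dimensional array (values chosen arbitrarily).
arr = xr.DataArray([1.0, 2.0, 3.0, 4.0, 5.0], dims="x")

# Cumulative sum along the "x" dimension.
running_total = arr.cumsum(dim="x")

# Median computed as the 0.5 quantile along "x".
median = arr.quantile(0.5, dim="x")

# New array with the same shape and dims as arr, filled with a constant.
sevens = xr.full_like(arr, 7.0)

# The testing helpers raise AssertionError when the objects differ.
xr.testing.assert_allclose(xr.zeros_like(arr) + 7.0, sevens)
```

The testing helpers are meant for test suites, so a passing call is silent and only a mismatch raises.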
Breaking changes¶
- Index coordinates for each dimension are now optional, and are no longer created by default (GH1017). You can identify such dimensions without indexes by their appearance in the list of “Unindexed dimensions” in the Dataset or DataArray repr:

      In [1]: xr.Dataset({'foo': (('x', 'y'), [[1, 2]])})
      Out[1]:
      <xarray.Dataset>
      Dimensions:  (x: 1, y: 2)
      Coordinates:
          *empty*
      Unindexed dimensions:
          x, y
      Data variables:
          foo      (x, y) int64 1 2

  This has a number of implications:
  - align() and reindex() can now raise an error if dimension labels are missing and dimensions have different sizes.
  - Because pandas does not support missing indexes, methods such as to_dataframe/from_dataframe and stack/unstack no longer roundtrip faithfully on all inputs. Use reset_index() to remove undesired indexes.
  - Dataset.__delitem__ and drop() no longer delete/drop variables that have dimensions matching a deleted/dropped variable.
  - DataArray.coords.__delitem__ is now allowed on variables matching dimension names.
  - .sel and .loc now handle indexing along a dimension without coordinate labels by doing integer based indexing. See Missing coordinate labels for an example.
  - indexes is no longer guaranteed to include all dimension names as keys. The new method get_index() has been added to get an index for any dimension, falling back to a default RangeIndex if necessary.
- The default behavior of merge is now compat='no_conflicts', so some merges will now succeed in cases that previously raised xarray.MergeError. Set compat='broadcast_equals' to restore the previous default. See Merging with ‘no_conflicts’ for more details.
- Reading values no longer always caches values in a NumPy array (GH1128). Caching of .values on variables read from netCDF files on disk is still the default when open_dataset() is called with cache=True. By Guido Imperiale and Stephan Hoyer.
- Pickling a Dataset or DataArray linked to a file on disk no longer caches its values into memory before pickling (GH1128). Instead, pickle stores file paths and restores objects by reopening file references. This enables preliminary, experimental use of xarray for opening files with dask.distributed. By Stephan Hoyer.
- Coordinates used to index a dimension are now loaded eagerly into pandas.Index objects, instead of loading the values lazily. By Guido Imperiale.
- Automatic levels for 2d plots are now guaranteed to land on vmin and vmax when these kwargs are explicitly provided (GH1191). The automated level selection logic also slightly changed. By Fabien Maussion.
- DataArray.rename() behavior changed to strictly change the DataArray.name if called with a string argument, or strictly change coordinate names if called with a dict-like argument. By Markus Gonser.
- By default, to_netcdf() adds a _FillValue = NaN attribute to float types. By Frederic Laliberte.
- repr on DataArray objects uses a shortened display for NumPy array data that is less likely to overflow onto multiple pages (GH1207). By Stephan Hoyer.
- xarray no longer supports Python 3.3, versions of dask prior to v0.9.0, or versions of bottleneck prior to v1.0.
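The effect of optional index coordinates on indexes and get_index() can be sketched as follows. The dataset mirrors the one shown in the repr example; the variable names are illustrative, and the exact repr text may differ between xarray versions.

```python
import pandas as pd
import xarray as xr

# A dataset whose dimensions carry no coordinate labels.
ds = xr.Dataset({"foo": (("x", "y"), [[1, 2]])})

# No index is stored for "x", so it is absent from ds.indexes.
has_x_index = "x" in ds.indexes

# get_index() still returns a usable index: a default RangeIndex.
x_index = ds.get_index("x")
```

Code that previously looked up ds.indexes[dim] for every dimension can switch to get_index(dim) to keep working on unindexed dimensions.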
Deprecations¶
- Renamed the Coordinate class from xarray’s low level API to IndexVariable. Variable.to_variable and Variable.to_coord have been renamed to to_base_variable() and to_index_variable().
- Deprecated supplying coords as a dictionary to the DataArray constructor without also supplying an explicit dims argument. The old behavior encouraged relying on the iteration order of dictionaries, which is a bad practice (GH727).
- Removed a number of methods deprecated since v0.7.0 or earlier: load_data, vars, drop_vars, dump, dumps and the variables keyword argument to Dataset.
- Removed the dummy module that enabled import xray.
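For the coords deprecation, the non-deprecated form passes dims explicitly next to the coords dictionary, so the result does not depend on dictionary ordering. A minimal sketch, with made-up data and coordinate values:

```python
import numpy as np
import xarray as xr

data = np.zeros((2, 3))

# dims states the axis order explicitly; coords only supplies the labels.
arr = xr.DataArray(
    data,
    coords={"y": [10, 20, 30], "x": [0, 1]},  # dict order is irrelevant here
    dims=("x", "y"),
)
```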
Enhancements¶
- Added new method combine_first() to DataArray and Dataset, based on the pandas method of the same name (see Combine). By Chun-Wei Yuan.
- Added the ability to change default automatic alignment (arithmetic_join="inner") for binary operations via set_options() (see Automatic alignment). By Chun-Wei Yuan.
- Add checking of attr names and values when saving to netCDF, raising useful error messages if they are invalid (GH911). By Robin Wilson.
- Added ability to save DataArray objects directly to netCDF files using to_netcdf(), and to load directly from netCDF files using open_dataarray() (GH915). These remove the need to convert a DataArray to a Dataset before saving as a netCDF file, and deal with names to ensure a perfect ‘roundtrip’ capability. By Robin Wilson.
- Multi-index levels are now accessible as “virtual” coordinate variables, e.g., ds['time'] can pull out the 'time' level of a multi-index (see Coordinates). sel also accepts providing multi-index levels as keyword arguments, e.g., ds.sel(time='2000-01') (see Multi-level indexing). By Benoit Bovy.
- Added set_index, reset_index and reorder_levels methods to easily create and manipulate (multi-)indexes (see Set and reset index). By Benoit Bovy.
- Added the compat option 'no_conflicts' to merge, allowing the combination of xarray objects with disjoint (GH742) or overlapping (GH835) coordinates as long as all present data agrees. By Johnnie Gray. See Merging with ‘no_conflicts’ for more details.
- It is now possible to set concat_dim=None explicitly in open_mfdataset() to disable inferring a dimension along which to concatenate. By Stephan Hoyer.
- Added methods DataArray.compute(), Dataset.compute(), and Variable.compute() as a non-mutating alternative to load(). By Guido Imperiale.
- Adds DataArray and Dataset methods cumsum() and cumprod(). By Phillip J. Wolfram.
- New properties Dataset.sizes and DataArray.sizes for providing consistent access to dimension length on both Dataset and DataArray (GH921). By Stephan Hoyer.
- New keyword argument drop=True for sel(), isel() and squeeze() for dropping scalar coordinates that arise from indexing (GH242). By Stephan Hoyer.
- New top-level functions full_like(), zeros_like(), and ones_like(). By Guido Imperiale.
- Overriding a preexisting attribute with register_dataset_accessor() or register_dataarray_accessor() now issues a warning instead of raising an error (GH1082). By Stephan Hoyer.
- Options for axes sharing between subplots are exposed to FacetGrid and plot(), so axes sharing can be disabled for polar plots. By Bas Hoonhout.
- New utility functions assert_equal(), assert_identical(), and assert_allclose() for asserting relationships between xarray objects, designed for use in a pytest test suite.
- figsize, size and aspect plot arguments are now supported for all plots (GH897). See Controlling the figure size for more details. By Stephan Hoyer and Fabien Maussion.
- New info() method to summarize Dataset variables and attributes. The method prints to a buffer (e.g. stdout) with output similar to what the command line utility ncdump -h produces (GH1150). By Joe Hamman.
- Added the ability to write unlimited netCDF dimensions with the scipy and netcdf4 backends via the new encoding attribute or via the unlimited_dims argument to to_netcdf(). By Joe Hamman.
- New quantile() method to calculate quantiles from DataArray objects (GH1187). By Joe Hamman.
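Two of the combining tools listed above, merge with compat='no_conflicts' and combine_first(), can be compared on overlapping data. This is a hedged sketch: the datasets are invented for illustration, and join="outer" is passed explicitly rather than relying on a default.

```python
import numpy as np
import xarray as xr

# Two datasets that overlap on x = 2, 3, 4 and agree wherever both have data.
ds1 = xr.Dataset({"a": ("x", [10.0, 20.0, 30.0, np.nan])}, coords={"x": [1, 2, 3, 4]})
ds2 = xr.Dataset({"a": ("x", [np.nan, 30.0, 40.0, 50.0])}, coords={"x": [2, 3, 4, 5]})

# 'no_conflicts' fills gaps from either side; it raises if non-null values disagree.
merged = xr.merge([ds1, ds2], compat="no_conflicts", join="outer")

# combine_first keeps the caller's values and falls back to the argument for gaps.
first = ds1["a"].combine_first(ds2["a"])
```

The practical difference is in conflicts: merge refuses overlapping values that disagree, while combine_first silently prefers the object it is called on.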
Bug fixes¶
- groupby_bins now restores empty bins by default (GH1019). By Ryan Abernathey.
- Fix issues for dates outside the valid range of pandas timestamps (GH975). By Mathias Hauser.
- Unstacking produced flipped array after stacking decreasing coordinate values (GH980). By Stephan Hoyer.
- Setting dtype via the encoding parameter of to_netcdf failed if the encoded dtype was the same as the dtype of the original array (GH873). By Stephan Hoyer.
- Fix issues with variables where both attributes _FillValue and missing_value are set to NaN (GH997). By Marco Zühlke.
- .where() and .fillna() now preserve attributes (GH1009). By Fabien Maussion.
- Applying broadcast() to an xarray object based on the dask backend won’t accidentally convert the array from dask to numpy anymore (GH978). By Guido Imperiale.
- Dataset.concat() now preserves variables order (GH1027). By Fabien Maussion.
- Fixed an issue with pcolormesh (GH781). A new infer_intervals keyword gives control over whether the cell intervals should be computed or not. By Fabien Maussion.
- Grouping over a dimension with non-unique values with groupby gives correct groups. By Stephan Hoyer.
- Fixed accessing coordinate variables with non-string names from .coords. By Stephan Hoyer.
- rename() now simultaneously renames the array and any coordinate with the same name, when supplied via a dict (GH1116). By Yves Delley.
- Fixed sub-optimal performance in certain operations with object arrays (GH1121). By Yves Delley.
- Fix .groupby(group) when group has datetime dtype (GH1132). By Jonas Sølvsteen.
- Fixed a bug with facetgrid (the norm keyword was ignored, GH1159). By Fabien Maussion.
- Resolved a concurrency bug that could cause Python to crash when simultaneously reading and writing netCDF4 files with dask (GH1172). By Stephan Hoyer.
- Fix to make .copy() actually copy dask arrays, which will be relevant for future releases of dask in which dask arrays will be mutable (GH1180). By Stephan Hoyer.
- Fix opening NetCDF files with multi-dimensional time variables (GH1229). By Stephan Hoyer.
Performance improvements¶
- isel_points() and sel_points() now use vectorised indexing in numpy and dask (GH1161), which can result in several orders of magnitude speedup. By Jonathan Chambers.
v0.8.2 (18 August 2016)¶
This release includes a number of bug fixes and minor enhancements.
Breaking changes¶
- broadcast() and concat() now auto-align inputs, using join=outer. Previously, these functions raised ValueError for non-aligned inputs. By Guido Imperiale.
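The outer-join alignment performed by broadcast() can be seen with two arrays whose labels only partly overlap. A minimal sketch with invented values:

```python
import numpy as np
import xarray as xr

a = xr.DataArray([1.0, 2.0], dims="x", coords={"x": [0, 1]})
b = xr.DataArray([10.0, 20.0], dims="x", coords={"x": [1, 2]})

# Both outputs share the union of the x labels; missing entries become NaN.
a2, b2 = xr.broadcast(a, b)

# Positions where each input had no label after alignment.
a_missing = np.isnan(a2.values).tolist()
b_missing = np.isnan(b2.values).tolist()
```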
Enhancements¶
- New documentation on Transitioning from pandas.Panel to xarray. By Maximilian Roos.
- New Dataset and DataArray methods to_dict() and from_dict() to allow easy conversion between dictionaries and xarray objects (GH432). See dictionary IO for more details. By Julia Signell.
- Added exclude and indexes optional parameters to align(), and exclude optional parameter to broadcast(). By Guido Imperiale.
. By Guido Imperiale. - Better error message when assigning variables without dimensions (GH971). By Stephan Hoyer.
- Better error message when reindex/align fails due to duplicate index values (GH956). By Stephan Hoyer.
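A round trip through to_dict() and from_dict() might look like the sketch below. The array, its name and its attributes are made up for illustration; the exact keys of the dictionary may vary between xarray versions, so only the data entry is inspected here.

```python
import xarray as xr

arr = xr.DataArray(
    [1, 2, 3],
    dims="x",
    coords={"x": [10, 20, 30]},
    name="foo",
    attrs={"units": "m"},
)

# Convert to plain Python containers (lists and dicts) ...
as_dict = arr.to_dict()

# ... and rebuild an equivalent DataArray from them.
restored = xr.DataArray.from_dict(as_dict)
```

Because the dictionary holds only built-in types, it can be handed to serializers such as json without further conversion, at least for simple dtypes like the ones used here.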
Bug fixes¶
- Ensure xarray works with h5netcdf v0.3.0 for arrays with dtype=str (GH953). By Stephan Hoyer.
- Dataset.__dir__() (i.e. the method python calls to get autocomplete options) failed if one of the dataset’s keys was not a string (GH852). By Maximilian Roos.
- Dataset constructor can now take arbitrary objects as values (GH647). By Maximilian Roos.
- Clarified copy argument for reindex() and align(), which now consistently always return new xarray objects (GH927).
- Fix open_mfdataset with engine='pynio' (GH936). By Stephan Hoyer.
- groupby_bins sorted bin labels as strings (GH952). By Stephan Hoyer.
- Fix bug introduced by v0.8.0 that broke assignment to datasets when both the left and right side have the same non-unique index values (GH956).
v0.8.1 (5 August 2016)¶
Bug fixes¶
- Fix bug in v0.8.0 that broke assignment to Datasets with non-unique indexes (GH943). By Stephan Hoyer.
v0.8.0 (2 August 2016)¶
This release includes four months of new features and bug fixes, including several breaking changes.
Breaking changes¶
- Dropped support for Python 2.6 (GH855).
- Indexing on multi-index now drops levels, which is consistent with pandas. It also changes the name of the dimension / coordinate when the multi-index is reduced to a single index (GH802).
- Contour plots no longer add a colorbar by default (GH866). Filled contour plots are unchanged.
- DataArray.values and .data now always return a NumPy array-like object, even for 0-dimensional arrays with object dtype (GH867). Previously, .values returned native Python objects in such cases. To convert the values of scalar arrays to Python objects, use the .item() method.
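The distinction between .values and .item() on a scalar array can be sketched as follows. An integer scalar is used here for simplicity rather than the object-dtype case mentioned in the entry; the variable names are illustrative.

```python
import numpy as np
import xarray as xr

scalar = xr.DataArray(5)

# .values gives a 0-dimensional NumPy array ...
as_array = scalar.values

# ... while .item() converts it to a native Python object.
as_python = scalar.item()
```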
Enhancements¶
- Groupby operations now support grouping over multidimensional variables. A new method called groupby_bins() has also been added to allow users to specify bins for grouping. The new features are described in Multidimensional Grouping and Working with Multidimensional Coordinates. By Ryan Abernathey.
- DataArray and Dataset method where() now supports a drop=True option that clips coordinate elements that are fully masked. By Phillip J. Wolfram.
- New top level merge() function allows for combining variables from any number of Dataset and/or DataArray variables. See Merge for more details. By Stephan Hoyer.
- DataArray and Dataset method resample() now supports the keep_attrs=False option that determines whether variable and dataset attributes are retained in the resampled object. By Jeremy McGibbon.
- Better multi-index support in DataArray and Dataset sel() and loc() methods, which now behave more closely to pandas and which also accept dictionaries for indexing based on given level names and labels (see Multi-level indexing). By Benoit Bovy.
- New (experimental) decorators register_dataset_accessor() and register_dataarray_accessor() for registering custom xarray extensions without subclassing. They are described in the new documentation page on xarray Internals. By Stephan Hoyer.
- Round trip boolean datatypes. Previously, writing boolean datatypes to netCDF formats would raise an error since netCDF does not have a bool datatype. This feature reads/writes a dtype attribute to boolean variables in netCDF files. By Joe Hamman.
- 2D plotting methods now have two new keywords (cbar_ax and cbar_kwargs), allowing more control on the colorbar (GH872). By Fabien Maussion.
- New Dataset method filter_by_attrs(), akin to netCDF4.Dataset.get_variables_by_attributes, to easily filter data variables using their attributes. By Filipe Fernandes.
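Two of these additions, where() with drop=True and filter_by_attrs(), are small enough to sketch together. The data, variable names and attribute values below are invented for illustration.

```python
import xarray as xr

# where(..., drop=True): masked-out labels are removed instead of kept as NaN.
arr = xr.DataArray([1, 2, 3, 4], dims="x", coords={"x": [10, 20, 30, 40]})
kept = arr.where(arr > 2, drop=True)

# filter_by_attrs(): select data variables by their attribute values.
ds = xr.Dataset(
    {
        "temp": ("x", [280.0, 281.0], {"standard_name": "air_temperature"}),
        "precip": ("x", [0.1, 0.0], {"standard_name": "precipitation_flux"}),
    }
)
only_temp = ds.filter_by_attrs(standard_name="air_temperature")
```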
Bug fixes¶
- Attributes were being retained by default for some resampling operations when they should not. With the keep_attrs=False option, they will no longer be retained by default. This may be backwards-incompatible with some scripts, but the attributes may be kept by adding the keep_attrs=True option. By Jeremy McGibbon.
- Concatenating xarray objects along an axis with a MultiIndex or PeriodIndex preserves the nature of the index (GH875). By Stephan Hoyer.
- Fixed bug in arithmetic operations on DataArray objects whose dimensions are numpy structured arrays or recarrays (GH861, GH837). By Maciek Swat.
- decode_cf_timedelta now accepts arrays with ndim > 1 (GH842). This fixes issue GH665. By Filipe Fernandes.
- Fix a bug where xarray.ufuncs that take two arguments would incorrectly use numpy functions instead of dask.array functions (GH876). By Stephan Hoyer.
- Support for pickling functions from xarray.ufuncs (GH901). By Stephan Hoyer.
- Variable.copy(deep=True) no longer converts MultiIndex into a base Index (GH769). By Benoit Bovy.
- Fixes for groupby on dimensions with a multi-index (GH867). By Stephan Hoyer.
- Fix printing datasets with unicode attributes on Python 2 (GH892). By Stephan Hoyer.
- Fixed incorrect test for dask version (GH891). By Stephan Hoyer.
- Fixed dim argument for isel_points/sel_points when a pandas.Index is passed. By Stephan Hoyer.
- contour() now plots the correct number of contours (GH866). By Fabien Maussion.
v0.7.2 (13 March 2016)¶
This release includes two new, entirely backwards compatible features and several bug fixes.
Enhancements¶
- New DataArray method DataArray.dot() for calculating the dot product of two DataArrays along shared dimensions. By Dean Pospisil.
- Rolling window operations on DataArray objects are now supported via a new DataArray.rolling() method. For example:

      In [2]: import xarray as xr; import numpy as np

      In [3]: arr = xr.DataArray(np.arange(0, 7.5, 0.5).reshape(3, 5), dims=('x', 'y'))

      In [4]: arr
      Out[4]:
      <xarray.DataArray (x: 3, y: 5)>
      array([[ 0. ,  0.5,  1. ,  1.5,  2. ],
             [ 2.5,  3. ,  3.5,  4. ,  4.5],
             [ 5. ,  5.5,  6. ,  6.5,  7. ]])
      Coordinates:
        * x        (x) int64 0 1 2
        * y        (y) int64 0 1 2 3 4

      In [5]: arr.rolling(y=3, min_periods=2).mean()
      Out[5]:
      <xarray.DataArray (x: 3, y: 5)>
      array([[  nan,  0.25,  0.5 ,  1.  ,  1.5 ],
             [  nan,  2.75,  3.  ,  3.5 ,  4.  ],
             [  nan,  5.25,  5.5 ,  6.  ,  6.5 ]])
      Coordinates:
        * x        (x) int64 0 1 2
        * y        (y) int64 0 1 2 3 4

  See Rolling window operations for more details. By Joe Hamman.
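The rolling example above has no counterpart for DataArray.dot(), so here is a minimal sketch of it. The arrays are invented; the product is summed over the one dimension the two arrays share.

```python
import numpy as np
import xarray as xr

a = xr.DataArray(np.arange(6).reshape(2, 3), dims=("x", "y"))  # [[0, 1, 2], [3, 4, 5]]
b = xr.DataArray([1, 2, 3], dims="y")

# "y" is shared, so it is summed out and only "x" remains.
result = a.dot(b)
```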
Bug fixes¶
- Fixed an issue where plots using pcolormesh and Cartopy axes were being distorted by the inference of the axis interval breaks. This change chooses not to modify the coordinate variables when the axes have the attribute projection, allowing Cartopy to handle the extent of pcolormesh plots (GH781). By Joe Hamman.
- 2D plots now better handle additional coordinates which are not DataArray dimensions (GH788). By Fabien Maussion.
v0.7.1 (16 February 2016)¶
This is a bug fix release that includes two small, backwards compatible enhancements. We recommend that all users upgrade.
Enhancements¶
Bug fixes¶
- Restore checks for shape consistency between data and coordinates in the DataArray constructor (GH758).
- Single dimension variables no longer transpose as part of a broader .transpose. This behavior was causing pandas.PeriodIndex dimensions to lose their type (GH749).
- Dataset labels remain as their native type on .to_dataset. Previously they were coerced to strings (GH745).
- Fixed a bug where replacing a DataArray index coordinate would improperly align the coordinate (GH725).
- DataArray.reindex_like now maintains the dtype of complex numbers when reindexing leads to NaN values (GH738).
- Dataset.rename and DataArray.rename support the old and new names being the same (GH724).
- Fix from_dataframe() for DataFrames with Categorical column and a MultiIndex index (GH737).
- Fixes to ensure xarray works properly after the upcoming pandas v0.18 and NumPy v1.11 releases.
Acknowledgments¶
The following individuals contributed to this release:
- Edward Richards
- Maximilian Roos
- Rafael Guedes
- Spencer Hill
- Stephan Hoyer
v0.7.0 (21 January 2016)¶
This major release includes a redesign of DataArray internals, as well as new methods for reshaping, rolling and shifting data. It includes preliminary support for pandas.MultiIndex, as well as a number of other features and bug fixes, several of which offer improved compatibility with pandas.
New name¶
The project formerly known as “xray” is now “xarray”, pronounced “x-array”! This avoids a namespace conflict with the entire field of x-ray science. Renaming our project seemed like the right thing to do, especially because some scientists who work with actual x-rays are interested in using this project in their work. Thanks for your understanding and patience in this transition. You can now find our documentation and code repository at new URLs:
To ease the transition, we have simultaneously released v0.7.0 of both xray and xarray on the Python Package Index. These packages are identical. For now, import xray still works, except it issues a deprecation warning. This will be the last xray release. Going forward, we recommend switching your import statements to import xarray as xr.
Breaking changes¶
- The internal data model used by DataArray has been rewritten to fix several outstanding issues (GH367, GH634, this stackoverflow report). Internally, DataArray is now implemented in terms of ._variable and ._coords attributes instead of holding variables in a Dataset object.

  This refactor ensures that if a DataArray has the same name as one of its coordinates, the array and the coordinate no longer share the same data.

  In practice, this means that creating a DataArray with the same name as one of its dimensions no longer automatically uses that array to label the corresponding coordinate. You will now need to provide coordinate labels explicitly. Here’s the old behavior:

      In [6]: xray.DataArray([4, 5, 6], dims='x', name='x')
      Out[6]:
      <xray.DataArray 'x' (x: 3)>
      array([4, 5, 6])
      Coordinates:
        * x        (x) int64 4 5 6

  and the new behavior (compare the values of the x coordinate):

      In [7]: xray.DataArray([4, 5, 6], dims='x', name='x')
      Out[7]:
      <xray.DataArray 'x' (x: 3)>
      array([4, 5, 6])
      Coordinates:
        * x        (x) int64 0 1 2

- It is no longer possible to convert a DataArray to a Dataset with xray.DataArray.to_dataset() if it is unnamed. This will now raise ValueError. If the array is unnamed, you need to supply the name argument.
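The to_dataset() change can be sketched as follows. The example uses the current xarray import name rather than xray, and the variable names are illustrative.

```python
import xarray as xr

unnamed = xr.DataArray([4, 5, 6], dims="x")

# Converting an unnamed array without a name is rejected ...
try:
    unnamed.to_dataset()
    raised = False
except ValueError:
    raised = True

# ... so the name of the resulting data variable has to be given explicitly.
ds = unnamed.to_dataset(name="foo")
```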
Enhancements¶
- Basic support for MultiIndex coordinates on xray objects, including indexing, stack() and unstack():

      In [8]: df = pd.DataFrame({'foo': range(3),
         ...:                    'x': ['a', 'b', 'b'],
         ...:                    'y': [0, 0, 1]})
         ...:

      In [9]: s = df.set_index(['x', 'y'])['foo']

      In [10]: arr = xray.DataArray(s, dims='z')

      In [11]: arr
      Out[11]:
      <xray.DataArray 'foo' (z: 3)>
      array([0, 1, 2])
      Coordinates:
        * z        (z) object ('a', 0) ('b', 0) ('b', 1)

      In [12]: arr.indexes['z']
      Out[12]:
      MultiIndex(levels=[[u'a', u'b'], [0, 1]],
                 labels=[[0, 1, 1], [0, 0, 1]],
                 names=[u'x', u'y'])

      In [13]: arr.unstack('z')
      Out[13]:
      <xray.DataArray 'foo' (x: 2, y: 2)>
      array([[  0.,  nan],
             [  1.,   2.]])
      Coordinates:
        * x        (x) object 'a' 'b'
        * y        (y) int64 0 1

      In [14]: arr.unstack('z').stack(z=('x', 'y'))
      Out[14]:
      <xray.DataArray 'foo' (z: 4)>
      array([  0.,  nan,   1.,   2.])
      Coordinates:
        * z        (z) object ('a', 0) ('a', 1) ('b', 0) ('b', 1)

  See Stack and unstack for more details.

  Warning

  xray’s MultiIndex support is still experimental, and we have a long to-do list of desired additions (GH719), including better display of multi-index levels when printing a Dataset, and support for saving datasets with a MultiIndex to a netCDF file. User contributions in this area would be greatly appreciated.

- Support for reading GRIB, HDF4 and other file formats via PyNIO. See Formats supported by PyNIO for more details.
- Better error message when a variable is supplied with the same name as one of its dimensions.
- Plotting: more control on colormap parameters (GH642). vmin and vmax will not be silently ignored anymore. Setting center=False prevents automatic selection of a divergent colormap.
- New shift() and roll() methods for shifting/rotating datasets or arrays along a dimension:

      In [15]: array = xray.DataArray([5, 6, 7, 8], dims='x')

      In [16]: array.shift(x=2)
      Out[16]:
      <xarray.DataArray (x: 4)>
      array([ nan,  nan,   5.,   6.])
      Unindexed dimensions:
          x

      In [17]: array.roll(x=2)