Weather and climate data#

Xarray can leverage metadata that follows the Climate and Forecast (CF) conventions if present. Examples include automatic labelling of plots with descriptive names and units if proper metadata is present and support for non-standard calendars used in climate science through the cftime module(Explained in the Non-standard calendars and dates outside the nanosecond-precision range section). There are also a number of geosciences-focused projects that build on xarray.

CF-compliant coordinate variables#

MetPy adds a metpy accessor that allows accessing coordinates with appropriate CF metadata using generic names x, y, vertical and time. There is also a cartopy_crs attribute that provides projection information, parsed from the appropriate CF metadata, as a Cartopy projection object. See the metpy documentation for more information.

Non-standard calendars and dates outside the nanosecond-precision range#

Through the standalone cftime library and a custom subclass of pandas.Index, xarray supports a subset of the indexing functionality enabled through the standard pandas.DatetimeIndex for dates from non-standard calendars commonly used in climate science or dates using a standard calendar, but outside the nanosecond-precision range (approximately between years 1678 and 2262).

Note

As of xarray version 0.11, by default, cftime.datetime objects will be used to represent times (either in indexes, as a CFTimeIndex, or in data arrays with dtype object) if any of the following are true:

  • The dates are from a non-standard calendar

  • Any dates are outside the nanosecond-precision range.

Otherwise pandas-compatible dates from a standard calendar will be represented with the np.datetime64[ns] data type, enabling the use of a pandas.DatetimeIndex or arrays with dtype np.datetime64[ns] and their full set of associated features.

As of pandas version 2.0.0, pandas supports non-nanosecond precision datetime values. For the time being, xarray still automatically casts datetime values to nanosecond-precision for backwards compatibility with older pandas versions; however, this is something we would like to relax going forward. See GH7493 for more discussion.

For example, you can create a DataArray indexed by a time coordinate with dates from a no-leap calendar and a CFTimeIndex will automatically be used:

In [1]: from itertools import product

In [2]: from cftime import DatetimeNoLeap

In [3]: dates = [
   ...:     DatetimeNoLeap(year, month, 1)
   ...:     for year, month in product(range(1, 3), range(1, 13))
   ...: ]
   ...: 

In [4]: da = xr.DataArray(np.arange(24), coords=[dates], dims=["time"], name="foo")

Xarray also includes a cftime_range() function, which enables creating a CFTimeIndex with regularly-spaced dates. For instance, we can create the same dates and DataArray we created above using:

In [5]: dates = xr.cftime_range(start="0001", periods=24, freq="MS", calendar="noleap")

In [6]: da = xr.DataArray(np.arange(24), coords=[dates], dims=["time"], name="foo")

Mirroring pandas’ method with the same name, infer_freq() allows one to infer the sampling frequency of a CFTimeIndex or a 1-D DataArray containing cftime objects. It also works transparently with np.datetime64[ns] and np.timedelta64[ns] data.

In [7]: xr.infer_freq(dates)
Out[7]: 'MS'

With strftime() we can also easily generate formatted strings from the datetime values of a CFTimeIndex directly or through the dt accessor for a DataArray using the same formatting as the standard datetime.strftime convention .

In [8]: dates.strftime("%c")
Out[8]: 
Index(['Tue Jan  1 00:00:00 0001', 'Fri Feb  1 00:00:00 0001',
       'Fri Mar  1 00:00:00 0001', 'Mon Apr  1 00:00:00 0001',
       'Wed May  1 00:00:00 0001', 'Sat Jun  1 00:00:00 0001',
       'Mon Jul  1 00:00:00 0001', 'Thu Aug  1 00:00:00 0001',
       'Sun Sep  1 00:00:00 0001', 'Tue Oct  1 00:00:00 0001',
       'Fri Nov  1 00:00:00 0001', 'Sun Dec  1 00:00:00 0001',
       'Wed Jan  1 00:00:00 0002', 'Sat Feb  1 00:00:00 0002',
       'Sat Mar  1 00:00:00 0002', 'Tue Apr  1 00:00:00 0002',
       'Thu May  1 00:00:00 0002', 'Sun Jun  1 00:00:00 0002',
       'Tue Jul  1 00:00:00 0002', 'Fri Aug  1 00:00:00 0002',
       'Mon Sep  1 00:00:00 0002', 'Wed Oct  1 00:00:00 0002',
       'Sat Nov  1 00:00:00 0002', 'Mon Dec  1 00:00:00 0002'],
      dtype='object')

In [9]: da["time"].dt.strftime("%Y%m%d")
Out[9]: 
<xarray.DataArray 'strftime' (time: 24)> Size: 192B
array(['00010101', '00010201', '00010301', ..., '00021001', '00021101', '00021201'], dtype=object)
Coordinates:
  * time     (time) object 192B 0001-01-01 00:00:00 ... 0002-12-01 00:00:00

Conversion between non-standard calendar and to/from pandas DatetimeIndexes is facilitated with the xarray.Dataset.convert_calendar() method (also available as xarray.DataArray.convert_calendar()). Here, like elsewhere in xarray, the use_cftime argument controls which datetime backend is used in the output. The default (None) is to use pandas when possible, i.e. when the calendar is standard and dates are within 1678 and 2262.

In [10]: dates = xr.cftime_range(start="2001", periods=24, freq="MS", calendar="noleap")

In [11]: da_nl = xr.DataArray(np.arange(24), coords=[dates], dims=["time"], name="foo")

In [12]: da_std = da.convert_calendar("standard", use_cftime=True)

The data is unchanged, only the timestamps are modified. Further options are implemented for the special "360_day" calendar and for handling missing dates. There is also xarray.Dataset.interp_calendar() (and xarray.DataArray.interp_calendar()) for interpolating data between calendars.

For data indexed by a CFTimeIndex xarray currently supports:

In [13]: da.sel(time="0001")
Out[13]: 
<xarray.DataArray 'foo' (time: 12)> Size: 96B
array([ 0,  1,  2, ...,  9, 10, 11])
Coordinates:
  * time     (time) object 96B 0001-01-01 00:00:00 ... 0001-12-01 00:00:00

In [14]: da.sel(time=slice("0001-05", "0002-02"))
Out[14]: 
<xarray.DataArray 'foo' (time: 10)> Size: 80B
array([ 4,  5,  6,  7,  8,  9, 10, 11, 12, 13])
Coordinates:
  * time     (time) object 80B 0001-05-01 00:00:00 ... 0002-02-01 00:00:00

Note

For specifying full or partial datetime strings in cftime indexing, xarray supports two versions of the ISO 8601 standard, the basic pattern (YYYYMMDDhhmmss) or the extended pattern (YYYY-MM-DDThh:mm:ss), as well as the default cftime string format (YYYY-MM-DD hh:mm:ss). This is somewhat more restrictive than pandas; in other words, some datetime strings that would be valid for a pandas.DatetimeIndex are not valid for an CFTimeIndex.

  • Access of basic datetime components via the dt accessor (in this case just “year”, “month”, “day”, “hour”, “minute”, “second”, “microsecond”, “season”, “dayofyear”, “dayofweek”, and “days_in_month”) with the addition of “calendar”, absent from pandas:

In [15]: da.time.dt.year
Out[15]: 
<xarray.DataArray 'year' (time: 24)> Size: 192B
array([1, 1, 1, ..., 2, 2, 2])
Coordinates:
  * time     (time) object 192B 0001-01-01 00:00:00 ... 0002-12-01 00:00:00

In [16]: da.time.dt.month
Out[16]: 
<xarray.DataArray 'month' (time: 24)> Size: 192B
array([ 1,  2,  3, ..., 10, 11, 12])
Coordinates:
  * time     (time) object 192B 0001-01-01 00:00:00 ... 0002-12-01 00:00:00

In [17]: da.time.dt.season
Out[17]: 
<xarray.DataArray 'season' (time: 24)> Size: 288B
array(['DJF', 'DJF', 'MAM', ..., 'SON', 'SON', 'DJF'], dtype='<U3')
Coordinates:
  * time     (time) object 192B 0001-01-01 00:00:00 ... 0002-12-01 00:00:00

In [18]: da.time.dt.dayofyear
Out[18]: 
<xarray.DataArray 'dayofyear' (time: 24)> Size: 192B
array([  1,  32,  60, ..., 274, 305, 335])
Coordinates:
  * time     (time) object 192B 0001-01-01 00:00:00 ... 0002-12-01 00:00:00

In [19]: da.time.dt.dayofweek
Out[19]: 
<xarray.DataArray 'dayofweek' (time: 24)> Size: 192B
array([1, 4, 4, ..., 2, 5, 0])
Coordinates:
  * time     (time) object 192B 0001-01-01 00:00:00 ... 0002-12-01 00:00:00

In [20]: da.time.dt.days_in_month
Out[20]: 
<xarray.DataArray 'days_in_month' (time: 24)> Size: 192B
array([31, 28, 31, ..., 31, 30, 31])
Coordinates:
  * time     (time) object 192B 0001-01-01 00:00:00 ... 0002-12-01 00:00:00

In [21]: da.time.dt.calendar
Out[21]: 'noleap'
  • Rounding of datetimes to fixed frequencies via the dt accessor:

In [22]: da.time.dt.ceil("3D")
Out[22]: 
<xarray.DataArray 'ceil' (time: 24)> Size: 192B
array([cftime.DatetimeNoLeap(1, 1, 3, 0, 0, 0, 0, has_year_zero=True),
       cftime.DatetimeNoLeap(1, 2, 2, 0, 0, 0, 0, has_year_zero=True),
       cftime.DatetimeNoLeap(1, 3, 1, 0, 0, 0, 0, has_year_zero=True), ...,
       cftime.DatetimeNoLeap(2, 10, 1, 0, 0, 0, 0, has_year_zero=True),
       cftime.DatetimeNoLeap(2, 11, 3, 0, 0, 0, 0, has_year_zero=True),
       cftime.DatetimeNoLeap(2, 12, 3, 0, 0, 0, 0, has_year_zero=True)], dtype=object)
Coordinates:
  * time     (time) object 192B 0001-01-01 00:00:00 ... 0002-12-01 00:00:00

In [23]: da.time.dt.floor("5D")
Out[23]: 
<xarray.DataArray 'floor' (time: 24)> Size: 192B
array([cftime.DatetimeNoLeap(1, 1, 1, 0, 0, 0, 0, has_year_zero=True),
       cftime.DatetimeNoLeap(1, 1, 31, 0, 0, 0, 0, has_year_zero=True),
       cftime.DatetimeNoLeap(1, 2, 25, 0, 0, 0, 0, has_year_zero=True), ...,
       cftime.DatetimeNoLeap(2, 9, 28, 0, 0, 0, 0, has_year_zero=True),
       cftime.DatetimeNoLeap(2, 10, 28, 0, 0, 0, 0, has_year_zero=True),
       cftime.DatetimeNoLeap(2, 11, 27, 0, 0, 0, 0, has_year_zero=True)], dtype=object)
Coordinates:
  * time     (time) object 192B 0001-01-01 00:00:00 ... 0002-12-01 00:00:00

In [24]: da.time.dt.round("2D")
Out[24]: 
<xarray.DataArray 'round' (time: 24)> Size: 192B
array([cftime.DatetimeNoLeap(1, 1, 2, 0, 0, 0, 0, has_year_zero=True),
       cftime.DatetimeNoLeap(1, 2, 1, 0, 0, 0, 0, has_year_zero=True),
       cftime.DatetimeNoLeap(1, 3, 1, 0, 0, 0, 0, has_year_zero=True), ...,
       cftime.DatetimeNoLeap(2, 9, 30, 0, 0, 0, 0, has_year_zero=True),
       cftime.DatetimeNoLeap(2, 11, 1, 0, 0, 0, 0, has_year_zero=True),
       cftime.DatetimeNoLeap(2, 12, 1, 0, 0, 0, 0, has_year_zero=True)], dtype=object)
Coordinates:
  * time     (time) object 192B 0001-01-01 00:00:00 ... 0002-12-01 00:00:00
  • Group-by operations based on datetime accessor attributes (e.g. by month of the year):

In [25]: da.groupby("time.month").sum()
Out[25]: 
<xarray.DataArray 'foo' (month: 12)> Size: 96B
array([12, 14, 16, ..., 30, 32, 34])
Coordinates:
  * month    (month) int64 96B 1 2 3 4 5 6 7 8 9 10 11 12
In [26]: da.interp(time=[DatetimeNoLeap(1, 1, 15), DatetimeNoLeap(1, 2, 15)])
Out[26]: 
<xarray.DataArray 'foo' (time: 2)> Size: 16B
array([0.452, 1.5  ])
Coordinates:
  * time     (time) object 16B 0001-01-15 00:00:00 0001-02-15 00:00:00
  • Interpolation using datetime strings:

In [27]: da.interp(time=["0001-01-15", "0001-02-15"])
Out[27]: 
<xarray.DataArray 'foo' (time: 2)> Size: 16B
array([0.452, 1.5  ])
Coordinates:
  * time     (time) object 16B 0001-01-15 00:00:00 0001-02-15 00:00:00
  • Differentiation:

In [28]: da.differentiate("time")
Out[28]: 
<xarray.DataArray 'foo' (time: 24)> Size: 192B
array([3.734e-07, 3.944e-07, 3.944e-07, ..., 3.797e-07, 3.797e-07, 3.858e-07])
Coordinates:
  * time     (time) object 192B 0001-01-01 00:00:00 ... 0002-12-01 00:00:00
  • Serialization:

In [29]: da.to_netcdf("example-no-leap.nc")

In [30]: reopened = xr.open_dataset("example-no-leap.nc")

In [31]: reopened
Out[31]: 
<xarray.Dataset> Size: 384B
Dimensions:  (time: 24)
Coordinates:
  * time     (time) object 192B 0001-01-01 00:00:00 ... 0002-12-01 00:00:00
Data variables:
    foo      (time) int64 192B ...
  • And resampling along the time dimension for data indexed by a CFTimeIndex:

In [32]: da.resample(time="81min", closed="right", label="right", offset="3min").mean()
Out[32]: 
<xarray.DataArray 'foo' (time: 12428)> Size: 99kB
array([ 0., nan, nan, ..., nan, nan, 23.])
Coordinates:
  * time     (time) object 99kB 0001-01-01 00:03:00 ... 0002-12-01 00:30:00