Time series data#
A major use case for xarray is multi-dimensional time-series data. Accordingly, we’ve copied many of features that make working with time-series data in pandas such a joy to xarray. In most cases, we rely on pandas for the core functionality.
Creating datetime64 data#
Xarray uses the numpy dtypes datetime64[ns]
and timedelta64[ns]
to
represent datetime data, which offer vectorized (if sometimes buggy) operations
with numpy and smooth integration with pandas.
To convert to or create regular arrays of datetime64
data, we recommend
using pandas.to_datetime()
and pandas.date_range()
:
In [1]: pd.to_datetime(["2000-01-01", "2000-02-02"])
Out[1]: DatetimeIndex(['2000-01-01', '2000-02-02'], dtype='datetime64[ns]', freq=None)
In [2]: pd.date_range("2000-01-01", periods=365)
Out[2]:
DatetimeIndex(['2000-01-01', '2000-01-02', '2000-01-03', '2000-01-04',
'2000-01-05', '2000-01-06', '2000-01-07', '2000-01-08',
'2000-01-09', '2000-01-10',
...
'2000-12-21', '2000-12-22', '2000-12-23', '2000-12-24',
'2000-12-25', '2000-12-26', '2000-12-27', '2000-12-28',
'2000-12-29', '2000-12-30'],
dtype='datetime64[ns]', length=365, freq='D')
Alternatively, you can supply arrays of Python datetime
objects. These get
converted automatically when used as arguments in xarray objects:
In [3]: import datetime
In [4]: xr.Dataset({"time": datetime.datetime(2000, 1, 1)})
Out[4]:
<xarray.Dataset> Size: 8B
Dimensions: ()
Data variables:
time datetime64[ns] 8B 2000-01-01
When reading or writing netCDF files, xarray automatically decodes datetime and
timedelta arrays using CF conventions (that is, by using a units
attribute like 'days since 2000-01-01'
).
Note
When decoding/encoding datetimes for non-standard calendars or for dates
before year 1678 or after year 2262, xarray uses the cftime library.
It was previously packaged with the netcdf4-python
package under the
name netcdftime
but is now distributed separately. cftime
is an
optional dependency of xarray.
You can manual decode arrays in this form by passing a dataset to
decode_cf()
:
In [5]: attrs = {"units": "hours since 2000-01-01"}
In [6]: ds = xr.Dataset({"time": ("time", [0, 1, 2, 3], attrs)})
In [7]: xr.decode_cf(ds)
Out[7]:
<xarray.Dataset> Size: 32B
Dimensions: (time: 4)
Coordinates:
* time (time) datetime64[ns] 32B 2000-01-01 ... 2000-01-01T03:00:00
Data variables:
*empty*
One unfortunate limitation of using datetime64[ns]
is that it limits the
native representation of dates to those that fall between the years 1678 and
2262. When a netCDF file contains dates outside of these bounds, dates will be
returned as arrays of cftime.datetime
objects and a CFTimeIndex
will be used for indexing. CFTimeIndex
enables a subset of
the indexing functionality of a pandas.DatetimeIndex
and is only
fully compatible with the standalone version of cftime
(not the version
packaged with earlier versions netCDF4
). See Non-standard calendars and dates outside the nanosecond-precision range for more
information.
Datetime indexing#
Xarray borrows powerful indexing machinery from pandas (see Indexing and selecting data).
This allows for several useful and succinct forms of indexing, particularly for
datetime64
data. For example, we support indexing with strings for single
items and with the slice
object:
In [8]: time = pd.date_range("2000-01-01", freq="h", periods=365 * 24)
In [9]: ds = xr.Dataset({"foo": ("time", np.arange(365 * 24)), "time": time})
In [10]: ds.sel(time="2000-01")
Out[10]:
<xarray.Dataset> Size: 12kB
Dimensions: (time: 744)
Coordinates:
* time (time) datetime64[ns] 6kB 2000-01-01 ... 2000-01-31T23:00:00
Data variables:
foo (time) int64 6kB 0 1 2 3 4 5 6 7 ... 737 738 739 740 741 742 743
In [11]: ds.sel(time=slice("2000-06-01", "2000-06-10"))
Out[11]:
<xarray.Dataset> Size: 4kB
Dimensions: (time: 240)
Coordinates:
* time (time) datetime64[ns] 2kB 2000-06-01 ... 2000-06-10T23:00:00
Data variables:
foo (time) int64 2kB 3648 3649 3650 3651 3652 ... 3884 3885 3886 3887
You can also select a particular time by indexing with a
datetime.time
object:
In [12]: ds.sel(time=datetime.time(12))
Out[12]:
<xarray.Dataset> Size: 6kB
Dimensions: (time: 365)
Coordinates:
* time (time) datetime64[ns] 3kB 2000-01-01T12:00:00 ... 2000-12-30T12:...
Data variables:
foo (time) int64 3kB 12 36 60 84 108 132 ... 8652 8676 8700 8724 8748
For more details, read the pandas documentation and the section on Indexing Using Datetime Components (i.e. using the .dt
accessor).
Datetime components#
Similar to pandas accessors, the components of datetime objects contained in a
given DataArray
can be quickly computed using a special .dt
accessor.
In [13]: time = pd.date_range("2000-01-01", freq="6h", periods=365 * 4)
In [14]: ds = xr.Dataset({"foo": ("time", np.arange(365 * 4)), "time": time})
In [15]: ds.time.dt.hour
Out[15]:
<xarray.DataArray 'hour' (time: 1460)> Size: 12kB
array([ 0, 6, 12, ..., 6, 12, 18])
Coordinates:
* time (time) datetime64[ns] 12kB 2000-01-01 ... 2000-12-30T18:00:00
In [16]: ds.time.dt.dayofweek
Out[16]:
<xarray.DataArray 'dayofweek' (time: 1460)> Size: 12kB
array([5, 5, 5, ..., 5, 5, 5])
Coordinates:
* time (time) datetime64[ns] 12kB 2000-01-01 ... 2000-12-30T18:00:00
The .dt
accessor works on both coordinate dimensions as well as
multi-dimensional data.
Xarray also supports a notion of “virtual” or “derived” coordinates for datetime components implemented by pandas, including “year”, “month”, “day”, “hour”, “minute”, “second”, “dayofyear”, “week”, “dayofweek”, “weekday” and “quarter”:
In [17]: ds["time.month"]
Out[17]:
<xarray.DataArray 'month' (time: 1460)> Size: 12kB
array([ 1, 1, 1, ..., 12, 12, 12])
Coordinates:
* time (time) datetime64[ns] 12kB 2000-01-01 ... 2000-12-30T18:00:00
In [18]: ds["time.dayofyear"]
Out[18]:
<xarray.DataArray 'dayofyear' (time: 1460)> Size: 12kB
array([ 1, 1, 1, ..., 365, 365, 365])
Coordinates:
* time (time) datetime64[ns] 12kB 2000-01-01 ... 2000-12-30T18:00:00
For use as a derived coordinate, xarray adds 'season'
to the list of
datetime components supported by pandas:
In [19]: ds["time.season"]
Out[19]:
<xarray.DataArray 'season' (time: 1460)> Size: 18kB
array(['DJF', 'DJF', 'DJF', ..., 'DJF', 'DJF', 'DJF'], dtype='<U3')
Coordinates:
* time (time) datetime64[ns] 12kB 2000-01-01 ... 2000-12-30T18:00:00
In [20]: ds["time"].dt.season
Out[20]:
<xarray.DataArray 'season' (time: 1460)> Size: 18kB
array(['DJF', 'DJF', 'DJF', ..., 'DJF', 'DJF', 'DJF'], dtype='<U3')
Coordinates:
* time (time) datetime64[ns] 12kB 2000-01-01 ... 2000-12-30T18:00:00
The set of valid seasons consists of ‘DJF’, ‘MAM’, ‘JJA’ and ‘SON’, labeled by the first letters of the corresponding months.
You can use these shortcuts with both Datasets and DataArray coordinates.
In addition, xarray supports rounding operations floor
, ceil
, and round
. These operations require that you supply a rounding frequency as a string argument.
In [21]: ds["time"].dt.floor("D")
Out[21]:
<xarray.DataArray 'floor' (time: 1460)> Size: 12kB
array(['2000-01-01T00:00:00.000000000', '2000-01-01T00:00:00.000000000',
'2000-01-01T00:00:00.000000000', ..., '2000-12-30T00:00:00.000000000',
'2000-12-30T00:00:00.000000000', '2000-12-30T00:00:00.000000000'], dtype='datetime64[ns]')
Coordinates:
* time (time) datetime64[ns] 12kB 2000-01-01 ... 2000-12-30T18:00:00
The .dt
accessor can also be used to generate formatted datetime strings
for arrays utilising the same formatting as the standard datetime.strftime.
In [22]: ds["time"].dt.strftime("%a, %b %d %H:%M")
Out[22]:
<xarray.DataArray 'strftime' (time: 1460)> Size: 12kB
array(['Sat, Jan 01 00:00', 'Sat, Jan 01 06:00', 'Sat, Jan 01 12:00', ..., 'Sat, Dec 30 06:00',
'Sat, Dec 30 12:00', 'Sat, Dec 30 18:00'], dtype=object)
Coordinates:
* time (time) datetime64[ns] 12kB 2000-01-01 ... 2000-12-30T18:00:00
Indexing Using Datetime Components#
You can use use the .dt
accessor when subsetting your data as well. For example, we can subset for the month of January using the following:
In [23]: ds.isel(time=(ds.time.dt.month == 1))
Out[23]:
<xarray.Dataset> Size: 2kB
Dimensions: (time: 124)
Coordinates:
* time (time) datetime64[ns] 992B 2000-01-01 ... 2000-01-31T18:00:00
Data variables:
foo (time) int64 992B 0 1 2 3 4 5 6 7 ... 117 118 119 120 121 122 123
You can also search for multiple months (in this case January through March), using isin
:
In [24]: ds.isel(time=ds.time.dt.month.isin([1, 2, 3]))
Out[24]:
<xarray.Dataset> Size: 6kB
Dimensions: (time: 364)
Coordinates:
* time (time) datetime64[ns] 3kB 2000-01-01 ... 2000-03-31T18:00:00
Data variables:
foo (time) int64 3kB 0 1 2 3 4 5 6 7 ... 357 358 359 360 361 362 363
Resampling and grouped operations#
Datetime components couple particularly well with grouped operations (see GroupBy: Group and Bin Data) for analyzing features that repeat over time. Here’s how to calculate the mean by time of day:
In [25]: ds.groupby("time.hour").mean()
Out[25]:
<xarray.Dataset> Size: 64B
Dimensions: (hour: 4)
Coordinates:
* hour (hour) int64 32B 0 6 12 18
Data variables:
foo (hour) float64 32B 728.0 729.0 730.0 731.0
For upsampling or downsampling temporal resolutions, xarray offers a
resample()
method building on the core functionality
offered by the pandas method of the same name. Resample uses essentially the
same api as resample
in pandas.
For example, we can downsample our dataset from hourly to 6-hourly:
In [26]: ds.resample(time="6h")
Out[26]:
<DatasetResample, grouped over 1 grouper(s), 1460 groups in total:
'__resample_dim__': 1460/1460 groups present with labels 2000-01-01, ..., 2000-12-30T1...>
This will create a specialized Resample
object which saves information
necessary for resampling. All of the reduction methods which work with
Resample
objects can also be used for resampling:
In [27]: ds.resample(time="6h").mean()
Out[27]:
<xarray.Dataset> Size: 23kB
Dimensions: (time: 1460)
Coordinates:
* time (time) datetime64[ns] 12kB 2000-01-01 ... 2000-12-30T18:00:00
Data variables:
foo (time) float64 12kB 0.0 1.0 2.0 ... 1.457e+03 1.458e+03 1.459e+03
You can also supply an arbitrary reduction function to aggregate over each resampling group:
In [28]: ds.resample(time="6h").reduce(np.mean)
Out[28]:
<xarray.Dataset> Size: 23kB
Dimensions: (time: 1460)
Coordinates:
* time (time) datetime64[ns] 12kB 2000-01-01 ... 2000-12-30T18:00:00
Data variables:
foo (time) float64 12kB 0.0 1.0 2.0 ... 1.457e+03 1.458e+03 1.459e+03
You can also resample on the time dimension while applying reducing along other dimensions at the same time
by specifying the dim
keyword argument
ds.resample(time="6h").mean(dim=["time", "latitude", "longitude"])
For upsampling, xarray provides six methods: asfreq
, ffill
, bfill
, pad
,
nearest
and interpolate
. interpolate
extends scipy.interpolate.interp1d
and supports all of its schemes. All of these resampling operations work on both
Dataset and DataArray objects with an arbitrary number of dimensions.
In order to limit the scope of the methods ffill
, bfill
, pad
and
nearest
the tolerance
argument can be set in coordinate units.
Data that has indices outside of the given tolerance
are set to NaN
.
In [29]: ds.resample(time="1h").nearest(tolerance="1h")
Out[29]:
<xarray.Dataset> Size: 140kB
Dimensions: (time: 8755)
Coordinates:
* time (time) datetime64[ns] 70kB 2000-01-01 ... 2000-12-30T18:00:00
Data variables:
foo (time) float64 70kB 0.0 0.0 nan nan ... nan nan 1.459e+03 1.459e+03
It is often desirable to center the time values after a resampling operation. That can be accomplished by updating the resampled dataset time coordinate values using time offset arithmetic via the pandas.tseries.frequencies.to_offset function.
In [30]: resampled_ds = ds.resample(time="6h").mean()
In [31]: offset = pd.tseries.frequencies.to_offset("6h") / 2
In [32]: resampled_ds["time"] = resampled_ds.get_index("time") + offset
In [33]: resampled_ds
Out[33]:
<xarray.Dataset> Size: 23kB
Dimensions: (time: 1460)
Coordinates:
* time (time) datetime64[ns] 12kB 2000-01-01T03:00:00 ... 2000-12-30T21...
Data variables:
foo (time) float64 12kB 0.0 1.0 2.0 ... 1.457e+03 1.458e+03 1.459e+03
For more examples of using grouped operations on a time dimension, see Toy weather data.