Time Coding
This page gives an overview of how xarray encodes and decodes times, and which conventions and functions are used.
Pandas functionality

to_datetime
The function pandas.to_datetime() is used within xarray for inferring units and for testing purposes. In normal operation pandas.to_datetime() returns a pandas.Timestamp (for scalar input) or a pandas.DatetimeIndex (for array-like input), which are related to np.datetime64 values with a resolution inherited from the input (one of 's', 'ms', 'us', 'ns'). If no resolution can be inherited, 'ns' is assumed. This implies that the maximum usable time range for those cases is approximately +/- 292 years centered around the Unix epoch (1970-01-01). To accommodate that, the units/resolution are carefully checked in the encoding and decoding steps.
When the arguments are numeric (not strings or np.datetime64 values), "unit" can be anything from 'Y', 'W', 'D', 'h', 'm', 's', 'ms', 'us' or 'ns', though the returned resolution will be 'ns'.
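The console examples below use integer sentinel values whose setup code is not shown on this page; they are assumed to be defined along these lines:

```python
import numpy as np

# Boundary values used throughout the examples on this page.
int64_min = np.iinfo("int64").min    # -9223372036854775808
int64_max = np.iinfo("int64").max    # 9223372036854775807
uint64_max = np.iinfo("uint64").max  # 18446744073709551615
```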
In [1]: f"Minimum datetime: {pd.to_datetime(int64_min, unit='ns')}"
Out[1]: 'Minimum datetime: 1677-09-21 00:12:43.145224193'
In [2]: f"Maximum datetime: {pd.to_datetime(int64_max, unit='ns')}"
Out[2]: 'Maximum datetime: 2262-04-11 23:47:16.854775807'
For input values which can’t be represented in nanosecond resolution, a pandas.OutOfBoundsDatetime exception is raised:
In [3]: try:
...: dtime = pd.to_datetime(int64_max, unit="us")
...: except Exception as err:
...: print(err)
...:
Out of bounds nanosecond timestamp: 294247-01-10 04:00:54
In [4]: try:
...: dtime = pd.to_datetime(uint64_max, unit="ns")
...: print("Wrong:", dtime)
...: dtime = pd.to_datetime([uint64_max], unit="ns")
...: except Exception as err:
...: print(err)
...:
Wrong: 1969-12-31 23:59:59.999999999
cannot convert input 18446744073709551615 with the unit 'ns', at position 0
np.datetime64 values can be extracted with pandas.Timestamp.to_numpy() and pandas.DatetimeIndex.to_numpy(). The returned resolution depends on the internal representation, which can be changed using pandas.Timestamp.as_unit() and pandas.DatetimeIndex.as_unit() respectively. as_unit takes one of 's', 'ms', 'us', 'ns' as an argument. That means we are able to represent datetimes with second, millisecond, microsecond or nanosecond resolution.
In [5]: time = pd.to_datetime(np.datetime64(0, "D"))
In [6]: print("Datetime:", time, np.asarray([time.to_numpy()]).dtype)
Datetime: 1970-01-01 00:00:00 datetime64[s]
In [7]: print("Datetime as_unit('ms'):", time.as_unit("ms"))
Datetime as_unit('ms'): 1970-01-01 00:00:00
In [8]: print("Datetime to_numpy():", time.as_unit("ms").to_numpy())
Datetime to_numpy(): 1970-01-01T00:00:00.000
In [9]: time = pd.to_datetime(np.array([-1000, 1, 2], dtype="datetime64[Y]"))
In [10]: print("DatetimeIndex:", time)
DatetimeIndex: DatetimeIndex(['970-01-01', '1971-01-01', '1972-01-01'], dtype='datetime64[s]', freq=None)
In [11]: print("DatetimeIndex as_unit('us'):", time.as_unit("us"))
DatetimeIndex as_unit('us'): DatetimeIndex(['970-01-01', '1971-01-01', '1972-01-01'], dtype='datetime64[us]', freq=None)
In [12]: print("DatetimeIndex to_numpy():", time.as_unit("us").to_numpy())
DatetimeIndex to_numpy(): ['0970-01-01T00:00:00.000000' '1971-01-01T00:00:00.000000'
'1972-01-01T00:00:00.000000']
Warning

Input data with resolution higher than 'ns' (e.g. 'ps', 'fs', 'as') is truncated (not rounded) at the 'ns' level. This is currently broken for 'ps' input, where it is interpreted as 'ns'.
In [13]: print("Good:", pd.to_datetime([np.datetime64(1901901901901, "as")]))
Good: DatetimeIndex(['1970-01-01 00:00:00.000001901'], dtype='datetime64[ns]', freq=None)
In [14]: print("Good:", pd.to_datetime([np.datetime64(1901901901901, "fs")]))
Good: DatetimeIndex(['1970-01-01 00:00:00.001901901'], dtype='datetime64[ns]', freq=None)
In [15]: print(" Bad:", pd.to_datetime([np.datetime64(1901901901901, "ps")]))
Bad: DatetimeIndex(['1970-01-01 00:31:41.901901901'], dtype='datetime64[ns]', freq=None)
In [16]: print("Good:", pd.to_datetime([np.datetime64(1901901901901, "ns")]))
Good: DatetimeIndex(['1970-01-01 00:31:41.901901901'], dtype='datetime64[ns]', freq=None)
In [17]: print("Good:", pd.to_datetime([np.datetime64(1901901901901, "us")]))
Good: DatetimeIndex(['1970-01-23 00:18:21.901901'], dtype='datetime64[ns]', freq=None)
In [18]: print("Good:", pd.to_datetime([np.datetime64(1901901901901, "ms")]))
Good: DatetimeIndex(['2030-04-08 18:05:01.901000'], dtype='datetime64[ns]', freq=None)
Warning

Care has to be taken, as some configurations of input data will raise. The following shows that it is safe to use pandas.to_datetime() when providing numpy.datetime64 as scalar or numpy array input.
In [19]: print(
....: "Works:",
....: np.datetime64(1901901901901, "s"),
....: pd.to_datetime(np.datetime64(1901901901901, "s")),
....: )
....:
Works: 62238-11-15T11:51:41 62238-11-15 11:51:41
In [20]: print(
....: "Works:",
....: np.array([np.datetime64(1901901901901, "s")]),
....: pd.to_datetime(np.array([np.datetime64(1901901901901, "s")])),
....: )
....:
Works: ['62238-11-15T11:51:41'] DatetimeIndex(['62238-11-15 11:51:41'], dtype='datetime64[s]', freq=None)
In [21]: try:
....: pd.to_datetime([np.datetime64(1901901901901, "s")])
....: except Exception as err:
....: print("Raises:", err)
....:
Raises: Out of bounds nanosecond timestamp: 62238-11-15T11:51:41, at position 0
In [22]: try:
....: pd.to_datetime(1901901901901, unit="s")
....: except Exception as err:
....: print("Raises:", err)
....:
Raises: Out of bounds nanosecond timestamp: 62238-11-15 11:51:41
In [23]: try:
....: pd.to_datetime([1901901901901], unit="s")
....: except Exception as err:
....: print("Raises:", err)
....:
Raises: cannot convert input 1901901901901 with the unit 's', at position 0
In [24]: try:
....: pd.to_datetime(np.array([1901901901901]), unit="s")
....: except Exception as err:
....: print("Raises:", err)
....:
Raises: Out of bounds nanosecond timestamp: 62238-11-15 11:51:41
to_timedelta

The function pandas.to_timedelta() is used within xarray for inferring units and for testing purposes. In normal operation pandas.to_timedelta() returns a pandas.Timedelta (for scalar input) or a pandas.TimedeltaIndex (for array-like input), which are np.timedelta64 values with 'ns' resolution internally. This implies that the usable timedelta range covers only roughly 585 years. To accommodate that, we work around this limitation in the encoding and decoding steps.
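The roughly 585 year figure follows directly from the int64 nanosecond range; a quick back-of-the-envelope check:

```python
import numpy as np

# Full span of int64 nanoseconds, converted to (Julian) years.
span_ns = int(np.iinfo("int64").max) - int(np.iinfo("int64").min)
ns_per_year = 365.25 * 86_400 * 10**9
print(span_ns / ns_per_year)  # ~584.5 years
```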
In [25]: f"Maximum timedelta range: ({pd.to_timedelta(int64_min, unit='ns')}, {pd.to_timedelta(int64_max, unit='ns')})"
Out[25]: 'Maximum timedelta range: (-106752 days +00:12:43.145224193, 106751 days 23:47:16.854775807)'
For input values which can’t be represented in nanosecond resolution, a pandas.OutOfBoundsTimedelta exception is raised:
In [26]: try:
....: delta = pd.to_timedelta(int64_max, unit="us")
....: except Exception as err:
....: print("First:", err)
....:
First: Cannot cast 9223372036854775807 from us to 'ns' without overflow.
In [27]: try:
....: delta = pd.to_timedelta(uint64_max, unit="ns")
....: except Exception as err:
....: print("Second:", err)
....:
Second: Cannot cast 18446744073709551615 from ns to 'ns' without overflow.
When the arguments are numeric (not strings or np.timedelta64 values), "unit" can be anything from 'W', 'D', 'h', 'm', 's', 'ms', 'us' or 'ns', though the returned resolution will be 'ns'.
np.timedelta64 values can be extracted with pandas.Timedelta.to_numpy() and pandas.TimedeltaIndex.to_numpy(). The returned resolution depends on the internal representation, which can be changed using pandas.Timedelta.as_unit() and pandas.TimedeltaIndex.as_unit() respectively. as_unit takes one of 's', 'ms', 'us', 'ns' as an argument. That means we are able to represent timedeltas with second, millisecond, microsecond or nanosecond resolution.
In [28]: delta = pd.to_timedelta(np.timedelta64(1, "D"))
In [29]: print("Timedelta:", delta, np.asarray([delta.to_numpy()]).dtype)
Timedelta: 1 days 00:00:00 timedelta64[s]
In [30]: print("Timedelta as_unit('ms'):", delta.as_unit("ms"))
Timedelta as_unit('ms'): 1 days 00:00:00
In [31]: print("Timedelta to_numpy():", delta.as_unit("ms").to_numpy())
Timedelta to_numpy(): 86400000 milliseconds
In [32]: delta = pd.to_timedelta([0, 1, 2], unit="D")
In [33]: print("TimedeltaIndex:", delta)
TimedeltaIndex: TimedeltaIndex(['0 days', '1 days', '2 days'], dtype='timedelta64[ns]', freq=None)
In [34]: print("TimedeltaIndex as_unit('ms'):", delta.as_unit("ms"))
TimedeltaIndex as_unit('ms'): TimedeltaIndex(['0 days', '1 days', '2 days'], dtype='timedelta64[ms]', freq=None)
In [35]: print("TimedeltaIndex to_numpy():", delta.as_unit("ms").to_numpy())
TimedeltaIndex to_numpy(): [ 0 86400000 172800000]
Warning

Care has to be taken, as some configurations of input data will raise. The following shows that it is safe to use pandas.to_timedelta() when providing numpy.timedelta64 as scalar or numpy array input.
In [36]: print(
....: "Works:",
....: np.timedelta64(1901901901901, "s"),
....: pd.to_timedelta(np.timedelta64(1901901901901, "s")),
....: )
....:
Works: 1901901901901 seconds 22012753 days 11:51:41
In [37]: print(
....: "Works:",
....: np.array([np.timedelta64(1901901901901, "s")]),
....: pd.to_timedelta(np.array([np.timedelta64(1901901901901, "s")])),
....: )
....:
Works: [1901901901901] TimedeltaIndex(['22012753 days 11:51:41'], dtype='timedelta64[s]', freq=None)
In [38]: try:
....: pd.to_timedelta([np.timedelta64(1901901901901, "s")])
....: except Exception as err:
....: print("Raises:", err)
....:
Raises: 1901901901901 seconds
In [39]: try:
....: pd.to_timedelta(1901901901901, unit="s")
....: except Exception as err:
....: print("Raises:", err)
....:
Raises: Cannot cast 1901901901901 from s to 'ns' without overflow.
In [40]: try:
....: pd.to_timedelta([1901901901901], unit="s")
....: except Exception as err:
....: print("Raises:", err)
....:
Raises: Cannot cast 1901901901901 from s to 'ns' without overflow.
In [41]: try:
....: pd.to_timedelta(np.array([1901901901901]), unit="s")
....: except Exception as err:
....: print("Raises:", err)
....:
Raises: Cannot convert 1901901901901 seconds to timedelta64[ns] without overflow
Timestamp

pandas.Timestamp is used within xarray to wrap strings of CF encoding reference times and datetime.datetime objects.
When the arguments are numeric (not strings), "unit" can be anything from 'Y', 'W', 'D', 'h', 'm', 's', 'ms', 'us' or 'ns', though the returned resolution will be 'ns'.
In normal operation pandas.Timestamp holds the timestamp in the provided resolution, but only one of 's', 'ms', 'us', 'ns'. Lower resolution input is automatically converted to 's'; higher resolution input is cut to 'ns'.
The same conversion rules apply here as for pandas.to_timedelta() (see to_timedelta).
Depending on the internal resolution Timestamps can be represented in the range:
In [42]: for unit in ["s", "ms", "us", "ns"]:
....: print(
....: f"unit: {unit!r} time range ({pd.Timestamp(int64_min, unit=unit)}, {pd.Timestamp(int64_max, unit=unit)})"
....: )
....:
unit: 's' time range (-292277022657-01-27 08:29:53, 292277026596-12-04 15:30:07)
unit: 'ms' time range (-292275055-05-16 16:47:04.193000, 292278994-08-17 07:12:55.807000)
unit: 'us' time range (-290308-12-21 19:59:05.224193, 294247-01-10 04:00:54.775807)
unit: 'ns' time range (1677-09-21 00:12:43.145224193, 2262-04-11 23:47:16.854775807)
With the relaxed resolution, the representable range extends to several hundred thousand years with microsecond representation. NaT is represented as np.iinfo("int64").min for all of the different representations.
Warning

When initialized with a datetime string, this is only defined from -9999-01-01 to 9999-12-31.
In [43]: try:
....: print("Works:", pd.Timestamp("-9999-01-01 00:00:00"))
....: print("Works, too:", pd.Timestamp("9999-12-31 23:59:59"))
....: print(pd.Timestamp("10000-01-01 00:00:00"))
....: except Exception as err:
....: print("Errors:", err)
....:
Works: -9999-01-01 00:00:00
Works, too: 9999-12-31 23:59:59
Errors: year 10000 is out of range: 10000-01-01 00:00:00
Note

pandas.Timestamp is currently the only way to correctly import time reference strings. It handles non-ISO formatted strings, keeps the resolution of the strings ('s', 'ms', etc.) and imports time zones. When initialized with numpy.datetime64 instead of a string, it even overcomes the above limitation on the possible time range.
In [44]: try:
....: print("Handles non-ISO:", pd.Timestamp("92-1-8 151542"))
....: print(
....: "Keeps resolution 1:",
....: pd.Timestamp("1992-10-08 15:15:42"),
....: pd.Timestamp("1992-10-08 15:15:42").unit,
....: )
....: print(
....: "Keeps resolution 2:",
....: pd.Timestamp("1992-10-08 15:15:42.5"),
....: pd.Timestamp("1992-10-08 15:15:42.5").unit,
....: )
....: print(
....: "Keeps timezone:",
....: pd.Timestamp("1992-10-08 15:15:42.5 -6:00"),
....: pd.Timestamp("1992-10-08 15:15:42.5 -6:00").unit,
....: )
....: print(
....: "Extends timerange :",
....: pd.Timestamp(np.datetime64("-10000-10-08 15:15:42.5001")),
....: pd.Timestamp(np.datetime64("-10000-10-08 15:15:42.5001")).unit,
....: )
....: except Exception as err:
....: print("Errors:", err)
....:
Handles non-ISO: 1992-01-08 15:15:42
Keeps resolution 1: 1992-10-08 15:15:42 s
Keeps resolution 2: 1992-10-08 15:15:42.500000 ms
Keeps timezone: 1992-10-08 15:15:42.500000-06:00 ms
Extends timerange : -10000-10-08 15:15:42.500100 us
DatetimeIndex

pandas.DatetimeIndex is used to wrap np.datetime64 values or other datetime-likes when encoding. The resolution of the DatetimeIndex depends on the input, but can be only one of 's', 'ms', 'us', 'ns'. Lower resolution input is automatically converted to 's'; higher resolution input is cut to 'ns'.

pandas.DatetimeIndex will raise pandas.OutOfBoundsDatetime if the input can’t be represented in the given resolution.
In [45]: try:
....: print(
....: "Works:",
....: pd.DatetimeIndex(
....: np.array(["1992-01-08", "1992-01-09"], dtype="datetime64[D]")
....: ),
....: )
....: print(
....: "Works:",
....: pd.DatetimeIndex(
....: np.array(
....: ["1992-01-08 15:15:42", "1992-01-09 15:15:42"],
....: dtype="datetime64[s]",
....: )
....: ),
....: )
....: print(
....: "Works:",
....: pd.DatetimeIndex(
....: np.array(
....: ["1992-01-08 15:15:42.5", "1992-01-09 15:15:42.0"],
....: dtype="datetime64[ms]",
....: )
....: ),
....: )
....: print(
....: "Works:",
....: pd.DatetimeIndex(
....: np.array(
....: ["1970-01-01 00:00:00.401501601701801901", "1970-01-01 00:00:00"],
....: dtype="datetime64[as]",
....: )
....: ),
....: )
....: print(
....: "Works:",
....: pd.DatetimeIndex(
....: np.array(
....: ["-10000-01-01 00:00:00.401501", "1970-01-01 00:00:00"],
....: dtype="datetime64[us]",
....: )
....: ),
....: )
....: except Exception as err:
....: print("Errors:", err)
....:
Works: DatetimeIndex(['1992-01-08', '1992-01-09'], dtype='datetime64[s]', freq=None)
Works: DatetimeIndex(['1992-01-08 15:15:42', '1992-01-09 15:15:42'], dtype='datetime64[s]', freq=None)
Works: DatetimeIndex(['1992-01-08 15:15:42.500000', '1992-01-09 15:15:42'], dtype='datetime64[ms]', freq=None)
Works: DatetimeIndex(['1970-01-01 00:00:00.401501601', '1970-01-01 00:00:00'], dtype='datetime64[ns]', freq=None)
Works: DatetimeIndex(['-10000-01-01 00:00:00.401501', '1970-01-01 00:00:00'], dtype='datetime64[us]', freq=None)
CF Conventions Time Handling

Xarray tries to adhere to the latest version of the CF Conventions. The relevant parts are the section on Time Coordinate and its Calendar subsection.

CF time decoding
Decoding of values with a time unit specification like "seconds since 1992-10-8 15:15:42.5 -6:00" into datetimes using the CF conventions is a multistage process.

1. If we have a non-standard calendar (e.g. "noleap"), decoding is done with the cftime package, which is not covered in this section. For the "standard"/"gregorian" calendar as well as the "proleptic_gregorian" calendar, the pandas functionality outlined above is used.

2. The "standard"/"gregorian" calendar and the "proleptic_gregorian" calendar are equivalent for any dates and reference times >= "1582-10-15". First the reference time is checked and any timezone information is stripped off. In a second step, the minimum and maximum values are checked to see if they can be represented in the current reference time resolution. At the same time, integer overflow would be caught. For the "standard"/"gregorian" calendar the dates are checked to be >= "1582-10-15". If anything fails, decoding is attempted with cftime.

3. As the unit (here "seconds") and the resolution of the reference time "1992-10-8 15:15:42.5 -6:00" (here "milliseconds") might be different, the decoding resolution is aligned to the higher resolution of the two. Users may also specify their wanted target resolution by setting the time_unit keyword argument to one of 's', 'ms', 'us', 'ns' (default 'ns'); this will be included in the alignment process. The alignment is done by multiplying the values by the ratio of nanoseconds per time unit and nanoseconds per reference time unit. To retain consistency for NaT values, a mask is kept and re-introduced after the multiplication.

4. Times encoded as floating point values are checked for fractional parts, and the resolution is enhanced in an iterative process until a fitting resolution (or 'ns') is found. A SerializationWarning is issued to make the user aware of the possibly problematic encoding.

5. Finally, the values (at this point converted to int64 values) are cast to datetime64[unit] (using the unit retrieved above) and added to the reference time pandas.Timestamp.
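The alignment and multiplication step above can be sketched with plain integer arithmetic (a simplified illustration; the actual implementation in xarray.coding.times additionally handles overflow, floats, NaT masks and cftime fallbacks):

```python
import numpy as np

# "days since 2000-01-01 00:00:00.001": the unit is days, the reference time has
# millisecond resolution, so decoding is aligned to the finer of the two ("ms").
values = np.array([0, 1, 2], dtype="int64")  # raw on-disk values, in days
ns_per_day = 86_400 * 10**9
ns_per_ms = 10**6
scaled = values * (ns_per_day // ns_per_ms)  # values now count milliseconds

ref = np.datetime64("2000-01-01T00:00:00.001", "ms")
decoded = ref + scaled.astype("timedelta64[ms]")
print(decoded)
```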
In [46]: calendar = "proleptic_gregorian"
In [47]: values = np.array([-1000 * 365, 0, 1000 * 365], dtype="int64")
In [48]: units = "days since 2000-01-01 00:00:00.000001"
In [49]: dt = xr.coding.times.decode_cf_datetime(values, units, calendar, time_unit="s")
In [50]: assert dt.dtype == "datetime64[us]"
In [51]: dt
Out[51]:
array(['1000-08-31T00:00:00.000001', '2000-01-01T00:00:00.000001',
'2999-05-03T00:00:00.000001'], dtype='datetime64[us]')
In [52]: units = "microseconds since 2000-01-01 00:00:00"
In [53]: dt = xr.coding.times.decode_cf_datetime(values, units, calendar, time_unit="s")
In [54]: assert dt.dtype == "datetime64[us]"
In [55]: dt
Out[55]:
array(['1999-12-31T23:59:59.635000', '2000-01-01T00:00:00.000000',
'2000-01-01T00:00:00.365000'], dtype='datetime64[us]')
In [56]: values = np.array([0, 0.25, 0.5, 0.75, 1.0], dtype="float64")
In [57]: units = "days since 2000-01-01 00:00:00.001"
In [58]: dt = xr.coding.times.decode_cf_datetime(values, units, calendar, time_unit="s")
In [59]: assert dt.dtype == "datetime64[ms]"
In [60]: dt
Out[60]:
array(['2000-01-01T00:00:00.001', '2000-01-01T06:00:00.001',
'2000-01-01T12:00:00.001', '2000-01-01T18:00:00.001',
'2000-01-02T00:00:00.001'], dtype='datetime64[ms]')
In [61]: values = np.array([0, 0.25, 0.5, 0.75, 1.0], dtype="float64")
In [62]: units = "hours since 2000-01-01"
In [63]: dt = xr.coding.times.decode_cf_datetime(values, units, calendar, time_unit="s")
In [64]: assert dt.dtype == "datetime64[s]"
In [65]: dt
Out[65]:
array(['2000-01-01T00:00:00', '2000-01-01T00:15:00',
'2000-01-01T00:30:00', '2000-01-01T00:45:00',
'2000-01-01T01:00:00'], dtype='datetime64[s]')
In [66]: values = np.array([0, 0.25, 0.5, 0.75, 1.0], dtype="float64")
In [67]: units = "hours since 2000-01-01 00:00:00 03:30"
In [68]: dt = xr.coding.times.decode_cf_datetime(values, units, calendar, time_unit="s")
In [69]: assert dt.dtype == "datetime64[s]"
In [70]: dt
Out[70]:
array(['2000-01-01T03:30:00', '2000-01-01T03:45:00',
'2000-01-01T04:00:00', '2000-01-01T04:15:00',
'2000-01-01T04:30:00'], dtype='datetime64[s]')
In [71]: values = np.array([-2002 * 365 - 121, -366, 365, 2000 * 365 + 119], dtype="int64")
In [72]: units = "days since 0001-01-01 00:00:00"
In [73]: dt = xr.coding.times.decode_cf_datetime(values, units, calendar, time_unit="s")
In [74]: assert dt.dtype == "datetime64[s]"
In [75]: dt
Out[75]:
array(['-2000-01-01T00:00:00', '0000-01-01T00:00:00',
'0002-01-01T00:00:00', '2000-01-01T00:00:00'],
dtype='datetime64[s]')
CF time encoding

For encoding, the process is more or less a reversal of the above, but we have to make some decisions on default values.

1. Infer data_units from the given dates.
2. Infer units (either clean up the given units or use data_units).
3. Infer the calendar name from the given dates.
4. If dates are cftime.datetime objects, then encode with cftime.date2num.
5. Retrieve time_units and ref_date from units.
6. Check ref_date >= 1582-10-15; otherwise fall back to cftime.
7. Wrap dates with pd.DatetimeIndex.
8. Subtracting ref_date (pandas.Timestamp) from the above pandas.DatetimeIndex returns a pandas.TimedeltaIndex.
9. Align the resolution of the pandas.TimedeltaIndex with the resolution of time_units.
10. Retrieve the units and delta needed to faithfully encode into int64.
11. Divide time_deltas by delta, using floor division (integer) or normal division (float).
12. Return the result.
In [76]: calendar = "proleptic_gregorian"
In [77]: dates = np.array(
....: [
....: "-2000-01-01T00:00:00",
....: "0000-01-01T00:00:00",
....: "0002-01-01T00:00:00",
....: "2000-01-01T00:00:00",
....: ],
....: dtype="datetime64[s]",
....: )
....:
In [78]: orig_values = np.array(
....: [-2002 * 365 - 121, -366, 365, 2000 * 365 + 119], dtype="int64"
....: )
....:
In [79]: units = "days since 0001-01-01 00:00:00"
In [80]: values, _, _ = xr.coding.times.encode_cf_datetime(
....: dates, units, calendar, dtype=np.dtype("int64")
....: )
....:
In [81]: print(values)
[-730851 -366 365 730119]
In [82]: np.testing.assert_array_equal(values, orig_values)
In [83]: dates = np.array(
....: [
....: "-2000-01-01T01:00:00",
....: "0000-01-01T00:00:00",
....: "0002-01-01T00:00:00",
....: "2000-01-01T00:00:00",
....: ],
....: dtype="datetime64[s]",
....: )
....:
In [84]: orig_values = np.array(
....: [-2002 * 365 - 121, -366, 365, 2000 * 365 + 119], dtype="int64"
....: )
....:
In [85]: units = "days since 0001-01-01 00:00:00"
In [86]: values, units, _ = xr.coding.times.encode_cf_datetime(
....: dates, units, calendar, dtype=np.dtype("int64")
....: )
....:
In [87]: print(values, units)
[-17540423 -8784 8760 17522856] hours since 0001-01-01
Default Time Unit

The current default time unit of xarray is 'ns'. When setting the time_unit keyword argument to 's' (the lowest resolution pandas allows), datetimes will be converted to at least 's' resolution, if possible. The same holds true for 'ms' and 'us'.
In [88]: attrs = {"units": "hours since 2000-01-01"}
In [89]: ds = xr.Dataset({"time": ("time", [0, 1, 2, 3], attrs)})
In [90]: ds.to_netcdf("test-datetimes1.nc")
In [91]: xr.open_dataset("test-datetimes1.nc")
Out[91]:
<xarray.Dataset> Size: 32B
Dimensions: (time: 4)
Coordinates:
* time (time) datetime64[ns] 32B 2000-01-01 ... 2000-01-01T03:00:00
Data variables:
*empty*
In [92]: coder = xr.coders.CFDatetimeCoder(time_unit="s")
In [93]: xr.open_dataset("test-datetimes1.nc", decode_times=coder)
Out[93]:
<xarray.Dataset> Size: 32B
Dimensions: (time: 4)
Coordinates:
* time (time) datetime64[s] 32B 2000-01-01 ... 2000-01-01T03:00:00
Data variables:
*empty*
If a coarser unit is requested the datetimes are decoded into their native on-disk resolution, if possible.
In [94]: attrs = {"units": "milliseconds since 2000-01-01"}
In [95]: ds = xr.Dataset({"time": ("time", [0, 1, 2, 3], attrs)})
In [96]: ds.to_netcdf("test-datetimes2.nc")
In [97]: xr.open_dataset("test-datetimes2.nc")
Out[97]:
<xarray.Dataset> Size: 32B
Dimensions: (time: 4)
Coordinates:
* time (time) datetime64[ns] 32B 2000-01-01 ... 2000-01-01T00:00:00.003000
Data variables:
*empty*
In [98]: coder = xr.coders.CFDatetimeCoder(time_unit="s")
In [99]: xr.open_dataset("test-datetimes2.nc", decode_times=coder)
Out[99]:
<xarray.Dataset> Size: 32B
Dimensions: (time: 4)
Coordinates:
* time (time) datetime64[ms] 32B 2000-01-01 ... 2000-01-01T00:00:00.003000
Data variables:
*empty*
Similar logic applies for decoding timedelta values. The default resolution is "ns":
In [100]: attrs = {"units": "hours"}
In [101]: ds = xr.Dataset({"time": ("time", [0, 1, 2, 3], attrs)})
In [102]: ds.to_netcdf("test-timedeltas1.nc")
In [103]: xr.open_dataset("test-timedeltas1.nc")
Out[103]:
<xarray.Dataset> Size: 32B
Dimensions: (time: 4)
Coordinates:
* time (time) timedelta64[ns] 32B 00:00:00 01:00:00 02:00:00 03:00:00
Data variables:
*empty*
By default, timedeltas will be decoded to the same resolution as datetimes:
In [104]: coder = xr.coders.CFDatetimeCoder(time_unit="s")
In [105]: xr.open_dataset("test-timedeltas1.nc", decode_times=coder)
Out[105]:
<xarray.Dataset> Size: 32B
Dimensions: (time: 4)
Coordinates:
* time (time) timedelta64[s] 32B 00:00:00 01:00:00 02:00:00 03:00:00
Data variables:
*empty*
but if one would like to decode timedeltas to a different resolution, one can provide a coder specifically for timedeltas to decode_timedelta:
In [106]: timedelta_coder = xr.coders.CFTimedeltaCoder(time_unit="ms")
In [107]: xr.open_dataset(
.....: "test-timedeltas1.nc", decode_times=coder, decode_timedelta=timedelta_coder
.....: )
.....:
Out[107]:
<xarray.Dataset> Size: 32B
Dimensions: (time: 4)
Coordinates:
* time (time) timedelta64[ms] 32B 00:00:00 01:00:00 02:00:00 03:00:00
Data variables:
*empty*
As with datetimes, if a coarser unit is requested the timedeltas are decoded into their native on-disk resolution, if possible:
In [108]: attrs = {"units": "milliseconds"}
In [109]: ds = xr.Dataset({"time": ("time", [0, 1, 2, 3], attrs)})
In [110]: ds.to_netcdf("test-timedeltas2.nc")
In [111]: xr.open_dataset("test-timedeltas2.nc")
Out[111]:
<xarray.Dataset> Size: 32B
Dimensions: (time: 4)
Coordinates:
* time (time) timedelta64[ns] 32B 00:00:00 ... 00:00:00.003000
Data variables:
*empty*
In [112]: coder = xr.coders.CFDatetimeCoder(time_unit="s")
In [113]: xr.open_dataset("test-timedeltas2.nc", decode_times=coder)
Out[113]:
<xarray.Dataset> Size: 32B
Dimensions: (time: 4)
Coordinates:
* time (time) timedelta64[s] 32B 00:00:00 00:00:00 00:00:00 00:00:00
Data variables:
*empty*
To opt out of timedelta decoding (see issue Undesired decoding to timedelta64), pass False to decode_timedelta:
In [114]: xr.open_dataset("test-timedeltas2.nc", decode_timedelta=False)
Out[114]:
<xarray.Dataset> Size: 32B
Dimensions: (time: 4)
Coordinates:
* time (time) int64 32B 0 1 2 3
Data variables:
*empty*
Note

Note that in the future the default value of decode_timedelta will be False rather than None.
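To stay robust against that upcoming default change, decode_timedelta can always be passed explicitly. A sketch using in-memory decoding via xr.decode_cf, avoiding the file round-trip used above:

```python
import numpy as np
import xarray as xr

ds = xr.Dataset({"time": ("time", np.arange(4), {"units": "hours"})})

decoded = xr.decode_cf(ds, decode_timedelta=True)
print(decoded["time"].dtype.kind)  # 'm' -> timedelta64

raw = xr.decode_cf(ds, decode_timedelta=False)
print(raw["time"].dtype.kind)      # 'i' -> left as integers
```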