Time Coding#

This page gives an overview of how xarray encodes and decodes times, and of the conventions and functions used.

Pandas functionality#

to_datetime#

The function pandas.to_datetime() is used within xarray for inferring units and for testing purposes.

In normal operation pandas.to_datetime() returns a pandas.Timestamp (for scalar input) or pandas.DatetimeIndex (for array-like input), which wrap np.datetime64 values with a resolution inherited from the input (one of 's', 'ms', 'us', 'ns'). If no resolution can be inherited, 'ns' is assumed. This implies that the maximum usable time range in those cases is approximately +/- 292 years, centered on the Unix epoch (1970-01-01). To accommodate this, we carefully check the units/resolution in the encoding and decoding steps.
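The roughly +/- 292 year figure follows directly from the int64 nanosecond representation; a quick back-of-the-envelope check (a sketch, not part of xarray):

```python
# 2**63 nanoseconds on either side of the epoch, expressed in years.
span_years = 2**63 / 1e9 / 86400 / 365.25
print(f"roughly {span_years:.0f} years each side of 1970-01-01")
```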

When the arguments are numeric (not strings or np.datetime64 values) "unit" can be anything from 'Y', 'W', 'D', 'h', 'm', 's', 'ms', 'us' or 'ns', though the returned resolution will be "ns".
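For example, a numeric input with a coarser unit is still converted to a nanosecond-resolution result (a small illustration; the exact repr may vary with the pandas version):

```python
import pandas as pd

# An integer day count is interpreted according to "unit" and
# returned as a nanosecond-resolution Timestamp.
ts = pd.to_datetime(1, unit="D")
print(ts)  # 1970-01-02 00:00:00
```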

In [1]: f"Minimum datetime: {pd.to_datetime(int64_min, unit='ns')}"
Out[1]: 'Minimum datetime: 1677-09-21 00:12:43.145224193'

In [2]: f"Maximum datetime: {pd.to_datetime(int64_max, unit='ns')}"
Out[2]: 'Maximum datetime: 2262-04-11 23:47:16.854775807'

For input values which can’t be represented in nanosecond resolution a pandas.OutOfBoundsDatetime exception is raised:

In [3]: try:
   ...:     dtime = pd.to_datetime(int64_max, unit="us")
   ...: except Exception as err:
   ...:     print(err)
   ...: 
Out of bounds nanosecond timestamp: 294247-01-10 04:00:54

In [4]: try:
   ...:     dtime = pd.to_datetime(uint64_max, unit="ns")
   ...:     print("Wrong:", dtime)
   ...:     dtime = pd.to_datetime([uint64_max], unit="ns")
   ...: except Exception as err:
   ...:     print(err)
   ...: 
Wrong: 1969-12-31 23:59:59.999999999
cannot convert input 18446744073709551615 with the unit 'ns', at position 0

np.datetime64 values can be extracted with pandas.Timestamp.to_numpy() and pandas.DatetimeIndex.to_numpy(). The returned resolution depends on the internal representation. This representation can be changed using pandas.Timestamp.as_unit() and pandas.DatetimeIndex.as_unit() respectively.

as_unit takes one of 's', 'ms', 'us', 'ns' as an argument. That means we are able to represent datetimes with second, millisecond, microsecond or nanosecond resolution.

In [5]: time = pd.to_datetime(np.datetime64(0, "D"))

In [6]: print("Datetime:", time, np.asarray([time.to_numpy()]).dtype)
Datetime: 1970-01-01 00:00:00 datetime64[s]

In [7]: print("Datetime as_unit('ms'):", time.as_unit("ms"))
Datetime as_unit('ms'): 1970-01-01 00:00:00

In [8]: print("Datetime to_numpy():", time.as_unit("ms").to_numpy())
Datetime to_numpy(): 1970-01-01T00:00:00.000

In [9]: time = pd.to_datetime(np.array([-1000, 1, 2], dtype="datetime64[Y]"))

In [10]: print("DatetimeIndex:", time)
DatetimeIndex: DatetimeIndex(['970-01-01', '1971-01-01', '1972-01-01'], dtype='datetime64[s]', freq=None)

In [11]: print("DatetimeIndex as_unit('us'):", time.as_unit("us"))
DatetimeIndex as_unit('us'): DatetimeIndex(['970-01-01', '1971-01-01', '1972-01-01'], dtype='datetime64[us]', freq=None)

In [12]: print("DatetimeIndex to_numpy():", time.as_unit("us").to_numpy())
DatetimeIndex to_numpy(): ['0970-01-01T00:00:00.000000' '1971-01-01T00:00:00.000000'
 '1972-01-01T00:00:00.000000']

Warning

Input data with a resolution higher than 'ns' (e.g. 'ps', 'fs', 'as') is truncated (not rounded) at the 'ns' level. This is currently broken for 'ps' input, which is interpreted as 'ns'.

In [13]: print("Good:", pd.to_datetime([np.datetime64(1901901901901, "as")]))
Good: DatetimeIndex(['1970-01-01 00:00:00.000001901'], dtype='datetime64[ns]', freq=None)

In [14]: print("Good:", pd.to_datetime([np.datetime64(1901901901901, "fs")]))
Good: DatetimeIndex(['1970-01-01 00:00:00.001901901'], dtype='datetime64[ns]', freq=None)

In [15]: print(" Bad:", pd.to_datetime([np.datetime64(1901901901901, "ps")]))
 Bad: DatetimeIndex(['1970-01-01 00:31:41.901901901'], dtype='datetime64[ns]', freq=None)

In [16]: print("Good:", pd.to_datetime([np.datetime64(1901901901901, "ns")]))
Good: DatetimeIndex(['1970-01-01 00:31:41.901901901'], dtype='datetime64[ns]', freq=None)

In [17]: print("Good:", pd.to_datetime([np.datetime64(1901901901901, "us")]))
Good: DatetimeIndex(['1970-01-23 00:18:21.901901'], dtype='datetime64[ns]', freq=None)

In [18]: print("Good:", pd.to_datetime([np.datetime64(1901901901901, "ms")]))
Good: DatetimeIndex(['2030-04-08 18:05:01.901000'], dtype='datetime64[ns]', freq=None)

Warning

Care has to be taken, as some combinations of input data will raise. The following shows that it is safe to use pandas.to_datetime() when providing numpy.datetime64 as a scalar or as a numpy array:

In [19]: print(
   ....:     "Works:",
   ....:     np.datetime64(1901901901901, "s"),
   ....:     pd.to_datetime(np.datetime64(1901901901901, "s")),
   ....: )
   ....: 
Works: 62238-11-15T11:51:41 62238-11-15 11:51:41

In [20]: print(
   ....:     "Works:",
   ....:     np.array([np.datetime64(1901901901901, "s")]),
   ....:     pd.to_datetime(np.array([np.datetime64(1901901901901, "s")])),
   ....: )
   ....: 
Works: ['62238-11-15T11:51:41'] DatetimeIndex(['62238-11-15 11:51:41'], dtype='datetime64[s]', freq=None)

In [21]: try:
   ....:     pd.to_datetime([np.datetime64(1901901901901, "s")])
   ....: except Exception as err:
   ....:     print("Raises:", err)
   ....: 
Raises: Out of bounds nanosecond timestamp: 62238-11-15T11:51:41, at position 0

In [22]: try:
   ....:     pd.to_datetime(1901901901901, unit="s")
   ....: except Exception as err:
   ....:     print("Raises:", err)
   ....: 
Raises: Out of bounds nanosecond timestamp: 62238-11-15 11:51:41

In [23]: try:
   ....:     pd.to_datetime([1901901901901], unit="s")
   ....: except Exception as err:
   ....:     print("Raises:", err)
   ....: 
Raises: cannot convert input 1901901901901 with the unit 's', at position 0

In [24]: try:
   ....:     pd.to_datetime(np.array([1901901901901]), unit="s")
   ....: except Exception as err:
   ....:     print("Raises:", err)
   ....: 
Raises: Out of bounds nanosecond timestamp: 62238-11-15 11:51:41

to_timedelta#

The function pandas.to_timedelta() is used within xarray for inferring units and for testing purposes.

In normal operation pandas.to_timedelta() returns a pandas.Timedelta (for scalar input) or pandas.TimedeltaIndex (for array-like input), which wrap np.timedelta64 values with 'ns' resolution internally. This implies that the usable timedelta range covers only roughly 585 years. To accommodate this, we work around that limitation in the encoding and decoding steps.

In [25]: f"Maximum timedelta range: ({pd.to_timedelta(int64_min, unit='ns')}, {pd.to_timedelta(int64_max, unit='ns')})"
Out[25]: 'Maximum timedelta range: (-106752 days +00:12:43.145224193, 106751 days 23:47:16.854775807)'

For input values which can’t be represented in nanosecond resolution a pandas.OutOfBoundsTimedelta exception is raised:

In [26]: try:
   ....:     delta = pd.to_timedelta(int64_max, unit="us")
   ....: except Exception as err:
   ....:     print("First:", err)
   ....: 
First: Cannot cast 9223372036854775807 from us to 'ns' without overflow.

In [27]: try:
   ....:     delta = pd.to_timedelta(uint64_max, unit="ns")
   ....: except Exception as err:
   ....:     print("Second:", err)
   ....: 
Second: Cannot cast 18446744073709551615 from ns to 'ns' without overflow.

When arguments are numeric (not strings or np.timedelta64 values) “unit” can be anything from 'W', 'D', 'h', 'm', 's', 'ms', 'us' or 'ns', though the returned resolution will be "ns".
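A small illustration of numeric input with a coarser unit (a sketch; the exact repr may vary with the pandas version):

```python
import pandas as pd

# An integer week count is interpreted according to "unit" and
# returned as a nanosecond-resolution Timedelta.
td = pd.to_timedelta(2, unit="W")
print(td)  # 14 days 00:00:00
```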

np.timedelta64 values can be extracted with pandas.Timedelta.to_numpy() and pandas.TimedeltaIndex.to_numpy(). The returned resolution depends on the internal representation. This representation can be changed using pandas.Timedelta.as_unit() and pandas.TimedeltaIndex.as_unit() respectively.

as_unit takes one of 's', 'ms', 'us', 'ns' as an argument. That means we are able to represent timedeltas with second, millisecond, microsecond or nanosecond resolution.

In [28]: delta = pd.to_timedelta(np.timedelta64(1, "D"))

In [29]: print("Timedelta:", delta, np.asarray([delta.to_numpy()]).dtype)
Timedelta: 1 days 00:00:00 timedelta64[s]

In [30]: print("Timedelta as_unit('ms'):", delta.as_unit("ms"))
Timedelta as_unit('ms'): 1 days 00:00:00

In [31]: print("Timedelta to_numpy():", delta.as_unit("ms").to_numpy())
Timedelta to_numpy(): 86400000 milliseconds

In [32]: delta = pd.to_timedelta([0, 1, 2], unit="D")

In [33]: print("TimedeltaIndex:", delta)
TimedeltaIndex: TimedeltaIndex(['0 days', '1 days', '2 days'], dtype='timedelta64[ns]', freq=None)

In [34]: print("TimedeltaIndex as_unit('ms'):", delta.as_unit("ms"))
TimedeltaIndex as_unit('ms'): TimedeltaIndex(['0 days', '1 days', '2 days'], dtype='timedelta64[ms]', freq=None)

In [35]: print("TimedeltaIndex to_numpy():", delta.as_unit("ms").to_numpy())
TimedeltaIndex to_numpy(): [        0  86400000 172800000]

Warning

Care has to be taken, as some combinations of input data will raise. The following shows that it is safe to use pandas.to_timedelta() when providing numpy.timedelta64 as a scalar or as a numpy array:

In [36]: print(
   ....:     "Works:",
   ....:     np.timedelta64(1901901901901, "s"),
   ....:     pd.to_timedelta(np.timedelta64(1901901901901, "s")),
   ....: )
   ....: 
Works: 1901901901901 seconds 22012753 days 11:51:41

In [37]: print(
   ....:     "Works:",
   ....:     np.array([np.timedelta64(1901901901901, "s")]),
   ....:     pd.to_timedelta(np.array([np.timedelta64(1901901901901, "s")])),
   ....: )
   ....: 
Works: [1901901901901] TimedeltaIndex(['22012753 days 11:51:41'], dtype='timedelta64[s]', freq=None)

In [38]: try:
   ....:     pd.to_timedelta([np.timedelta64(1901901901901, "s")])
   ....: except Exception as err:
   ....:     print("Raises:", err)
   ....: 
Raises: 1901901901901 seconds

In [39]: try:
   ....:     pd.to_timedelta(1901901901901, unit="s")
   ....: except Exception as err:
   ....:     print("Raises:", err)
   ....: 
Raises: Cannot cast 1901901901901 from s to 'ns' without overflow.

In [40]: try:
   ....:     pd.to_timedelta([1901901901901], unit="s")
   ....: except Exception as err:
   ....:     print("Raises:", err)
   ....: 
Raises: Cannot cast 1901901901901 from s to 'ns' without overflow.

In [41]: try:
   ....:     pd.to_timedelta(np.array([1901901901901]), unit="s")
   ....: except Exception as err:
   ....:     print("Raises:", err)
   ....: 
Raises: Cannot convert 1901901901901 seconds to timedelta64[ns] without overflow

Timestamp#

pandas.Timestamp is used within xarray to wrap CF reference time strings and datetime.datetime objects.

When arguments are numeric (not strings) “unit” can be anything from 'Y', 'W', 'D', 'h', 'm', 's', 'ms', 'us' or 'ns', though the returned resolution will be "ns".

In normal operation pandas.Timestamp holds the timestamp in the provided resolution, but only in one of 's', 'ms', 'us', 'ns'. Lower-resolution input is automatically converted to 's'; higher-resolution input is truncated to 'ns'.

The same conversion rules apply here as for pandas.to_timedelta() (see to_timedelta). Depending on the internal resolution Timestamps can be represented in the range:

In [42]: for unit in ["s", "ms", "us", "ns"]:
   ....:     print(
   ....:         f"unit: {unit!r} time range ({pd.Timestamp(int64_min, unit=unit)}, {pd.Timestamp(int64_max, unit=unit)})"
   ....:     )
   ....: 
unit: 's' time range (-292277022657-01-27 08:29:53, 292277026596-12-04 15:30:07)
unit: 'ms' time range (-292275055-05-16 16:47:04.193000, 292278994-08-17 07:12:55.807000)
unit: 'us' time range (-290308-12-21 19:59:05.224193, 294247-01-10 04:00:54.775807)
unit: 'ns' time range (1677-09-21 00:12:43.145224193, 2262-04-11 23:47:16.854775807)

With the relaxed resolution, the representable range extends to several hundred thousand centuries at microsecond resolution. NaT is represented by np.iinfo("int64").min in all of these representations.
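The NaT sentinel can be checked directly (a small sketch, not part of xarray):

```python
import numpy as np
import pandas as pd

# NaT uses the int64 minimum as its sentinel value, regardless of resolution.
assert np.datetime64("NaT", "s").astype("int64") == np.iinfo("int64").min
assert pd.Timestamp("NaT").value == np.iinfo("int64").min
```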

Warning

When initialized with a datetime string, pandas.Timestamp is only defined from -9999-01-01 to 9999-12-31.

In [43]: try:
   ....:     print("Works:", pd.Timestamp("-9999-01-01 00:00:00"))
   ....:     print("Works, too:", pd.Timestamp("9999-12-31 23:59:59"))
   ....:     print(pd.Timestamp("10000-01-01 00:00:00"))
   ....: except Exception as err:
   ....:     print("Errors:", err)
   ....: 
Works: -9999-01-01 00:00:00
Works, too: 9999-12-31 23:59:59
Errors: year 10000 is out of range: 10000-01-01 00:00:00

Note

pandas.Timestamp is currently the only way to correctly parse reference time strings. It handles non-ISO formatted strings, keeps the resolution of the strings ('s', 'ms', etc.) and parses time zone information. When initialized with numpy.datetime64 instead of a string, it even overcomes the above limitation on the possible time range.

In [44]: try:
   ....:     print("Handles non-ISO:", pd.Timestamp("92-1-8 151542"))
   ....:     print(
   ....:         "Keeps resolution 1:",
   ....:         pd.Timestamp("1992-10-08 15:15:42"),
   ....:         pd.Timestamp("1992-10-08 15:15:42").unit,
   ....:     )
   ....:     print(
   ....:         "Keeps resolution 2:",
   ....:         pd.Timestamp("1992-10-08 15:15:42.5"),
   ....:         pd.Timestamp("1992-10-08 15:15:42.5").unit,
   ....:     )
   ....:     print(
   ....:         "Keeps timezone:",
   ....:         pd.Timestamp("1992-10-08 15:15:42.5 -6:00"),
   ....:         pd.Timestamp("1992-10-08 15:15:42.5 -6:00").unit,
   ....:     )
   ....:     print(
   ....:         "Extends timerange :",
   ....:         pd.Timestamp(np.datetime64("-10000-10-08 15:15:42.5001")),
   ....:         pd.Timestamp(np.datetime64("-10000-10-08 15:15:42.5001")).unit,
   ....:     )
   ....: except Exception as err:
   ....:     print("Errors:", err)
   ....: 
Handles non-ISO: 1992-01-08 15:15:42
Keeps resolution 1: 1992-10-08 15:15:42 s
Keeps resolution 2: 1992-10-08 15:15:42.500000 ms
Keeps timezone: 1992-10-08 15:15:42.500000-06:00 ms
Extends timerange : -10000-10-08 15:15:42.500100 us

DatetimeIndex#

pandas.DatetimeIndex is used to wrap np.datetime64 values or other datetime-likes when encoding. The resolution of the DatetimeIndex depends on the input, but can be only one of 's', 'ms', 'us', 'ns'. Lower resolution input is automatically converted to 's', higher resolution input is cut to 'ns'. pandas.DatetimeIndex will raise pandas.OutOfBoundsDatetime if the input can’t be represented in the given resolution.

In [45]: try:
   ....:     print(
   ....:         "Works:",
   ....:         pd.DatetimeIndex(
   ....:             np.array(["1992-01-08", "1992-01-09"], dtype="datetime64[D]")
   ....:         ),
   ....:     )
   ....:     print(
   ....:         "Works:",
   ....:         pd.DatetimeIndex(
   ....:             np.array(
   ....:                 ["1992-01-08 15:15:42", "1992-01-09 15:15:42"],
   ....:                 dtype="datetime64[s]",
   ....:             )
   ....:         ),
   ....:     )
   ....:     print(
   ....:         "Works:",
   ....:         pd.DatetimeIndex(
   ....:             np.array(
   ....:                 ["1992-01-08 15:15:42.5", "1992-01-09 15:15:42.0"],
   ....:                 dtype="datetime64[ms]",
   ....:             )
   ....:         ),
   ....:     )
   ....:     print(
   ....:         "Works:",
   ....:         pd.DatetimeIndex(
   ....:             np.array(
   ....:                 ["1970-01-01 00:00:00.401501601701801901", "1970-01-01 00:00:00"],
   ....:                 dtype="datetime64[as]",
   ....:             )
   ....:         ),
   ....:     )
   ....:     print(
   ....:         "Works:",
   ....:         pd.DatetimeIndex(
   ....:             np.array(
   ....:                 ["-10000-01-01 00:00:00.401501", "1970-01-01 00:00:00"],
   ....:                 dtype="datetime64[us]",
   ....:             )
   ....:         ),
   ....:     )
   ....: except Exception as err:
   ....:     print("Errors:", err)
   ....: 
Works: DatetimeIndex(['1992-01-08', '1992-01-09'], dtype='datetime64[s]', freq=None)
Works: DatetimeIndex(['1992-01-08 15:15:42', '1992-01-09 15:15:42'], dtype='datetime64[s]', freq=None)
Works: DatetimeIndex(['1992-01-08 15:15:42.500000', '1992-01-09 15:15:42'], dtype='datetime64[ms]', freq=None)
Works: DatetimeIndex(['1970-01-01 00:00:00.401501601', '1970-01-01 00:00:00'], dtype='datetime64[ns]', freq=None)
Works: DatetimeIndex(['-10000-01-01 00:00:00.401501', '1970-01-01 00:00:00'], dtype='datetime64[us]', freq=None)

CF Conventions Time Handling#

Xarray tries to adhere to the latest version of the CF Conventions. The relevant sections are Time Coordinate and its Calendar subsection.

CF time decoding#

Decoding of values with a time unit specification like "seconds since 1992-10-8 15:15:42.5 -6:00" into datetimes using the CF conventions is a multistage process.

  1. If we have a non-standard calendar (e.g. "noleap") decoding is done with the cftime package, which is not covered in this section. For the "standard"/"gregorian" calendar as well as the "proleptic_gregorian" calendar the above outlined pandas functionality is used.

  2. The "standard"/"gregorian" calendar and the "proleptic_gregorian" calendar are equivalent for all dates and reference times >= "1582-10-15". First the reference time is checked and any timezone information is stripped off. In a second step, the minimum and maximum values are checked to see whether they can be represented in the current reference time resolution; integer overflow is caught at the same time. For the "standard"/"gregorian" calendar the dates are additionally checked to be >= "1582-10-15". If anything fails, decoding is attempted with cftime.

  3. As the unit (here "seconds") and the resolution of the reference time "1992-10-8 15:15:42.5 -6:00" (here "milliseconds") might differ, the decoding resolution is aligned to the higher of the two. Users may also specify their desired target resolution by setting the time_unit keyword argument to one of 's', 'ms', 'us', 'ns' (default 'ns'), which is included in the alignment process. The alignment is done by multiplying the values by the ratio of nanoseconds per time unit to nanoseconds per reference time unit. To retain consistency for NaT values, a mask is kept and re-introduced after the multiplication.

  4. Times encoded as floating point values are checked for fractional parts and the resolution is enhanced in an iterative process until a fitting resolution (or 'ns') is found. A SerializationWarning is issued to make the user aware of the possibly problematic encoding.

  5. Finally, the values (at this point converted to int64 values) are cast to datetime64[unit] (using the above retrieved unit) and added to the reference time pandas.Timestamp.
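The final steps can be sketched by hand for the simple integer case (a minimal illustration using plain numpy/pandas, not the actual xarray implementation):

```python
import numpy as np
import pandas as pd

# Decode integer "days since 2000-01-01" values against a second-resolution
# reference time: cast the counts to timedelta64 in the unit from the units
# string, align to the reference resolution, then add the reference time.
values = np.array([0, 1, 2], dtype="int64")
ref = pd.Timestamp("2000-01-01").as_unit("s")
deltas = values.astype("timedelta64[D]").astype("timedelta64[s]")
decoded = ref.to_numpy() + deltas
print(decoded)  # datetime64[s] array: 2000-01-01, 2000-01-02, 2000-01-03
```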

In [46]: calendar = "proleptic_gregorian"

In [47]: values = np.array([-1000 * 365, 0, 1000 * 365], dtype="int64")

In [48]: units = "days since 2000-01-01 00:00:00.000001"

In [49]: dt = xr.coding.times.decode_cf_datetime(values, units, calendar, time_unit="s")

In [50]: assert dt.dtype == "datetime64[us]"

In [51]: dt
Out[51]: 
array(['1000-08-31T00:00:00.000001', '2000-01-01T00:00:00.000001',
       '2999-05-03T00:00:00.000001'], dtype='datetime64[us]')
In [52]: units = "microseconds since 2000-01-01 00:00:00"

In [53]: dt = xr.coding.times.decode_cf_datetime(values, units, calendar, time_unit="s")

In [54]: assert dt.dtype == "datetime64[us]"

In [55]: dt
Out[55]: 
array(['1999-12-31T23:59:59.635000', '2000-01-01T00:00:00.000000',
       '2000-01-01T00:00:00.365000'], dtype='datetime64[us]')
In [56]: values = np.array([0, 0.25, 0.5, 0.75, 1.0], dtype="float64")

In [57]: units = "days since 2000-01-01 00:00:00.001"

In [58]: dt = xr.coding.times.decode_cf_datetime(values, units, calendar, time_unit="s")

In [59]: assert dt.dtype == "datetime64[ms]"

In [60]: dt
Out[60]: 
array(['2000-01-01T00:00:00.001', '2000-01-01T06:00:00.001',
       '2000-01-01T12:00:00.001', '2000-01-01T18:00:00.001',
       '2000-01-02T00:00:00.001'], dtype='datetime64[ms]')
In [61]: values = np.array([0, 0.25, 0.5, 0.75, 1.0], dtype="float64")

In [62]: units = "hours since 2000-01-01"

In [63]: dt = xr.coding.times.decode_cf_datetime(values, units, calendar, time_unit="s")

In [64]: assert dt.dtype == "datetime64[s]"

In [65]: dt
Out[65]: 
array(['2000-01-01T00:00:00', '2000-01-01T00:15:00',
       '2000-01-01T00:30:00', '2000-01-01T00:45:00',
       '2000-01-01T01:00:00'], dtype='datetime64[s]')
In [66]: values = np.array([0, 0.25, 0.5, 0.75, 1.0], dtype="float64")

In [67]: units = "hours since 2000-01-01 00:00:00 03:30"

In [68]: dt = xr.coding.times.decode_cf_datetime(values, units, calendar, time_unit="s")

In [69]: assert dt.dtype == "datetime64[s]"

In [70]: dt
Out[70]: 
array(['2000-01-01T03:30:00', '2000-01-01T03:45:00',
       '2000-01-01T04:00:00', '2000-01-01T04:15:00',
       '2000-01-01T04:30:00'], dtype='datetime64[s]')
In [71]: values = np.array([-2002 * 365 - 121, -366, 365, 2000 * 365 + 119], dtype="int64")

In [72]: units = "days since 0001-01-01 00:00:00"

In [73]: dt = xr.coding.times.decode_cf_datetime(values, units, calendar, time_unit="s")

In [74]: assert dt.dtype == "datetime64[s]"

In [75]: dt
Out[75]: 
array(['-2000-01-01T00:00:00',  '0000-01-01T00:00:00',
        '0002-01-01T00:00:00',  '2000-01-01T00:00:00'],
      dtype='datetime64[s]')

CF time encoding#

For encoding the process is more or less a reversal of the above, but we have to make some decisions on default values.

  1. Infer data_units from the given dates.

  2. Infer units (either clean up the given units or use data_units).

  3. Infer the calendar name from the given dates.

  4. If dates are cftime.datetime objects then encode with cftime.date2num

  5. Retrieve time_units and ref_date from units

  6. Check ref_date >= 1582-10-15, otherwise -> cftime

  7. Wrap dates with pd.DatetimeIndex

  8. Subtracting ref_date (pandas.Timestamp) from above pandas.DatetimeIndex will return pandas.TimedeltaIndex

  9. Align resolution of pandas.TimedeltaIndex with resolution of time_units

  10. Retrieve needed units and delta to faithfully encode into int64

  11. Divide time_deltas by delta, use floor division (integer) or normal division (float)

  12. Return result
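Steps 7 to 11 can be sketched for the integer case (a simplified illustration, not the actual xarray code path):

```python
import numpy as np
import pandas as pd

# Encode datetimes as "days since 2000-01-01": wrap in a DatetimeIndex,
# subtract the reference Timestamp to obtain a TimedeltaIndex, then
# floor-divide by the length of the time unit to get int64 counts.
dates = pd.DatetimeIndex(np.array(["2000-01-01", "2000-01-03"], dtype="datetime64[s]"))
ref = pd.Timestamp("2000-01-01").as_unit("s")
one_day = np.timedelta64(1, "D").astype("timedelta64[s]")
encoded = (dates - ref).to_numpy() // one_day
print(encoded)  # [0 2]
```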

In [76]: calendar = "proleptic_gregorian"

In [77]: dates = np.array(
   ....:     [
   ....:         "-2000-01-01T00:00:00",
   ....:         "0000-01-01T00:00:00",
   ....:         "0002-01-01T00:00:00",
   ....:         "2000-01-01T00:00:00",
   ....:     ],
   ....:     dtype="datetime64[s]",
   ....: )
   ....: 

In [78]: orig_values = np.array(
   ....:     [-2002 * 365 - 121, -366, 365, 2000 * 365 + 119], dtype="int64"
   ....: )
   ....: 

In [79]: units = "days since 0001-01-01 00:00:00"

In [80]: values, _, _ = xr.coding.times.encode_cf_datetime(
   ....:     dates, units, calendar, dtype=np.dtype("int64")
   ....: )
   ....: 

In [81]: print(values)
[-730851    -366     365  730119]

In [82]: np.testing.assert_array_equal(values, orig_values)

In [83]: dates = np.array(
   ....:     [
   ....:         "-2000-01-01T01:00:00",
   ....:         "0000-01-01T00:00:00",
   ....:         "0002-01-01T00:00:00",
   ....:         "2000-01-01T00:00:00",
   ....:     ],
   ....:     dtype="datetime64[s]",
   ....: )
   ....: 

In [84]: orig_values = np.array(
   ....:     [-2002 * 365 - 121, -366, 365, 2000 * 365 + 119], dtype="int64"
   ....: )
   ....: 

In [85]: units = "days since 0001-01-01 00:00:00"

In [86]: values, units, _ = xr.coding.times.encode_cf_datetime(
   ....:     dates, units, calendar, dtype=np.dtype("int64")
   ....: )
   ....: 

In [87]: print(values, units)
[-17540423     -8784      8760  17522856] hours since 0001-01-01

Default Time Unit#

The current default time unit of xarray is 'ns'. When setting the time_unit keyword argument to 's' (the lowest resolution pandas allows), datetimes will be converted to at least 's' resolution, if possible. The same holds true for 'ms' and 'us'.

In [88]: attrs = {"units": "hours since 2000-01-01"}

In [89]: ds = xr.Dataset({"time": ("time", [0, 1, 2, 3], attrs)})

In [90]: ds.to_netcdf("test-datetimes1.nc")
In [91]: xr.open_dataset("test-datetimes1.nc")
Out[91]: 
<xarray.Dataset> Size: 32B
Dimensions:  (time: 4)
Coordinates:
  * time     (time) datetime64[ns] 32B 2000-01-01 ... 2000-01-01T03:00:00
Data variables:
    *empty*
In [92]: coder = xr.coders.CFDatetimeCoder(time_unit="s")

In [93]: xr.open_dataset("test-datetimes1.nc", decode_times=coder)
Out[93]: 
<xarray.Dataset> Size: 32B
Dimensions:  (time: 4)
Coordinates:
  * time     (time) datetime64[s] 32B 2000-01-01 ... 2000-01-01T03:00:00
Data variables:
    *empty*

If a coarser unit is requested the datetimes are decoded into their native on-disk resolution, if possible.

In [94]: attrs = {"units": "milliseconds since 2000-01-01"}

In [95]: ds = xr.Dataset({"time": ("time", [0, 1, 2, 3], attrs)})

In [96]: ds.to_netcdf("test-datetimes2.nc")
In [97]: xr.open_dataset("test-datetimes2.nc")
Out[97]: 
<xarray.Dataset> Size: 32B
Dimensions:  (time: 4)
Coordinates:
  * time     (time) datetime64[ns] 32B 2000-01-01 ... 2000-01-01T00:00:00.003000
Data variables:
    *empty*
In [98]: coder = xr.coders.CFDatetimeCoder(time_unit="s")

In [99]: xr.open_dataset("test-datetimes2.nc", decode_times=coder)
Out[99]: 
<xarray.Dataset> Size: 32B
Dimensions:  (time: 4)
Coordinates:
  * time     (time) datetime64[ms] 32B 2000-01-01 ... 2000-01-01T00:00:00.003000
Data variables:
    *empty*

Similar logic applies for decoding timedelta values. The default resolution is "ns":

In [100]: attrs = {"units": "hours"}

In [101]: ds = xr.Dataset({"time": ("time", [0, 1, 2, 3], attrs)})

In [102]: ds.to_netcdf("test-timedeltas1.nc")
In [103]: xr.open_dataset("test-timedeltas1.nc")
Out[103]: 
<xarray.Dataset> Size: 32B
Dimensions:  (time: 4)
Coordinates:
  * time     (time) timedelta64[ns] 32B 00:00:00 01:00:00 02:00:00 03:00:00
Data variables:
    *empty*

By default, timedeltas will be decoded to the same resolution as datetimes:

In [104]: coder = xr.coders.CFDatetimeCoder(time_unit="s")

In [105]: xr.open_dataset("test-timedeltas1.nc", decode_times=coder)
Out[105]: 
<xarray.Dataset> Size: 32B
Dimensions:  (time: 4)
Coordinates:
  * time     (time) timedelta64[s] 32B 00:00:00 01:00:00 02:00:00 03:00:00
Data variables:
    *empty*

but if one would like to decode timedeltas to a different resolution, one can provide a coder specifically for timedeltas to decode_timedelta:

In [106]: timedelta_coder = xr.coders.CFTimedeltaCoder(time_unit="ms")

In [107]: xr.open_dataset(
   .....:     "test-timedeltas1.nc", decode_times=coder, decode_timedelta=timedelta_coder
   .....: )
   .....: 
Out[107]: 
<xarray.Dataset> Size: 32B
Dimensions:  (time: 4)
Coordinates:
  * time     (time) timedelta64[ms] 32B 00:00:00 01:00:00 02:00:00 03:00:00
Data variables:
    *empty*

As with datetimes, if a coarser unit is requested the timedeltas are decoded into their native on-disk resolution, if possible:

In [108]: attrs = {"units": "milliseconds"}

In [109]: ds = xr.Dataset({"time": ("time", [0, 1, 2, 3], attrs)})

In [110]: ds.to_netcdf("test-timedeltas2.nc")
In [111]: xr.open_dataset("test-timedeltas2.nc")
Out[111]: 
<xarray.Dataset> Size: 32B
Dimensions:  (time: 4)
Coordinates:
  * time     (time) timedelta64[ns] 32B 00:00:00 ... 00:00:00.003000
Data variables:
    *empty*
In [112]: coder = xr.coders.CFDatetimeCoder(time_unit="s")

In [113]: xr.open_dataset("test-timedeltas2.nc", decode_times=coder)
Out[113]: 
<xarray.Dataset> Size: 32B
Dimensions:  (time: 4)
Coordinates:
  * time     (time) timedelta64[s] 32B 00:00:00 00:00:00 00:00:00 00:00:00
Data variables:
    *empty*

To opt out of timedelta decoding (see the issue Undesired decoding to timedelta64) pass False to decode_timedelta:

In [114]: xr.open_dataset("test-timedeltas2.nc", decode_timedelta=False)
Out[114]: 
<xarray.Dataset> Size: 32B
Dimensions:  (time: 4)
Coordinates:
  * time     (time) int64 32B 0 1 2 3
Data variables:
    *empty*

Note

Note that in the future the default value of decode_timedelta will be False rather than None.
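Until then, passing decode_timedelta explicitly avoids depending on the default. A file-free sketch using xr.decode_cf (assuming a recent xarray; the resulting resolution follows the rules above):

```python
import numpy as np
import xarray as xr

# In-memory dataset with CF timedelta units, decoded both ways explicitly.
ds = xr.Dataset({"time": ("time", np.array([0, 1, 2, 3], dtype="int64"), {"units": "hours"})})
decoded = xr.decode_cf(ds, decode_timedelta=True)   # timedelta64 values
raw = xr.decode_cf(ds, decode_timedelta=False)      # stays int64
print(decoded["time"].dtype, raw["time"].dtype)
```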