xarray.Dataset.reindex

Dataset.reindex(indexers=None, method=None, tolerance=None, copy=True, fill_value=<NA>, **indexers_kwargs)

Conform this object onto a new set of indexes, filling in missing values with fill_value. The default fill value is NaN.

Parameters
  • indexers (dict, optional) – Dictionary with keys given by dimension names and values given by arrays of coordinate tick labels. Any mismatched coordinate values will be filled in with NaN, and any mismatched dimension names will simply be ignored. One of indexers or indexers_kwargs must be provided.

  • method ({None, "nearest", "pad", "ffill", "backfill", "bfill"}, optional) – Method to use for filling index values in indexers not found in this dataset:

    • None (default): don’t fill gaps

    • "pad" / "ffill": propagate last valid index value forward

    • "backfill" / "bfill": propagate next valid index value backward

    • "nearest": use nearest valid index value

  • tolerance (float | Iterable[float] | str | None, default: None) – Maximum distance between original and new labels for inexact matches. The values of the index at the matching locations must satisfy the equation abs(index[indexer] - target) <= tolerance. Tolerance may be a scalar value, which applies the same tolerance to all values, or list-like, which applies variable tolerance per element. List-like must be the same size as the index and its dtype must exactly match the index’s type.

  • copy (bool, default: True) – If copy=True, data in the return value is always copied. If copy=False and reindexing is unnecessary, or can be performed with only slice operations, then the output may share memory with the input. In either case, a new xarray object is always returned.

  • fill_value (scalar or dict-like, optional) – Value to use for newly missing values. If a dict-like, maps variable names (including coordinates) to fill values.

  • sparse (bool, default: False) – Whether to use sparse arrays.

  • **indexers_kwargs ({dim: indexer, ...}, optional) – Keyword arguments in the same form as indexers. One of indexers or indexers_kwargs must be provided.
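The interaction between method and tolerance described above can be sketched on a small numeric index. This is a minimal illustration, not part of the original examples: the dataset ds, the variable name v and the target labels are hypothetical. It also uses the keyword form of indexers, which is equivalent to passing a dict.

```python
import numpy as np
import xarray as xr

# Hypothetical dataset with a monotonically increasing numeric index.
ds = xr.Dataset(
    {"v": ("x", [10.0, 20.0, 30.0])},
    coords={"x": [0.0, 1.0, 2.0]},
)

# Keyword form of indexers; same as ds.reindex({"x": [0.1, 0.6, 1.9]}, ...).
# A target label only matches if abs(index[indexer] - target) <= tolerance.
out = ds.reindex(x=[0.1, 0.6, 1.9], method="nearest", tolerance=0.2)

# 0.1 is within 0.2 of 0.0 and 1.9 is within 0.2 of 2.0, so both match.
# 0.6 is 0.4 away from its nearest label (1.0), so it is filled with NaN.
print(out["v"].values)
```

Without tolerance, the label 0.6 would simply take the value stored at 1.0.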

Returns

reindexed (Dataset) – Another dataset, with this dataset’s data but replaced coordinates.

Examples

Create a dataset with some fictional data.

>>> x = xr.Dataset(
...     {
...         "temperature": ("station", 20 * np.random.rand(4)),
...         "pressure": ("station", 500 * np.random.rand(4)),
...     },
...     coords={"station": ["boston", "nyc", "seattle", "denver"]},
... )
>>> x
<xarray.Dataset> Size: 176B
Dimensions:      (station: 4)
Coordinates:
  * station      (station) <U7 112B 'boston' 'nyc' 'seattle' 'denver'
Data variables:
    temperature  (station) float64 32B 10.98 14.3 12.06 10.9
    pressure     (station) float64 32B 211.8 322.9 218.8 445.9
>>> x.indexes
Indexes:
    station  Index(['boston', 'nyc', 'seattle', 'denver'], dtype='object', name='station')

Create a new index and reindex the dataset. By default, values in the new index that do not have corresponding records in the dataset are assigned NaN.

>>> new_index = ["boston", "austin", "seattle", "lincoln"]
>>> x.reindex({"station": new_index})
<xarray.Dataset> Size: 176B
Dimensions:      (station: 4)
Coordinates:
  * station      (station) <U7 112B 'boston' 'austin' 'seattle' 'lincoln'
Data variables:
    temperature  (station) float64 32B 10.98 nan 12.06 nan
    pressure     (station) float64 32B 211.8 nan 218.8 nan

We can fill in the missing values by passing a value to the keyword fill_value.

>>> x.reindex({"station": new_index}, fill_value=0)
<xarray.Dataset> Size: 176B
Dimensions:      (station: 4)
Coordinates:
  * station      (station) <U7 112B 'boston' 'austin' 'seattle' 'lincoln'
Data variables:
    temperature  (station) float64 32B 10.98 0.0 12.06 0.0
    pressure     (station) float64 32B 211.8 0.0 218.8 0.0

We can also use different fill values for each variable.

>>> x.reindex(
...     {"station": new_index}, fill_value={"temperature": 0, "pressure": 100}
... )
<xarray.Dataset> Size: 176B
Dimensions:      (station: 4)
Coordinates:
  * station      (station) <U7 112B 'boston' 'austin' 'seattle' 'lincoln'
Data variables:
    temperature  (station) float64 32B 10.98 0.0 12.06 0.0
    pressure     (station) float64 32B 211.8 100.0 218.8 100.0

Because the index is not monotonically increasing or decreasing, we cannot use the method keyword to fill the NaN values.

>>> x.reindex({"station": new_index}, method="nearest")
Traceback (most recent call last):
...
    raise ValueError('index must be monotonic increasing or decreasing')
ValueError: index must be monotonic increasing or decreasing

To further illustrate the filling functionality in reindex, we will create a dataset with a monotonically increasing index (for example, a sequence of dates).

>>> x2 = xr.Dataset(
...     {
...         "temperature": (
...             "time",
...             [15.57, 12.77, np.nan, 0.3081, 16.59, 15.12],
...         ),
...         "pressure": ("time", 500 * np.random.rand(6)),
...     },
...     coords={"time": pd.date_range("01/01/2019", periods=6, freq="D")},
... )
>>> x2
<xarray.Dataset> Size: 144B
Dimensions:      (time: 6)
Coordinates:
  * time         (time) datetime64[ns] 48B 2019-01-01 2019-01-02 ... 2019-01-06
Data variables:
    temperature  (time) float64 48B 15.57 12.77 nan 0.3081 16.59 15.12
    pressure     (time) float64 48B 481.8 191.7 395.9 264.4 284.0 462.8

Suppose we decide to expand the dataset to cover a wider date range.

>>> time_index2 = pd.date_range("12/29/2018", periods=10, freq="D")
>>> x2.reindex({"time": time_index2})
<xarray.Dataset> Size: 240B
Dimensions:      (time: 10)
Coordinates:
  * time         (time) datetime64[ns] 80B 2018-12-29 2018-12-30 ... 2019-01-07
Data variables:
    temperature  (time) float64 80B nan nan nan 15.57 ... 0.3081 16.59 15.12 nan
    pressure     (time) float64 80B nan nan nan 481.8 ... 264.4 284.0 462.8 nan

The index entries that did not have a value in the original dataset (for example, 2018-12-29) are by default filled with NaN. If desired, we can fill in the missing values using one of several options.

For example, to propagate the next valid value backward to fill the NaN values, pass "bfill" as an argument to the method keyword.

>>> x3 = x2.reindex({"time": time_index2}, method="bfill")
>>> x3
<xarray.Dataset> Size: 240B
Dimensions:      (time: 10)
Coordinates:
  * time         (time) datetime64[ns] 80B 2018-12-29 2018-12-30 ... 2019-01-07
Data variables:
    temperature  (time) float64 80B 15.57 15.57 15.57 15.57 ... 16.59 15.12 nan
    pressure     (time) float64 80B 481.8 481.8 481.8 481.8 ... 284.0 462.8 nan
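If back-filling across an arbitrarily long gap is not wanted, the tolerance parameter can bound how far a value is propagated. The following is a minimal sketch with a hypothetical three-day dataset (ds, wider and filled are illustrative names). It assumes a string such as "1D" is accepted as a tolerance for a datetime index, in line with the str type listed for tolerance above.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical dataset covering 2019-01-01 through 2019-01-03.
ds = xr.Dataset(
    {"temperature": ("time", [15.57, 12.77, 16.59])},
    coords={"time": pd.date_range("2019-01-01", periods=3, freq="D")},
)

# Wider range: 2018-12-29 through 2019-01-03.
wider = pd.date_range("2018-12-29", periods=6, freq="D")

# Back-fill, but only from labels at most one day away from the target.
filled = ds.reindex({"time": wider}, method="bfill", tolerance="1D")

# 2018-12-29 and 2018-12-30 are more than one day before 2019-01-01,
# so they stay NaN; 2018-12-31 receives the 2019-01-01 value.
print(filled["temperature"].values)
```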

Please note that the NaN value present in the original dataset (at index value 2019-01-03) will not be filled by any of the value propagation schemes.

>>> x2.where(x2.temperature.isnull(), drop=True)
<xarray.Dataset> Size: 24B
Dimensions:      (time: 1)
Coordinates:
  * time         (time) datetime64[ns] 8B 2019-01-03
Data variables:
    temperature  (time) float64 8B nan
    pressure     (time) float64 8B 395.9
>>> x3.where(x3.temperature.isnull(), drop=True)
<xarray.Dataset> Size: 48B
Dimensions:      (time: 2)
Coordinates:
  * time         (time) datetime64[ns] 16B 2019-01-03 2019-01-07
Data variables:
    temperature  (time) float64 16B nan nan
    pressure     (time) float64 16B 395.9 nan

This is because filling while reindexing does not look at dataset values, but only compares the original and desired indexes. If you do want to fill in the NaN values present in the original dataset, use the fillna() method.
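As a minimal sketch of that last point, the snippet below reindexes a small hypothetical dataset and then calls fillna() on the result. The names ds, wider, reindexed and patched, and the replacement value 0.0, are chosen purely for illustration.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical dataset with a NaN already present at 2019-01-02.
ds = xr.Dataset(
    {"temperature": ("time", [15.57, np.nan, 16.59])},
    coords={"time": pd.date_range("2019-01-01", periods=3, freq="D")},
)

# Extend the index one day backward: 2018-12-31 through 2019-01-03.
wider = pd.date_range("2018-12-31", periods=4, freq="D")

# bfill fills the new label 2018-12-31 from 2019-01-01, but the NaN
# stored at 2019-01-02 is kept, because that label already exists.
reindexed = ds.reindex({"time": wider}, method="bfill")

# fillna works on the data values, so it replaces the remaining NaN.
patched = reindexed.fillna({"temperature": 0.0})
print(patched["temperature"].values)
```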