xarray.Dataset.reindex
Dataset.reindex(indexers=None, method=None, tolerance=None, copy=True, fill_value=<NA>, **indexers_kwargs)
Conform this object onto a new set of indexes, filling in missing values with fill_value. The default fill value is NaN.
- Parameters
indexers (dict, optional) – Dictionary with keys given by dimension names and values given by arrays of coordinate tick labels. Any mismatched coordinate values will be filled in with NaN, and any mismatched dimension names will simply be ignored. One of indexers or indexers_kwargs must be provided.
method ({None, 'nearest', 'pad'/'ffill', 'backfill'/'bfill'}, optional) – Method to use for filling index values in indexers not found in this dataset:
None (default): don’t fill gaps
pad / ffill: propagate last valid index value forward
backfill / bfill: propagate next valid index value backward
nearest: use nearest valid index value
tolerance (optional) – Maximum distance between original and new labels for inexact matches. The values of the index at the matching locations must satisfy the equation abs(index[indexer] - target) <= tolerance.
copy (bool, optional) – If copy=True, data in the return value is always copied. If copy=False and reindexing is unnecessary, or can be performed with only slice operations, then the output may share memory with the input. In either case, a new xarray object is always returned.
fill_value (scalar, optional) – Value to use for newly missing values.
sparse – Use sparse-array. By default, False.
**indexers_kwargs ({dim: indexer, ...}, optional) – Keyword arguments in the same form as indexers. One of indexers or indexers_kwargs must be provided.
- Returns
reindexed – Another dataset, with this dataset’s data but replaced coordinates.
- Return type
Dataset
Examples
Create a dataset with some fictional data.
>>> import numpy as np
>>> import pandas as pd
>>> import xarray as xr
>>> x = xr.Dataset(
...     {
...         "temperature": ("station", 20 * np.random.rand(4)),
...         "pressure": ("station", 500 * np.random.rand(4)),
...     },
...     coords={"station": ["boston", "nyc", "seattle", "denver"]},
... )
>>> x
<xarray.Dataset>
Dimensions:      (station: 4)
Coordinates:
  * station      (station) <U7 'boston' 'nyc' 'seattle' 'denver'
Data variables:
    temperature  (station) float64 18.84 14.59 19.22 17.16
    pressure     (station) float64 324.1 194.3 122.8 244.3
>>> x.indexes
station: Index(['boston', 'nyc', 'seattle', 'denver'], dtype='object', name='station')
Create a new index and reindex the dataset. By default, values in the new index that have no corresponding records in the dataset are assigned NaN.
>>> new_index = ["boston", "austin", "seattle", "lincoln"]
>>> x.reindex({"station": new_index})
<xarray.Dataset>
Dimensions:      (station: 4)
Coordinates:
  * station      (station) object 'boston' 'austin' 'seattle' 'lincoln'
Data variables:
    temperature  (station) float64 18.84 nan 19.22 nan
    pressure     (station) float64 324.1 nan 122.8 nan
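The same reindexing can also be written with keyword arguments, as described for **indexers_kwargs. The following is a minimal sketch that assumes a small dataset with made-up temperature values (not the random data used above) and checks that both call styles give the same result.

```python
import xarray as xr

# Hypothetical sample data, chosen only for illustration.
ds = xr.Dataset(
    {"temperature": ("station", [18.0, 14.0, 19.0, 17.0])},
    coords={"station": ["boston", "nyc", "seattle", "denver"]},
)
new_index = ["boston", "austin", "seattle", "lincoln"]

# Dictionary form and keyword form of the same request.
by_dict = ds.reindex({"station": new_index})
by_kwargs = ds.reindex(station=new_index)

# identical() treats NaN values in matching positions as equal.
same = by_dict.identical(by_kwargs)

# Labels absent from the original dataset show up as missing values.
missing = by_dict["temperature"].isnull().values.tolist()
print(same, missing)
```

The keyword form is convenient for a fixed dimension name, while the dictionary form is useful when the dimension name is held in a variable.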
We can fill in the missing values by passing a value to the keyword fill_value.
>>> x.reindex({"station": new_index}, fill_value=0)
<xarray.Dataset>
Dimensions:      (station: 4)
Coordinates:
  * station      (station) object 'boston' 'austin' 'seattle' 'lincoln'
Data variables:
    temperature  (station) float64 18.84 0.0 19.22 0.0
    pressure     (station) float64 324.1 0.0 122.8 0.0
Because the index is not monotonically increasing or decreasing, we cannot use arguments to the keyword method to fill the NaN values.
>>> x.reindex({"station": new_index}, method="nearest")
Traceback (most recent call last):
...
    raise ValueError('index must be monotonic increasing or decreasing')
ValueError: index must be monotonic increasing or decreasing
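One way to anticipate this error is to inspect the index before reindexing. The sketch below assumes a small dataset with made-up temperature values and uses the pandas Index attributes is_monotonic_increasing and is_monotonic_decreasing. The sortby call only illustrates how the labels could be put in order. Whether method-based filling is meaningful for alphabetical station names is a separate question.

```python
import xarray as xr

# Hypothetical sample data, chosen only for illustration.
ds = xr.Dataset(
    {"temperature": ("station", [18.0, 14.0, 19.0, 17.0])},
    coords={"station": ["boston", "nyc", "seattle", "denver"]},
)

# The station labels are neither sorted ascending nor descending.
idx = ds.indexes["station"]
is_monotonic = idx.is_monotonic_increasing or idx.is_monotonic_decreasing

# Sorting along the dimension yields a monotonically increasing index.
sorted_ds = ds.sortby("station")
sorted_idx = sorted_ds.indexes["station"]
print(is_monotonic, list(sorted_idx))
```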
To further illustrate the filling functionality in reindex, we will create a dataset with a monotonically increasing index (for example, a sequence of dates).
>>> x2 = xr.Dataset(
...     {
...         "temperature": (
...             "time",
...             [15.57, 12.77, np.nan, 0.3081, 16.59, 15.12],
...         ),
...         "pressure": ("time", 500 * np.random.rand(6)),
...     },
...     coords={"time": pd.date_range("01/01/2019", periods=6, freq="D")},
... )
>>> x2
<xarray.Dataset>
Dimensions:      (time: 6)
Coordinates:
  * time         (time) datetime64[ns] 2019-01-01 2019-01-02 ... 2019-01-06
Data variables:
    temperature  (time) float64 15.57 12.77 nan 0.3081 16.59 15.12
    pressure     (time) float64 103.4 122.7 452.0 444.0 399.2 486.0
Suppose we decide to expand the dataset to cover a wider date range.
>>> time_index2 = pd.date_range("12/29/2018", periods=10, freq="D")
>>> x2.reindex({"time": time_index2})
<xarray.Dataset>
Dimensions:      (time: 10)
Coordinates:
  * time         (time) datetime64[ns] 2018-12-29 2018-12-30 ... 2019-01-07
Data variables:
    temperature  (time) float64 nan nan nan 15.57 ... 0.3081 16.59 15.12 nan
    pressure     (time) float64 nan nan nan 103.4 ... 444.0 399.2 486.0 nan
The index entries that did not have a value in the original dataset (for example, 2018-12-29) are by default filled with NaN. If desired, we can fill in the missing values using one of several options.
For example, to propagate the next valid value backward to fill the NaN values, pass bfill as an argument to the method keyword.
>>> x3 = x2.reindex({"time": time_index2}, method="bfill")
>>> x3
<xarray.Dataset>
Dimensions:      (time: 10)
Coordinates:
  * time         (time) datetime64[ns] 2018-12-29 2018-12-30 ... 2019-01-07
Data variables:
    temperature  (time) float64 15.57 15.57 15.57 15.57 ... 16.59 15.12 nan
    pressure     (time) float64 103.4 103.4 103.4 103.4 ... 399.2 486.0 nan
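The tolerance parameter limits how far a fill method may reach for a match. The following is a minimal sketch, assuming a three-day dataset with made-up values and a six-hour tolerance passed as a pandas Timedelta. New labels farther than the tolerance from every original label are left as NaN.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical daily data, chosen only for illustration.
ds = xr.Dataset(
    {"temperature": ("time", [15.0, 12.0, 9.0])},
    coords={"time": pd.date_range("2019-01-01", periods=3, freq="D")},
)

# Target labels: 6 hours, 12 hours, and 2 days away from the nearest original label.
target = pd.to_datetime(
    ["2019-01-01 06:00", "2019-01-02 12:00", "2019-01-05 00:00"]
)

# Only matches satisfying abs(index[indexer] - target) <= tolerance are kept.
out = ds.reindex(time=target, method="nearest", tolerance=pd.Timedelta("6h"))
vals = out["temperature"].values
print(vals)
```

Only the first target label lies within six hours of an original label, so only that entry receives a value.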
Please note that the NaN value present in the original dataset (at index value 2019-01-03) will not be filled by any of the value propagation schemes.
>>> x2.where(x2.temperature.isnull(), drop=True)
<xarray.Dataset>
Dimensions:      (time: 1)
Coordinates:
  * time         (time) datetime64[ns] 2019-01-03
Data variables:
    temperature  (time) float64 nan
    pressure     (time) float64 452.0
>>> x3.where(x3.temperature.isnull(), drop=True)
<xarray.Dataset>
Dimensions:      (time: 2)
Coordinates:
  * time         (time) datetime64[ns] 2019-01-03 2019-01-07
Data variables:
    temperature  (time) float64 nan nan
    pressure     (time) float64 452.0 nan
This is because filling while reindexing does not look at dataset values, but only compares the original and desired indexes. If you do want to fill in the NaN values present in the original dataset, use the fillna() method.
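The difference between the two kinds of filling can be sketched as follows, assuming a short dataset with made-up values and an arbitrary replacement value of 0.0. Reindexing with method="bfill" fills only the newly introduced label, while fillna() replaces the NaN that was already in the data.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical data with one NaN already present (at 2019-01-03).
ds = xr.Dataset(
    {"temperature": ("time", [15.57, 12.77, np.nan, 0.3081])},
    coords={"time": pd.date_range("2019-01-01", periods=4, freq="D")},
)

# Extend the index one day earlier; bfill fills only the new label.
wider = pd.date_range("2018-12-31", periods=5, freq="D")
reindexed = ds.reindex(time=wider, method="bfill")
n_missing_after_reindex = int(reindexed["temperature"].isnull().sum())

# fillna() works on the data values, so it also replaces the original NaN.
filled = reindexed.fillna(0.0)
n_missing_after_fillna = int(filled["temperature"].isnull().sum())
print(n_missing_after_reindex, n_missing_after_fillna)
```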