xarray.Dataset.reindex
- Dataset.reindex(indexers=None, method=None, tolerance=None, copy=True, fill_value=<NA>, **indexers_kwargs)
Conform this object onto a new set of indexes, filling in missing values with fill_value. The default fill value is NaN.
- Parameters
indexers (dict, optional) – Dictionary with keys given by dimension names and values given by arrays of coordinate tick labels. Any mismatched coordinate values will be filled in with NaN, and any mismatched dimension names will simply be ignored. One of indexers or indexers_kwargs must be provided.
method ({None, "nearest", "pad", "ffill", "backfill", "bfill"}, optional) – Method to use for filling index values in indexers not found in this dataset:
    None (default): don’t fill gaps
    “pad” / “ffill”: propagate last valid index value forward
    “backfill” / “bfill”: propagate next valid index value backward
    “nearest”: use nearest valid index value
tolerance (float | Iterable[float] | str | None, default: None) – Maximum distance between original and new labels for inexact matches. The values of the index at the matching locations must satisfy the equation abs(index[indexer] - target) <= tolerance. Tolerance may be a scalar value, which applies the same tolerance to all values, or list-like, which applies variable tolerance per element. List-like must be the same size as the index and its dtype must exactly match the index’s type.
copy (bool, default: True) – If copy=True, data in the return value is always copied. If copy=False and reindexing is unnecessary, or can be performed with only slice operations, then the output may share memory with the input. In either case, a new xarray object is always returned.
fill_value (scalar or dict-like, optional) – Value to use for newly missing values. If a dict-like, maps variable names (including coordinates) to fill values.
**indexers_kwargs ({dim: indexer, ...}, optional) – Keyword arguments in the same form as indexers. One of indexers or indexers_kwargs must be provided.
- Returns
reindexed (Dataset) – Another dataset, with this dataset’s data but replaced coordinates.
Examples
Create a dataset with some fictional data.
>>> x = xr.Dataset(
...     {
...         "temperature": ("station", 20 * np.random.rand(4)),
...         "pressure": ("station", 500 * np.random.rand(4)),
...     },
...     coords={"station": ["boston", "nyc", "seattle", "denver"]},
... )
>>> x
<xarray.Dataset> Size: 176B
Dimensions:      (station: 4)
Coordinates:
  * station      (station) <U7 112B 'boston' 'nyc' 'seattle' 'denver'
Data variables:
    temperature  (station) float64 32B 10.98 14.3 12.06 10.9
    pressure     (station) float64 32B 211.8 322.9 218.8 445.9
>>> x.indexes
Indexes:
    station  Index(['boston', 'nyc', 'seattle', 'denver'], dtype='object', name='station')
Create a new index and reindex the dataset. By default, labels in the new index that have no corresponding record in the dataset are assigned NaN.
>>> new_index = ["boston", "austin", "seattle", "lincoln"]
>>> x.reindex({"station": new_index})
<xarray.Dataset> Size: 176B
Dimensions:      (station: 4)
Coordinates:
  * station      (station) <U7 112B 'boston' 'austin' 'seattle' 'lincoln'
Data variables:
    temperature  (station) float64 32B 10.98 nan 12.06 nan
    pressure     (station) float64 32B 211.8 nan 218.8 nan
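The parameter list above states that indexers can be given either as a dictionary or as keyword arguments. A minimal sketch of that equivalence follows; the dataset `ds` and its values are made up for illustration and are not the `x` used in the surrounding examples.

```python
import numpy as np
import xarray as xr

# Hypothetical stand-in dataset; the values are arbitrary illustration data.
ds = xr.Dataset(
    {"temperature": ("station", [10.0, 14.0, 12.0, 11.0])},
    coords={"station": ["boston", "nyc", "seattle", "denver"]},
)
new_index = ["boston", "austin", "seattle", "lincoln"]

# Dictionary form: keys are dimension names.
by_dict = ds.reindex({"station": new_index})

# Keyword form: the dimension name is used as the argument name.
by_kwargs = ds.reindex(station=new_index)

# identical() compares values, names, and attributes, treating NaNs in the
# same position as equal.
same = by_dict.identical(by_kwargs)
print(same)
```

The dictionary form is the one to reach for when a dimension name is not a valid Python identifier or clashes with another parameter name such as `method`.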
We can fill in the missing values by passing a value to the keyword fill_value.
>>> x.reindex({"station": new_index}, fill_value=0)
<xarray.Dataset> Size: 176B
Dimensions:      (station: 4)
Coordinates:
  * station      (station) <U7 112B 'boston' 'austin' 'seattle' 'lincoln'
Data variables:
    temperature  (station) float64 32B 10.98 0.0 12.06 0.0
    pressure     (station) float64 32B 211.8 0.0 218.8 0.0
We can also use different fill values for each variable.
>>> x.reindex(
...     {"station": new_index}, fill_value={"temperature": 0, "pressure": 100}
... )
<xarray.Dataset> Size: 176B
Dimensions:      (station: 4)
Coordinates:
  * station      (station) <U7 112B 'boston' 'austin' 'seattle' 'lincoln'
Data variables:
    temperature  (station) float64 32B 10.98 0.0 12.06 0.0
    pressure     (station) float64 32B 211.8 100.0 218.8 100.0
Because the index is not monotonically increasing or decreasing, we cannot use the method keyword to fill the NaN values.
>>> x.reindex({"station": new_index}, method="nearest")
Traceback (most recent call last):
...
    raise ValueError('index must be monotonic increasing or decreasing')
ValueError: index must be monotonic increasing or decreasing
To further illustrate the filling functionality in reindex, we will create a dataset with a monotonically increasing index (for example, a sequence of dates).
>>> x2 = xr.Dataset(
...     {
...         "temperature": (
...             "time",
...             [15.57, 12.77, np.nan, 0.3081, 16.59, 15.12],
...         ),
...         "pressure": ("time", 500 * np.random.rand(6)),
...     },
...     coords={"time": pd.date_range("01/01/2019", periods=6, freq="D")},
... )
>>> x2
<xarray.Dataset> Size: 144B
Dimensions:      (time: 6)
Coordinates:
  * time         (time) datetime64[ns] 48B 2019-01-01 2019-01-02 ... 2019-01-06
Data variables:
    temperature  (time) float64 48B 15.57 12.77 nan 0.3081 16.59 15.12
    pressure     (time) float64 48B 481.8 191.7 395.9 264.4 284.0 462.8
Suppose we decide to expand the dataset to cover a wider date range.
>>> time_index2 = pd.date_range("12/29/2018", periods=10, freq="D")
>>> x2.reindex({"time": time_index2})
<xarray.Dataset> Size: 240B
Dimensions:      (time: 10)
Coordinates:
  * time         (time) datetime64[ns] 80B 2018-12-29 2018-12-30 ... 2019-01-07
Data variables:
    temperature  (time) float64 80B nan nan nan 15.57 ... 0.3081 16.59 15.12 nan
    pressure     (time) float64 80B nan nan nan 481.8 ... 264.4 284.0 462.8 nan
The index entries that did not have a value in the original dataset (for example, 2018-12-29) are by default filled with NaN. If desired, we can fill in the missing values using one of several options.
For example, to propagate the next valid value backward to fill the NaN values, pass bfill as an argument to the method keyword.
>>> x3 = x2.reindex({"time": time_index2}, method="bfill")
>>> x3
<xarray.Dataset> Size: 240B
Dimensions:      (time: 10)
Coordinates:
  * time         (time) datetime64[ns] 80B 2018-12-29 2018-12-30 ... 2019-01-07
Data variables:
    temperature  (time) float64 80B 15.57 15.57 15.57 15.57 ... 16.59 15.12 nan
    pressure     (time) float64 80B 481.8 481.8 481.8 481.8 ... 284.0 462.8 nan
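The tolerance parameter described above limits how far an inexact match may reach. The following is a minimal sketch rather than part of the original examples: the dataset `ds`, its values, and the target range are hypothetical, and the tolerance is passed as a `pd.Timedelta`, which is assumed here to be forwarded to the underlying pandas index lookup.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical dataset with three daily values starting 2019-01-01.
ds = xr.Dataset(
    {"temperature": ("time", [15.0, 12.0, 9.0])},
    coords={"time": pd.date_range("2019-01-01", periods=3, freq="D")},
)

# Target labels run from 2018-12-30 through 2019-01-04.
target = pd.date_range("2018-12-30", periods=6, freq="D")

# Without a tolerance, every new label takes the nearest original value.
nearest = ds.reindex({"time": target}, method="nearest")

# With a one-day tolerance, labels more than one day away from any
# original label are left as NaN.
limited = ds.reindex(
    {"time": target}, method="nearest", tolerance=pd.Timedelta("1D")
)

print(nearest["temperature"].values)
print(limited["temperature"].values)
```

In this sketch, 2018-12-30 lies two days before the first original label, so it is filled in `nearest` but stays NaN in `limited`, while 2018-12-31 and 2019-01-04 are each within one day of an original label and are filled in both.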
Please note that the NaN value present in the original dataset (at index value 2019-01-03) will not be filled by any of the value propagation schemes.
>>> x2.where(x2.temperature.isnull(), drop=True)
<xarray.Dataset> Size: 24B
Dimensions:      (time: 1)
Coordinates:
  * time         (time) datetime64[ns] 8B 2019-01-03
Data variables:
    temperature  (time) float64 8B nan
    pressure     (time) float64 8B 395.9
>>> x3.where(x3.temperature.isnull(), drop=True)
<xarray.Dataset> Size: 48B
Dimensions:      (time: 2)
Coordinates:
  * time         (time) datetime64[ns] 16B 2019-01-03 2019-01-07
Data variables:
    temperature  (time) float64 16B nan nan
    pressure     (time) float64 16B 395.9 nan
This is because filling while reindexing does not look at dataset values, but only compares the original and desired indexes. If you do want to fill in the NaN values present in the original dataset, use the fillna() method.
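As a rough illustration of the distinction drawn above, the sketch below reindexes with method="bfill" and then calls fillna() on the result. The dataset `ds`, its values, and the constant 0.0 are arbitrary assumptions chosen for the example, not a recommended fill value.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical dataset with a NaN already present on 2019-01-02.
ds = xr.Dataset(
    {"temperature": ("time", [15.0, np.nan, 9.0])},
    coords={"time": pd.date_range("2019-01-01", periods=3, freq="D")},
)

# Target labels run from 2018-12-31 through 2019-01-04.
target = pd.date_range("2018-12-31", periods=5, freq="D")

# bfill only matches index labels: 2018-12-31 picks up the 2019-01-01 value,
# the pre-existing NaN is carried over unchanged, and 2019-01-04 has no later
# label to draw from.
reindexed = ds.reindex({"time": target}, method="bfill")

# fillna() works on the data values, so it replaces every remaining NaN.
filled = reindexed.fillna(0.0)

print(reindexed["temperature"].values)
print(filled["temperature"].values)
```

Because fillna() cannot tell a NaN that was in the original data from one introduced by reindexing, apply it to the original dataset first if the two cases need different treatment.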