# Interpolating data#

Xarray offers flexible interpolation routines, which have a similar interface to our indexing.

Note

`interp` requires scipy installed.

## Scalar and 1-dimensional interpolation#

Interpolating a `DataArray` works mostly like labeled indexing of a `DataArray`,

```In [1]: da = xr.DataArray(
...:     np.sin(0.3 * np.arange(12).reshape(4, 3)),
...:     [("time", np.arange(4)), ("space", [0.1, 0.2, 0.3])],
...: )
...:

# label lookup
In [2]: da.sel(time=3)
Out[2]:
<xarray.DataArray (space: 3)>
array([ 0.427,  0.141, -0.158])
Coordinates:
time     int64 3
* space    (space) float64 0.1 0.2 0.3

# interpolation
In [3]: da.interp(time=2.5)
Out[3]:
<xarray.DataArray (space: 3)>
array([0.701, 0.502, 0.259])
Coordinates:
* space    (space) float64 0.1 0.2 0.3
time     float64 2.5
```

Similar to the indexing, `interp()` also accepts an array-like, which gives the interpolated result as an array.

```# label lookup
In [4]: da.sel(time=[2, 3])
Out[4]:
<xarray.DataArray (time: 2, space: 3)>
array([[ 0.974,  0.863,  0.675],
[ 0.427,  0.141, -0.158]])
Coordinates:
* time     (time) int64 2 3
* space    (space) float64 0.1 0.2 0.3

# interpolation
In [5]: da.interp(time=[2.5, 3.5])
Out[5]:
<xarray.DataArray (time: 2, space: 3)>
array([[0.701, 0.502, 0.259],
[  nan,   nan,   nan]])
Coordinates:
* space    (space) float64 0.1 0.2 0.3
* time     (time) float64 2.5 3.5
```

To interpolate data with a numpy.datetime64 coordinate you can pass a string.

```In [6]: da_dt64 = xr.DataArray(
...:     [1, 3], [("time", pd.date_range("1/1/2000", "1/3/2000", periods=2))]
...: )
...:

In [7]: da_dt64.interp(time="2000-01-02")
Out[7]:
<xarray.DataArray ()>
array(2.)
Coordinates:
time     datetime64[ns] 2000-01-02
```

The interpolated data can be merged into the original `DataArray` by specifying the time periods required.

```In [8]: da_dt64.interp(time=pd.date_range("1/1/2000", "1/3/2000", periods=3))
Out[8]:
<xarray.DataArray (time: 3)>
array([1., 2., 3.])
Coordinates:
* time     (time) datetime64[ns] 2000-01-01 2000-01-02 2000-01-03
```

Interpolation of data indexed by a `CFTimeIndex` is also allowed. See Non-standard calendars and dates outside the nanosecond-precision range for examples.

Note

Currently, our interpolation only works for regular grids. Therefore, similarly to `sel()`, only 1D coordinates along a dimension can be used as the original coordinate to be interpolated.

## Multi-dimensional Interpolation#

Like `sel()`, `interp()` accepts multiple coordinates. In this case, multidimensional interpolation is carried out.

```# label lookup
In [9]: da.sel(time=2, space=0.1)
Out[9]:
<xarray.DataArray ()>
array(0.974)
Coordinates:
time     int64 2
space    float64 0.1

# interpolation
In [10]: da.interp(time=2.5, space=0.15)
Out[10]:
<xarray.DataArray ()>
array(0.601)
Coordinates:
time     float64 2.5
space    float64 0.15
```

Array-like coordinates are also accepted:

```# label lookup
In [11]: da.sel(time=[2, 3], space=[0.1, 0.2])
Out[11]:
<xarray.DataArray (time: 2, space: 2)>
array([[0.974, 0.863],
[0.427, 0.141]])
Coordinates:
* time     (time) int64 2 3
* space    (space) float64 0.1 0.2

# interpolation
In [12]: da.interp(time=[1.5, 2.5], space=[0.15, 0.25])
Out[12]:
<xarray.DataArray (time: 2, space: 2)>
array([[0.888, 0.867],
[0.601, 0.381]])
Coordinates:
* time     (time) float64 1.5 2.5
* space    (space) float64 0.15 0.25
```

`interp_like()` method is a useful shortcut. This method interpolates an xarray object onto the coordinates of another xarray object. For example, if we want to compute the difference between two `DataArray` s (`da` and `other`) staying on slightly different coordinates,

```In [13]: other = xr.DataArray(
....:     np.sin(0.4 * np.arange(9).reshape(3, 3)),
....:     [("time", [0.9, 1.9, 2.9]), ("space", [0.15, 0.25, 0.35])],
....: )
....:
```

it might be a good idea to first interpolate `da` so that it will stay on the same coordinates of `other`, and then subtract it. `interp_like()` can be used for such a case,

```# interpolate da along other's coordinates
In [14]: interpolated = da.interp_like(other)

In [15]: interpolated
Out[15]:
<xarray.DataArray (time: 3, space: 3)>
array([[0.787, 0.911,   nan],
[0.912, 0.789,   nan],
[0.348, 0.069,   nan]])
Coordinates:
* time     (time) float64 0.9 1.9 2.9
* space    (space) float64 0.15 0.25 0.35
```

It is now possible to safely compute the difference `other - interpolated`.

## Interpolation methods#

We use `scipy.interpolate.interp1d` for 1-dimensional interpolation. For multi-dimensional interpolation, an attempt is first made to decompose the interpolation in a series of 1-dimensional interpolations, in which case `scipy.interpolate.interp1d` is used. If a decomposition cannot be made (e.g. with advanced interpolation), `scipy.interpolate.interpn()` is used.

The interpolation method can be specified by the optional `method` argument.

```In [16]: da = xr.DataArray(
....:     np.sin(np.linspace(0, 2 * np.pi, 10)),
....:     dims="x",
....:     coords={"x": np.linspace(0, 1, 10)},
....: )
....:

In [17]: da.plot.line("o", label="original")
Out[17]: [<matplotlib.lines.Line2D at 0x7f2f14b07460>]

In [18]: da.interp(x=np.linspace(0, 1, 100)).plot.line(label="linear (default)")
Out[18]: [<matplotlib.lines.Line2D at 0x7f2f14b06e30>]

In [19]: da.interp(x=np.linspace(0, 1, 100), method="cubic").plot.line(label="cubic")
Out[19]: [<matplotlib.lines.Line2D at 0x7f2f14b06da0>]

In [20]: plt.legend()
Out[20]: <matplotlib.legend.Legend at 0x7f2f149b9b70>
```

Additional keyword arguments can be passed to scipy’s functions.

```# fill 0 for the outside of the original coordinates.
In [21]: da.interp(x=np.linspace(-0.5, 1.5, 10), kwargs={"fill_value": 0.0})
Out[21]:
<xarray.DataArray (x: 10)>
array([ 0.   ,  0.   ,  0.   ,  0.814,  0.604, -0.604, -0.814,  0.   ,  0.   ,  0.   ])
Coordinates:
* x        (x) float64 -0.5 -0.2778 -0.05556 0.1667 ... 0.8333 1.056 1.278 1.5

# 1-dimensional extrapolation
In [22]: da.interp(x=np.linspace(-0.5, 1.5, 10), kwargs={"fill_value": "extrapolate"})
Out[22]:
<xarray.DataArray (x: 10)>
array([-2.893, -1.607, -0.321,  0.814,  0.604, -0.604, -0.814,  0.321,  1.607,  2.893])
Coordinates:
* x        (x) float64 -0.5 -0.2778 -0.05556 0.1667 ... 0.8333 1.056 1.278 1.5

# multi-dimensional extrapolation
In [23]: da = xr.DataArray(
....:     np.sin(0.3 * np.arange(12).reshape(4, 3)),
....:     [("time", np.arange(4)), ("space", [0.1, 0.2, 0.3])],
....: )
....:

In [24]: da.interp(
....:     time=4, space=np.linspace(-0.1, 0.5, 10), kwargs={"fill_value": "extrapolate"}
....: )
....:
Out[24]:
<xarray.DataArray (space: 10)>
array([ 0.805,  0.497,  0.189, -0.119, -0.427, -0.718, -0.991, -1.264, -1.538, -1.811])
Coordinates:
time     int64 4
* space    (space) float64 -0.1 -0.03333 0.03333 0.1 ... 0.3 0.3667 0.4333 0.5
```

`interp()` accepts `DataArray` as similar to `sel()`, which enables us more advanced interpolation. Based on the dimension of the new coordinate passed to `interp()`, the dimension of the result are determined.

For example, if you want to interpolate a two dimensional array along a particular dimension, as illustrated below, you can pass two 1-dimensional `DataArray` s with a common dimension as new coordinate.

For example:

```In [25]: da = xr.DataArray(
....:     np.sin(0.3 * np.arange(20).reshape(5, 4)),
....:     [("x", np.arange(5)), ("y", [0.1, 0.2, 0.3, 0.4])],
....: )
....:

In [26]: x = xr.DataArray([0, 2, 4], dims="z")

In [27]: y = xr.DataArray([0.1, 0.2, 0.3], dims="z")

In [28]: da.sel(x=x, y=y)
Out[28]:
<xarray.DataArray (z: 3)>
array([ 0.   ,  0.427, -0.773])
Coordinates:
x        (z) int64 0 2 4
y        (z) float64 0.1 0.2 0.3
Dimensions without coordinates: z

# advanced interpolation, without extrapolation
In [29]: x = xr.DataArray([0.5, 1.5, 2.5, 3.5], dims="z")

In [30]: y = xr.DataArray([0.15, 0.25, 0.35, 0.45], dims="z")

In [31]: da.interp(x=x, y=y)
Out[31]:
<xarray.DataArray (z: 4)>
array([ 0.556,  0.635, -0.466,    nan])
Coordinates:
x        (z) float64 0.5 1.5 2.5 3.5
y        (z) float64 0.15 0.25 0.35 0.45
Dimensions without coordinates: z
```

where values on the original coordinates `(x, y) = ((0.5, 0.15), (1.5, 0.25), (2.5, 0.35), (3.5, 0.45))` are obtained by the 2-dimensional interpolation and mapped along a new dimension `z`. Since no keyword arguments are passed to the interpolation routine, no extrapolation is performed resulting in a `nan` value.

If you want to add a coordinate to the new dimension `z`, you can supply `DataArray` s with a coordinate. Extrapolation can be achieved by passing additional arguments to SciPy’s `interpnd` function,

```In [32]: x = xr.DataArray([0.5, 1.5, 2.5, 3.5], dims="z", coords={"z": ["a", "b", "c", "d"]})

In [33]: y = xr.DataArray(
....:     [0.15, 0.25, 0.35, 0.45], dims="z", coords={"z": ["a", "b", "c", "d"]}
....: )
....:

In [34]: da.interp(x=x, y=y, kwargs={"fill_value": None})
Out[34]:
<xarray.DataArray (z: 4)>
array([ 0.556,  0.635, -0.466, -0.735])
Coordinates:
x        (z) float64 0.5 1.5 2.5 3.5
y        (z) float64 0.15 0.25 0.35 0.45
* z        (z) <U1 'a' 'b' 'c' 'd'
```

For the details of the advanced indexing, see more advanced indexing.

## Interpolating arrays with NaN#

Our `interp()` works with arrays with NaN the same way that scipy.interpolate.interp1d and scipy.interpolate.interpn do. `linear` and `nearest` methods return arrays including NaN, while other methods such as `cubic` or `quadratic` return all NaN arrays.

```In [35]: da = xr.DataArray([0, 2, np.nan, 3, 3.25], dims="x", coords={"x": range(5)})

In [36]: da.interp(x=[0.5, 1.5, 2.5])
Out[36]:
<xarray.DataArray (x: 3)>
array([ 1., nan, nan])
Coordinates:
* x        (x) float64 0.5 1.5 2.5

In [37]: da.interp(x=[0.5, 1.5, 2.5], method="cubic")
Out[37]:
<xarray.DataArray (x: 3)>
array([nan, nan, nan])
Coordinates:
* x        (x) float64 0.5 1.5 2.5
```

To avoid this, you can drop NaN by `dropna()`, and then make the interpolation

```In [38]: dropped = da.dropna("x")

In [39]: dropped
Out[39]:
<xarray.DataArray (x: 4)>
array([0.  , 2.  , 3.  , 3.25])
Coordinates:
* x        (x) int64 0 1 3 4

In [40]: dropped.interp(x=[0.5, 1.5, 2.5], method="cubic")
Out[40]:
<xarray.DataArray (x: 3)>
array([1.19 , 2.508, 2.93 ])
Coordinates:
* x        (x) float64 0.5 1.5 2.5
```

If NaNs are distributed randomly in your multidimensional array, dropping all the columns containing more than one NaNs by `dropna()` may lose a significant amount of information. In such a case, you can fill NaN by `interpolate_na()`, which is similar to `pandas.Series.interpolate()`.

```In [41]: filled = da.interpolate_na(dim="x")

In [42]: filled
Out[42]:
<xarray.DataArray (x: 5)>
array([0.  , 2.  , 2.5 , 3.  , 3.25])
Coordinates:
* x        (x) int64 0 1 2 3 4
```

This fills NaN by interpolating along the specified dimension. After filling NaNs, you can interpolate:

```In [43]: filled.interp(x=[0.5, 1.5, 2.5], method="cubic")
Out[43]:
<xarray.DataArray (x: 3)>
array([1.309, 2.316, 2.738])
Coordinates:
* x        (x) float64 0.5 1.5 2.5
```

For the details of `interpolate_na()`, see Missing values.

## Example#

Let’s see how `interp()` works on real data.

```# Raw data
In [44]: ds = xr.tutorial.open_dataset("air_temperature").isel(time=0)

In [45]: fig, axes = plt.subplots(ncols=2, figsize=(10, 4))

In [46]: ds.air.plot(ax=axes[0])
Out[46]: <matplotlib.collections.QuadMesh at 0x7f2f05fc6200>

In [47]: axes[0].set_title("Raw data")
Out[47]: Text(0.5, 1.0, 'Raw data')

# Interpolated data
In [48]: new_lon = np.linspace(ds.lon[0], ds.lon[-1], ds.dims["lon"] * 4)

In [49]: new_lat = np.linspace(ds.lat[0], ds.lat[-1], ds.dims["lat"] * 4)

In [50]: dsi = ds.interp(lat=new_lat, lon=new_lon)

In [51]: dsi.air.plot(ax=axes[1])
Out[51]: <matplotlib.collections.QuadMesh at 0x7f2f06113940>

In [52]: axes[1].set_title("Interpolated data")
Out[52]: Text(0.5, 1.0, 'Interpolated data')
```

Our advanced interpolation can be used to remap the data to the new coordinate. Consider the new coordinates x and z on the two dimensional plane. The remapping can be done as follows

```# new coordinate
In [53]: x = np.linspace(240, 300, 100)

In [54]: z = np.linspace(20, 70, 100)

# relation between new and original coordinates
In [55]: lat = xr.DataArray(z, dims=["z"], coords={"z": z})

In [56]: lon = xr.DataArray(
....:     (x[:, np.newaxis] - 270) / np.cos(z * np.pi / 180) + 270,
....:     dims=["x", "z"],
....:     coords={"x": x, "z": z},
....: )
....:

In [57]: fig, axes = plt.subplots(ncols=2, figsize=(10, 4))

In [58]: ds.air.plot(ax=axes[0])
Out[58]: <matplotlib.collections.QuadMesh at 0x7f2f05ebc640>

# draw the new coordinate on the original coordinates.
In [59]: for idx in [0, 33, 66, 99]:
....:     axes[0].plot(lon.isel(x=idx), lat, "--k")
....:

In [60]: for idx in [0, 33, 66, 99]:
....:     axes[0].plot(*xr.broadcast(lon.isel(z=idx), lat.isel(z=idx)), "--k")
....:

In [61]: axes[0].set_title("Raw data")
Out[61]: Text(0.5, 1.0, 'Raw data')

In [62]: dsi = ds.interp(lon=lon, lat=lat)

In [63]: dsi.air.plot(ax=axes[1])
Out[63]: <matplotlib.collections.QuadMesh at 0x7f2f05ebc910>

In [64]: axes[1].set_title("Remapped data")
Out[64]: Text(0.5, 1.0, 'Remapped data')
```