Interpolating data

Interpolating data#

Xarray offers flexible interpolation routines, which have a similar interface to our indexing.

Note

interp requires scipy installed.

Scalar and 1-dimensional interpolation#

Interpolating a DataArray works mostly like labeled indexing of a DataArray,

In [1]: da = xr.DataArray(
   ...:     np.sin(0.3 * np.arange(12).reshape(4, 3)),
   ...:     [("time", np.arange(4)), ("space", [0.1, 0.2, 0.3])],
   ...: )
   ...: 

# label lookup
In [2]: da.sel(time=3)
Out[2]: 
<xarray.DataArray (space: 3)> Size: 24B
array([ 0.427,  0.141, -0.158])
Coordinates:
    time     int64 8B 3
  * space    (space) float64 24B 0.1 0.2 0.3

# interpolation
In [3]: da.interp(time=2.5)
Out[3]: 
<xarray.DataArray (space: 3)> Size: 24B
array([0.701, 0.502, 0.259])
Coordinates:
  * space    (space) float64 24B 0.1 0.2 0.3
    time     float64 8B 2.5

Similar to the indexing, interp() also accepts an array-like, which gives the interpolated result as an array.

# label lookup
In [4]: da.sel(time=[2, 3])
Out[4]: 
<xarray.DataArray (time: 2, space: 3)> Size: 48B
array([[ 0.974,  0.863,  0.675],
       [ 0.427,  0.141, -0.158]])
Coordinates:
  * time     (time) int64 16B 2 3
  * space    (space) float64 24B 0.1 0.2 0.3

# interpolation
In [5]: da.interp(time=[2.5, 3.5])
Out[5]: 
<xarray.DataArray (time: 2, space: 3)> Size: 48B
array([[0.701, 0.502, 0.259],
       [  nan,   nan,   nan]])
Coordinates:
  * space    (space) float64 24B 0.1 0.2 0.3
  * time     (time) float64 16B 2.5 3.5

To interpolate data with a numpy.datetime64 coordinate you can pass a string.

In [6]: da_dt64 = xr.DataArray(
   ...:     [1, 3], [("time", pd.date_range("1/1/2000", "1/3/2000", periods=2))]
   ...: )
   ...: 

In [7]: da_dt64.interp(time="2000-01-02")
Out[7]: 
<xarray.DataArray ()> Size: 8B
array(2.)
Coordinates:
    time     datetime64[ns] 8B 2000-01-02

The interpolated data can be merged into the original DataArray by specifying the time periods required.

In [8]: da_dt64.interp(time=pd.date_range("1/1/2000", "1/3/2000", periods=3))
Out[8]: 
<xarray.DataArray (time: 3)> Size: 24B
array([1., 2., 3.])
Coordinates:
  * time     (time) datetime64[ns] 24B 2000-01-01 2000-01-02 2000-01-03

Interpolation of data indexed by a CFTimeIndex is also allowed. See Non-standard calendars and dates outside the nanosecond-precision range for examples.

Note

Currently, our interpolation only works for regular grids. Therefore, similarly to sel(), only 1D coordinates along a dimension can be used as the original coordinate to be interpolated.

Multi-dimensional Interpolation#

Like sel(), interp() accepts multiple coordinates. In this case, multidimensional interpolation is carried out.

# label lookup
In [9]: da.sel(time=2, space=0.1)
Out[9]: 
<xarray.DataArray ()> Size: 8B
array(0.974)
Coordinates:
    time     int64 8B 2
    space    float64 8B 0.1

# interpolation
In [10]: da.interp(time=2.5, space=0.15)
Out[10]: 
<xarray.DataArray ()> Size: 8B
array(0.601)
Coordinates:
    time     float64 8B 2.5
    space    float64 8B 0.15

Array-like coordinates are also accepted:

# label lookup
In [11]: da.sel(time=[2, 3], space=[0.1, 0.2])
Out[11]: 
<xarray.DataArray (time: 2, space: 2)> Size: 32B
array([[0.974, 0.863],
       [0.427, 0.141]])
Coordinates:
  * time     (time) int64 16B 2 3
  * space    (space) float64 16B 0.1 0.2

# interpolation
In [12]: da.interp(time=[1.5, 2.5], space=[0.15, 0.25])
Out[12]: 
<xarray.DataArray (time: 2, space: 2)> Size: 32B
array([[0.888, 0.867],
       [0.601, 0.381]])
Coordinates:
  * time     (time) float64 16B 1.5 2.5
  * space    (space) float64 16B 0.15 0.25

interp_like() method is a useful shortcut. This method interpolates an xarray object onto the coordinates of another xarray object. For example, if we want to compute the difference between two DataArray s (da and other) staying on slightly different coordinates,

In [13]: other = xr.DataArray(
   ....:     np.sin(0.4 * np.arange(9).reshape(3, 3)),
   ....:     [("time", [0.9, 1.9, 2.9]), ("space", [0.15, 0.25, 0.35])],
   ....: )
   ....: 

it might be a good idea to first interpolate da so that it will stay on the same coordinates of other, and then subtract it. interp_like() can be used for such a case,

# interpolate da along other's coordinates
In [14]: interpolated = da.interp_like(other)

In [15]: interpolated
Out[15]: 
<xarray.DataArray (time: 3, space: 3)> Size: 72B
array([[0.787, 0.911,   nan],
       [0.912, 0.789,   nan],
       [0.348, 0.069,   nan]])
Coordinates:
  * time     (time) float64 24B 0.9 1.9 2.9
  * space    (space) float64 24B 0.15 0.25 0.35

It is now possible to safely compute the difference other - interpolated.

Interpolation methods#

We use either scipy.interpolate.interp1d or special interpolants from scipy.interpolate for 1-dimensional interpolation (see interp()). For multi-dimensional interpolation, an attempt is first made to decompose the interpolation in a series of 1-dimensional interpolations, in which case the relevant 1-dimensional interpolator is used. If a decomposition cannot be made (e.g. with advanced interpolation), scipy.interpolate.interpn() is used.

The interpolation method can be specified by the optional method argument.

In [16]: da = xr.DataArray(
   ....:     np.sin(np.linspace(0, 2 * np.pi, 10)),
   ....:     dims="x",
   ....:     coords={"x": np.linspace(0, 1, 10)},
   ....: )
   ....: 

In [17]: da.plot.line("o", label="original")
Out[17]: [<matplotlib.lines.Line2D at 0x7f0ae2a222d0>]

In [18]: da.interp(x=np.linspace(0, 1, 100)).plot.line(label="linear (default)")
Out[18]: [<matplotlib.lines.Line2D at 0x7f0ad6ce3050>]

In [19]: da.interp(x=np.linspace(0, 1, 100), method="cubic").plot.line(label="cubic")
Out[19]: [<matplotlib.lines.Line2D at 0x7f0ae2e19520>]

In [20]: plt.legend()
Out[20]: <matplotlib.legend.Legend at 0x7f0ad6f5e180>

Additional keyword arguments can be passed to scipy’s functions.

# fill 0 for the outside of the original coordinates.
In [21]: da.interp(x=np.linspace(-0.5, 1.5, 10), kwargs={"fill_value": 0.0})
Out[21]: 
<xarray.DataArray (x: 10)> Size: 80B
array([ 0.   ,  0.   ,  0.   ,  0.814,  0.604, -0.604, -0.814,  0.   ,  0.   ,  0.   ])
Coordinates:
  * x        (x) float64 80B -0.5 -0.2778 -0.05556 0.1667 ... 1.056 1.278 1.5

# 1-dimensional extrapolation
In [22]: da.interp(x=np.linspace(-0.5, 1.5, 10), kwargs={"fill_value": "extrapolate"})
Out[22]: 
<xarray.DataArray (x: 10)> Size: 80B
array([-2.893, -1.607, -0.321,  0.814,  0.604, -0.604, -0.814,  0.321,  1.607,  2.893])
Coordinates:
  * x        (x) float64 80B -0.5 -0.2778 -0.05556 0.1667 ... 1.056 1.278 1.5

# multi-dimensional extrapolation
In [23]: da = xr.DataArray(
   ....:     np.sin(0.3 * np.arange(12).reshape(4, 3)),
   ....:     [("time", np.arange(4)), ("space", [0.1, 0.2, 0.3])],
   ....: )
   ....: 

In [24]: da.interp(
   ....:     time=4, space=np.linspace(-0.1, 0.5, 10), kwargs={"fill_value": "extrapolate"}
   ....: )
   ....: 
Out[24]: 
<xarray.DataArray (space: 10)> Size: 80B
array([ 0.805,  0.497,  0.189, -0.119, -0.427, -0.718, -0.991, -1.264, -1.538, -1.811])
Coordinates:
    time     int64 8B 4
  * space    (space) float64 80B -0.1 -0.03333 0.03333 0.1 ... 0.3667 0.4333 0.5

Advanced Interpolation#

interp() accepts DataArray as similar to sel(), which enables us more advanced interpolation. Based on the dimension of the new coordinate passed to interp(), the dimension of the result are determined.

For example, if you want to interpolate a two dimensional array along a particular dimension, as illustrated below, you can pass two 1-dimensional DataArray s with a common dimension as new coordinate.

For example:

In [25]: da = xr.DataArray(
   ....:     np.sin(0.3 * np.arange(20).reshape(5, 4)),
   ....:     [("x", np.arange(5)), ("y", [0.1, 0.2, 0.3, 0.4])],
   ....: )
   ....: 

# advanced indexing
In [26]: x = xr.DataArray([0, 2, 4], dims="z")

In [27]: y = xr.DataArray([0.1, 0.2, 0.3], dims="z")

In [28]: da.sel(x=x, y=y)
Out[28]: 
<xarray.DataArray (z: 3)> Size: 24B
array([ 0.   ,  0.427, -0.773])
Coordinates:
    x        (z) int64 24B 0 2 4
    y        (z) float64 24B 0.1 0.2 0.3
Dimensions without coordinates: z

# advanced interpolation, without extrapolation
In [29]: x = xr.DataArray([0.5, 1.5, 2.5, 3.5], dims="z")

In [30]: y = xr.DataArray([0.15, 0.25, 0.35, 0.45], dims="z")

In [31]: da.interp(x=x, y=y)
Out[31]: 
<xarray.DataArray (z: 4)> Size: 32B
array([ 0.556,  0.635, -0.466,    nan])
Coordinates:
    x        (z) float64 32B 0.5 1.5 2.5 3.5
    y        (z) float64 32B 0.15 0.25 0.35 0.45
Dimensions without coordinates: z

where values on the original coordinates (x, y) = ((0.5, 0.15), (1.5, 0.25), (2.5, 0.35), (3.5, 0.45)) are obtained by the 2-dimensional interpolation and mapped along a new dimension z. Since no keyword arguments are passed to the interpolation routine, no extrapolation is performed resulting in a nan value.

If you want to add a coordinate to the new dimension z, you can supply DataArray s with a coordinate. Extrapolation can be achieved by passing additional arguments to SciPy’s interpnd function,

In [32]: x = xr.DataArray([0.5, 1.5, 2.5, 3.5], dims="z", coords={"z": ["a", "b", "c", "d"]})

In [33]: y = xr.DataArray(
   ....:     [0.15, 0.25, 0.35, 0.45], dims="z", coords={"z": ["a", "b", "c", "d"]}
   ....: )
   ....: 

In [34]: da.interp(x=x, y=y, kwargs={"fill_value": None})
Out[34]: 
<xarray.DataArray (z: 4)> Size: 32B
array([ 0.556,  0.635, -0.466, -0.735])
Coordinates:
    x        (z) float64 32B 0.5 1.5 2.5 3.5
    y        (z) float64 32B 0.15 0.25 0.35 0.45
  * z        (z) <U1 16B 'a' 'b' 'c' 'd'

For the details of the advanced indexing, see more advanced indexing.

Interpolating arrays with NaN#

Our interp() works with arrays with NaN the same way that scipy.interpolate.interp1d and scipy.interpolate.interpn do. linear and nearest methods return arrays including NaN, while other methods such as cubic or quadratic return all NaN arrays.

In [35]: da = xr.DataArray([0, 2, np.nan, 3, 3.25], dims="x", coords={"x": range(5)})

In [36]: da.interp(x=[0.5, 1.5, 2.5])
Out[36]: 
<xarray.DataArray (x: 3)> Size: 24B
array([ 1., nan, nan])
Coordinates:
  * x        (x) float64 24B 0.5 1.5 2.5

In [37]: da.interp(x=[0.5, 1.5, 2.5], method="cubic")
Out[37]: 
<xarray.DataArray (x: 3)> Size: 24B
array([nan, nan, nan])
Coordinates:
  * x        (x) float64 24B 0.5 1.5 2.5

To avoid this, you can drop NaN by dropna(), and then make the interpolation

In [38]: dropped = da.dropna("x")

In [39]: dropped
Out[39]: 
<xarray.DataArray (x: 4)> Size: 32B
array([0.  , 2.  , 3.  , 3.25])
Coordinates:
  * x        (x) int64 32B 0 1 3 4

In [40]: dropped.interp(x=[0.5, 1.5, 2.5], method="cubic")
Out[40]: 
<xarray.DataArray (x: 3)> Size: 24B
array([1.19 , 2.508, 2.93 ])
Coordinates:
  * x        (x) float64 24B 0.5 1.5 2.5

If NaNs are distributed randomly in your multidimensional array, dropping all the columns containing more than one NaNs by dropna() may lose a significant amount of information. In such a case, you can fill NaN by interpolate_na(), which is similar to pandas.Series.interpolate().

In [41]: filled = da.interpolate_na(dim="x")

In [42]: filled
Out[42]: 
<xarray.DataArray (x: 5)> Size: 40B
array([0.  , 2.  , 2.5 , 3.  , 3.25])
Coordinates:
  * x        (x) int64 40B 0 1 2 3 4

This fills NaN by interpolating along the specified dimension. After filling NaNs, you can interpolate:

In [43]: filled.interp(x=[0.5, 1.5, 2.5], method="cubic")
Out[43]: 
<xarray.DataArray (x: 3)> Size: 24B
array([1.309, 2.316, 2.738])
Coordinates:
  * x        (x) float64 24B 0.5 1.5 2.5

For the details of interpolate_na(), see Missing values.

Example#

Let’s see how interp() works on real data.

# Raw data
In [44]: ds = xr.tutorial.open_dataset("air_temperature").isel(time=0)

In [45]: fig, axes = plt.subplots(ncols=2, figsize=(10, 4))

In [46]: ds.air.plot(ax=axes[0])
Out[46]: <matplotlib.collections.QuadMesh at 0x7f0ae2a25ee0>

In [47]: axes[0].set_title("Raw data")
Out[47]: Text(0.5, 1.0, 'Raw data')

# Interpolated data
In [48]: new_lon = np.linspace(ds.lon[0].item(), ds.lon[-1].item(), ds.sizes["lon"] * 4)

In [49]: new_lat = np.linspace(ds.lat[0].item(), ds.lat[-1].item(), ds.sizes["lat"] * 4)

In [50]: dsi = ds.interp(lat=new_lat, lon=new_lon)

In [51]: dsi.air.plot(ax=axes[1])
Out[51]: <matplotlib.collections.QuadMesh at 0x7f0ad6af45c0>

In [52]: axes[1].set_title("Interpolated data")
Out[52]: Text(0.5, 1.0, 'Interpolated data')

Our advanced interpolation can be used to remap the data to the new coordinate. Consider the new coordinates x and z on the two dimensional plane. The remapping can be done as follows

# new coordinate
In [53]: x = np.linspace(240, 300, 100)

In [54]: z = np.linspace(20, 70, 100)

# relation between new and original coordinates
In [55]: lat = xr.DataArray(z, dims=["z"], coords={"z": z})

In [56]: lon = xr.DataArray(
   ....:     (x[:, np.newaxis] - 270) / np.cos(z * np.pi / 180) + 270,
   ....:     dims=["x", "z"],
   ....:     coords={"x": x, "z": z},
   ....: )
   ....: 

In [57]: fig, axes = plt.subplots(ncols=2, figsize=(10, 4))

In [58]: ds.air.plot(ax=axes[0])
Out[58]: <matplotlib.collections.QuadMesh at 0x7f0ad6b93230>

# draw the new coordinate on the original coordinates.
In [59]: for idx in [0, 33, 66, 99]:
   ....:     axes[0].plot(lon.isel(x=idx), lat, "--k")
   ....: 

In [60]: for idx in [0, 33, 66, 99]:
   ....:     axes[0].plot(*xr.broadcast(lon.isel(z=idx), lat.isel(z=idx)), "--k")
   ....: 

In [61]: axes[0].set_title("Raw data")
Out[61]: Text(0.5, 1.0, 'Raw data')

In [62]: dsi = ds.interp(lon=lon, lat=lat)

In [63]: dsi.air.plot(ax=axes[1])
Out[63]: <matplotlib.collections.QuadMesh at 0x7f0ad6b06cf0>

In [64]: axes[1].set_title("Remapped data")
Out[64]: Text(0.5, 1.0, 'Remapped data')