Quick overview

Here are some quick examples of what you can do with xray.DataArray objects. Everything is explained in much more detail in the rest of the documentation.

To begin, import numpy, pandas and xray:

In [1]: import numpy as np

In [2]: import pandas as pd

In [3]: import xray

Create a DataArray

You can make a DataArray from scratch by supplying data in the form of a numpy array or list, with optional dimensions and coordinates:

In [4]: xray.DataArray(np.random.randn(2, 3))
Out[4]: 
<xray.DataArray (dim_0: 2, dim_1: 3)>
array([[-1.344,  0.845,  1.076],
       [-0.109,  1.644, -1.469]])
Coordinates:
  * dim_0    (dim_0) int64 0 1
  * dim_1    (dim_1) int64 0 1 2

In [5]: data = xray.DataArray(np.random.randn(2, 3), [('x', ['a', 'b']), ('y', [-2, 0, 2])])

In [6]: data
Out[6]: 
<xray.DataArray (x: 2, y: 3)>
array([[ 0.357, -0.675, -1.777],
       [-0.969, -1.295,  0.414]])
Coordinates:
  * x        (x) |S1 'a' 'b'
  * y        (y) int64 -2 0 2

If you supply a pandas Series or DataFrame, metadata is copied directly:

In [7]: xray.DataArray(pd.Series(range(3), index=list('abc'), name='foo'))
Out[7]: 
<xray.DataArray 'foo' (dim_0: 3)>
array([0, 1, 2])
Coordinates:
  * dim_0    (dim_0) object 'a' 'b' 'c'

Here are the key properties for a DataArray:

# like in pandas, values is a numpy array that you can modify in-place
In [8]: data.values
Out[8]: 
array([[ 0.357, -0.675, -1.777],
       [-0.969, -1.295,  0.414]])

In [9]: data.dims
Out[9]: ('x', 'y')

In [10]: data.coords
Out[10]: 
Coordinates:
  * x        (x) |S1 'a' 'b'
  * y        (y) int64 -2 0 2

# you can use this dictionary to store arbitrary metadata
In [11]: data.attrs
Out[11]: OrderedDict()
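
The two comments above can be tried out with a small sketch: it edits `values` in place and stores metadata in `attrs`. The attribute names and values (`units`, `description`) are made up for illustration, and the import fallback assumes that xarray, the later name of the xray project, may be installed instead of xray.

```python
import numpy as np

try:
    import xray
except ImportError:
    # Assumption: xarray (the later name of the xray project) stands in
    # when xray itself is not installed.
    import xarray as xray

data = xray.DataArray(np.zeros((2, 3)), [('x', ['a', 'b']), ('y', [-2, 0, 2])])

# values is a numpy array, so an in-place write shows up in the DataArray
data.values[0, 0] = 1.5
print(float(data[0, 0]))

# attrs behaves like an ordinary dictionary (hypothetical metadata)
data.attrs['units'] = 'metres'
data.attrs['description'] = 'example array'
print(dict(data.attrs))
```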

Indexing

xray supports four kinds of indexing. These operations are just as fast as in pandas, because we borrow pandas’ indexing machinery.

# positional and by integer label, like numpy
In [12]: data[[0, 1]]
Out[12]: 
<xray.DataArray (x: 2, y: 3)>
array([[ 0.357, -0.675, -1.777],
       [-0.969, -1.295,  0.414]])
Coordinates:
  * x        (x) |S1 'a' 'b'
  * y        (y) int64 -2 0 2

# positional and by coordinate label, like pandas
In [13]: data.loc['a':'b']
Out[13]: 
<xray.DataArray (x: 2, y: 3)>
array([[ 0.357, -0.675, -1.777],
       [-0.969, -1.295,  0.414]])
Coordinates:
  * x        (x) |S1 'a' 'b'
  * y        (y) int64 -2 0 2

# by dimension name and integer label
In [14]: data.isel(x=slice(2))
Out[14]: 
<xray.DataArray (x: 2, y: 3)>
array([[ 0.357, -0.675, -1.777],
       [-0.969, -1.295,  0.414]])
Coordinates:
  * x        (x) |S1 'a' 'b'
  * y        (y) int64 -2 0 2

# by dimension name and coordinate label
In [15]: data.sel(x=['a', 'b'])
Out[15]: 
<xray.DataArray (x: 2, y: 3)>
array([[ 0.357, -0.675, -1.777],
       [-0.969, -1.295,  0.414]])
Coordinates:
  * x        (x) |S1 'a' 'b'
  * y        (y) int64 -2 0 2
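
The four calls above all happen to return the whole array. As a rough sketch of how each style picks out a subset, here is the same idea on a small array with known values. The variable names are illustrative only, and the import fallback assumes xarray (the later name of xray) may be installed instead.

```python
import numpy as np

try:
    import xray
except ImportError:
    # Assumption: xarray (the later name of the xray project) stands in
    # when xray itself is not installed.
    import xarray as xray

# values are [[0, 1, 2], [3, 4, 5]]
data = xray.DataArray(np.arange(6).reshape(2, 3),
                      [('x', ['a', 'b']), ('y', [-2, 0, 2])])

first_row = data[0]         # positional and by integer label, like numpy
row_b = data.loc['b']       # positional and by coordinate label, like pandas
first_col = data.isel(y=0)  # by dimension name and integer label
y_zero = data.sel(y=0)      # by dimension name and coordinate label

print(first_row.values, row_b.values, first_col.values, y_zero.values)
```

Note that `isel(y=0)` and `sel(y=0)` differ here: the first takes the column at position 0 (label -2), the second takes the column whose label is 0.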

Computation

Data arrays work very similarly to numpy ndarrays:

In [16]: data + 10
Out[16]: 
<xray.DataArray (x: 2, y: 3)>
array([[ 10.357,   9.325,   8.223],
       [  9.031,   8.705,  10.414]])
Coordinates:
  * y        (y) int64 -2 0 2
  * x        (x) |S1 'a' 'b'

In [17]: np.sin(data)
Out[17]: 
<xray.DataArray (x: 2, y: 3)>
array([[ 0.349, -0.625, -0.979],
       [-0.824, -0.962,  0.402]])
Coordinates:
  * y        (y) int64 -2 0 2
  * x        (x) |S1 'a' 'b'

In [18]: data.T
Out[18]: 
<xray.DataArray (y: 3, x: 2)>
array([[ 0.357, -0.969],
       [-0.675, -1.295],
       [-1.777,  0.414]])
Coordinates:
  * x        (x) |S1 'a' 'b'
  * y        (y) int64 -2 0 2

In [19]: data.sum()
Out[19]: 
<xray.DataArray ()>
array(-3.9441825539138033)

However, aggregation operations can use dimension names instead of axis numbers:

In [20]: data.mean(dim='x')
Out[20]: 
<xray.DataArray (y: 3)>
array([-0.306, -0.985, -0.682])
Coordinates:
  * y        (y) int64 -2 0 2
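
To see what the dimension name buys you, this sketch compares `mean(dim='x')` with the equivalent numpy call that has to know the axis number. It also assumes the named form keeps working after a transpose, which is the point of using names. The import fallback assumes xarray (the later name of xray) may be installed instead.

```python
import numpy as np

try:
    import xray
except ImportError:
    # Assumption: xarray (the later name of the xray project) stands in
    # when xray itself is not installed.
    import xarray as xray

data = xray.DataArray(np.arange(6.0).reshape(2, 3),
                      [('x', ['a', 'b']), ('y', [-2, 0, 2])])

by_name = data.mean(dim='x')         # no need to know where 'x' sits
by_axis = data.values.mean(axis=0)   # numpy: 'x' happens to be axis 0

# After transposing, 'x' is axis 1, but the named call is unchanged
by_name_transposed = data.T.mean(dim='x')

print(by_name.values, by_axis, by_name_transposed.values)
```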

Arithmetic operations broadcast based on dimension name. This means you don’t need to insert dummy dimensions for alignment:

In [21]: a = xray.DataArray(np.random.randn(3), [data.coords['y']])

In [22]: b = xray.DataArray(np.random.randn(4), dims='z')

In [23]: a
Out[23]: 
<xray.DataArray (y: 3)>
array([ 0.277, -0.472, -0.014])
Coordinates:
  * y        (y) int64 -2 0 2

In [24]: b
Out[24]: 
<xray.DataArray (z: 4)>
array([-0.363, -0.006, -0.923,  0.896])
Coordinates:
  * z        (z) int64 0 1 2 3

In [25]: a + b
Out[25]: 
<xray.DataArray (y: 3, z: 4)>
array([[-0.086,  0.271, -0.646,  1.172],
       [-0.835, -0.478, -1.395,  0.424],
       [-0.377, -0.02 , -0.937,  0.882]])
Coordinates:
  * y        (y) int64 -2 0 2
  * z        (z) int64 0 1 2 3
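
For contrast, here is a sketch of the same outer sum in plain numpy, where the dummy dimensions have to be inserted by hand. The array values are arbitrary and only numpy is used.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])            # plays the role of dimension 'y'
b = np.array([10.0, 20.0, 30.0, 40.0])   # plays the role of dimension 'z'

# a + b would raise a ValueError here: shapes (3,) and (4,) do not broadcast.
# Each array needs a dummy axis so the shapes become (3, 1) and (1, 4).
total = a[:, np.newaxis] + b[np.newaxis, :]

print(total.shape)  # (3, 4)
```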

It also means that in most cases you do not need to worry about the order of dimensions:

In [26]: data - data.T
Out[26]: 
<xray.DataArray (x: 2, y: 3)>
array([[ 0.,  0.,  0.],
       [ 0.,  0.,  0.]])
Coordinates:
  * y        (y) int64 -2 0 2
  * x        (x) |S1 'a' 'b'

Operations also align based on index labels:

In [27]: data[:-1] - data[:1]
Out[27]: 
<xray.DataArray (x: 1, y: 3)>
array([[ 0.,  0.,  0.]])
Coordinates:
  * y        (y) int64 -2 0 2
  * x        (x) |S1 'a'
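
A second sketch of alignment, this time along `y` with two slices that only partly overlap. It assumes, as in the example above, that arithmetic keeps only the labels present in both operands. The import fallback assumes xarray (the later name of xray) may be installed instead.

```python
import numpy as np

try:
    import xray
except ImportError:
    # Assumption: xarray (the later name of the xray project) stands in
    # when xray itself is not installed.
    import xarray as xray

# values are [[0, 1, 2], [3, 4, 5]]
data = xray.DataArray(np.arange(6.0).reshape(2, 3),
                      [('x', ['a', 'b']), ('y', [-2, 0, 2])])

left = data[:, :2]    # y labels -2 and 0
right = data[:, 1:]   # y labels 0 and 2

# Only y=0 appears in both operands, so only that column survives
total = left + right
print(total.coords['y'].values, total.values)
```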

GroupBy

xray supports grouped operations using a very similar API to pandas:

In [28]: labels = xray.DataArray(['E', 'F', 'E'], [data.coords['y']], name='labels')

In [29]: labels
Out[29]: 
<xray.DataArray 'labels' (y: 3)>
array(['E', 'F', 'E'], 
      dtype='|S1')
Coordinates:
  * y        (y) int64 -2 0 2

In [30]: data.groupby(labels).mean('y')
Out[30]: 
<xray.DataArray (x: 2, labels: 2)>
array([[-0.71 , -0.675],
       [-0.278, -1.295]])
Coordinates:
  * x        (x) |S1 'a' 'b'
  * labels   (labels) object 'E' 'F'

In [31]: data.groupby(labels).apply(lambda x: x - x.min())
Out[31]: 
<xray.DataArray (x: 2, y: 3)>
array([[ 2.134,  0.62 ,  0.   ],
       [ 0.808,  0.   ,  2.191]])
Coordinates:
  * x        (x) |S1 'a' 'b'
  * y        (y) int64 -2 0 2
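
To make the grouped mean less of a black box, this sketch recomputes group 'E' by hand on an array with known values. The numbers are arbitrary, and the import fallback assumes xarray (the later name of xray) may be installed instead.

```python
import numpy as np

try:
    import xray
except ImportError:
    # Assumption: xarray (the later name of the xray project) stands in
    # when xray itself is not installed.
    import xarray as xray

data = xray.DataArray(np.array([[0.0, 1.0, 4.0], [10.0, 25.0, 30.0]]),
                      [('x', ['a', 'b']), ('y', [-2, 0, 2])])
labels = xray.DataArray(['E', 'F', 'E'], [data.coords['y']], name='labels')

grouped = data.groupby(labels).mean('y')

# Group 'E' covers the first and last positions along y, so its mean
# should match an explicit selection of those two columns.
manual_e = data.isel(y=[0, 2]).mean(dim='y')

print(grouped.sel(labels='E').values, manual_e.values)
```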

Convert to pandas

A key feature of xray is robust conversion to and from pandas objects:

In [32]: data.to_series()
Out[32]: 
x  y 
a  -2    0.357021
    0   -0.674600
    2   -1.776904
b  -2   -0.968914
    0   -1.294524
    2    0.413738
dtype: float64

In [33]: data.to_pandas()
Out[33]: 
y        -2         0         2
x                              
a  0.357021 -0.674600 -1.776904
b -0.968914 -1.294524  0.413738
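
The conversion also runs in the other direction. As a minimal sketch, the round trip below goes from a DataArray to a pandas Series and back with `DataArray.from_series`. The import fallback assumes xarray (the later name of xray) may be installed instead.

```python
import numpy as np

try:
    import xray
except ImportError:
    # Assumption: xarray (the later name of the xray project) stands in
    # when xray itself is not installed.
    import xarray as xray

data = xray.DataArray(np.arange(6.0).reshape(2, 3),
                      [('x', ['a', 'b']), ('y', [-2, 0, 2])])

series = data.to_series()                       # MultiIndex with levels x, y
restored = xray.DataArray.from_series(series)   # back to a 2-D DataArray
frame = data.to_pandas()                        # 2-D arrays become a DataFrame

print(list(series.index.names), restored.dims, frame.loc['b', 0])
```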

Datasets and NetCDF

xray.Dataset is a dict-like container of DataArray objects that share index labels and dimensions. It looks a lot like a netCDF file:

In [34]: ds = data.to_dataset()

In [35]: ds
Out[35]: 
<xray.Dataset>
Dimensions:  (x: 2, y: 3)
Coordinates:
  * x        (x) |S1 'a' 'b'
  * y        (y) int64 -2 0 2
Data variables:
    None     (x, y) float64 0.357 -0.6746 -1.777 -0.9689 -1.295 0.4137

You can do almost everything you can do with DataArray objects with Dataset objects if you prefer to work with multiple variables at once.
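
As a rough sketch of the dict-like behaviour, the Dataset below holds two variables that share the same coordinates and is indexed once for both. The variable names `temperature` and `pressure` are hypothetical, and the import fallback assumes xarray (the later name of xray) may be installed instead.

```python
import numpy as np

try:
    import xray
except ImportError:
    # Assumption: xarray (the later name of the xray project) stands in
    # when xray itself is not installed.
    import xarray as xray

data = xray.DataArray(np.arange(6.0).reshape(2, 3),
                      [('x', ['a', 'b']), ('y', [-2, 0, 2])])

# Hypothetical variable names, chosen only for illustration
ds = xray.Dataset({'temperature': data, 'pressure': data * 2})

print('temperature' in ds)       # membership test, like a dict
pressure = ds['pressure']        # lookup by name returns a DataArray

# One selection applies to every variable in the Dataset
row_a = ds.sel(x='a')
print(row_a['temperature'].values, row_a['pressure'].values)
```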

Datasets also let you easily read and write netCDF files:

In [36]: ds.to_netcdf('example.nc')

In [37]: xray.open_dataset('example.nc')
Out[37]: 
<xray.Dataset>
Dimensions:  (x: 2, y: 3)
Coordinates:
  * y        (y) >i4 -2 0 2
  * x        (x) |S1 'a' 'b'
Data variables:
    None     (x, y) >f4 0.357021 -0.6746 -1.7769 -0.968914 -1.29452 0.413738