Quickstart

Here are some quick examples of what you can do with xray’s DataArray object. Everything is explained in much more detail in the rest of the documentation.

To begin, import numpy, pandas and xray:

In [1]: import numpy as np

In [2]: import pandas as pd

In [3]: import xray

Create a DataArray

For more details, see Data Structures.

From scratch

In [4]: xray.DataArray(np.random.randn(2, 3))
Out[4]: 
<xray.DataArray (dim_0: 2, dim_1: 3)>
array([[-0.86184896, -2.10456922, -0.49492927],
       [ 1.07180381,  0.72155516, -0.70677113]])
Coordinates:
  * dim_0    (dim_0) int64 0 1
  * dim_1    (dim_1) int64 0 1 2

In [5]: xray.DataArray(np.random.randn(2, 3), dims=['x', 'y'])
Out[5]: 
<xray.DataArray (x: 2, y: 3)>
array([[-1.03957499,  0.27185989, -0.42497233],
       [ 0.56702035,  0.27623202, -1.08740069]])
Coordinates:
  * x        (x) int64 0 1
  * y        (y) int64 0 1 2

In [6]: xray.DataArray(np.random.randn(2, 3), [('x', ['a', 'b']), ('y', [-2, 0, 2])])
Out[6]: 
<xray.DataArray (x: 2, y: 3)>
array([[-0.67368971,  0.11364841, -1.47842655],
       [ 0.52498767,  0.40470522,  0.57704599]])
Coordinates:
  * x        (x) |S1 'a' 'b'
  * y        (y) int64 -2 0 2
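The same three constructions can be written with the package's current name. xray was later renamed xarray, so this sketch assumes a recent `xarray` release is installed. It also passes coordinates as a dict keyed by dimension name, which is one of the forms the constructor accepts. Recent releases no longer attach the default integer coordinates shown in `Out[4]` and `Out[5]`, so `a` and `b` below have dimension names but no coordinates.

```python
import numpy as np
import xarray as xr  # assumption: xray's modern package name

data = np.random.randn(2, 3)

# No dimension names given: xarray falls back to dim_0, dim_1.
a = xr.DataArray(data)

# Dimension names only; no coordinate labels are attached.
b = xr.DataArray(data, dims=["x", "y"])

# Dimension names plus coordinate labels, given as a dict keyed by dimension.
c = xr.DataArray(data, coords={"x": ["a", "b"], "y": [-2, 0, 2]}, dims=["x", "y"])

print(a.dims)                # ('dim_0', 'dim_1')
print(b.dims)                # ('x', 'y')
print(c.coords["y"].values)  # [-2  0  2]
```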

From pandas

In [7]: df = pd.DataFrame(np.random.randn(2, 3), index=['a', 'b'], columns=[-2, 0, 2])

In [8]: df.index.name = 'x'

In [9]: df.columns.name = 'y'

In [10]: df
Out[10]: 
y        -2         0         2
x                              
a -1.715002 -1.039268 -0.370647
b -1.157892 -1.344312  0.844885

In [11]: foo = xray.DataArray(df, name='foo')

In [12]: foo
Out[12]: 
<xray.DataArray 'foo' (x: 2, y: 3)>
array([[-1.71500202, -1.03926848, -0.37064686],
       [-1.15789225, -1.34431181,  0.84488514]])
Coordinates:
  * x        (x) object 'a' 'b'
  * y        (y) int64 -2 0 2
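Because the index and column names become the dimension names, a two-dimensional DataArray can be turned back into an equivalent DataFrame. A minimal sketch, assuming a recent `xarray` release (xray's later name) and its `DataArray.to_pandas` method:

```python
import numpy as np
import pandas as pd
import xarray as xr  # assumption: xray's modern package name

df = pd.DataFrame(np.random.randn(2, 3), index=["a", "b"], columns=[-2, 0, 2])
df.index.name = "x"    # becomes the first dimension name
df.columns.name = "y"  # becomes the second dimension name

foo = xr.DataArray(df, name="foo")
print(foo.dims)  # ('x', 'y')

# For a 2-D array, to_pandas() rebuilds a DataFrame with the same labels.
back = foo.to_pandas()
print(back.index.name, back.columns.name)  # x y
```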

Properties

In [13]: foo.values
Out[13]: 
array([[-1.71500202, -1.03926848, -0.37064686],
       [-1.15789225, -1.34431181,  0.84488514]])

In [14]: foo.dims
Out[14]: ('x', 'y')

In [15]: foo.coords['y']
Out[15]: 
<xray.DataArray 'y' (y: 3)>
array([-2,  0,  2])
Coordinates:
  * y        (y) int64 -2 0 2

In [16]: foo.attrs
Out[16]: OrderedDict()
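`attrs` starts out empty and can hold arbitrary metadata; in recent xarray releases it is a plain `dict` rather than the `OrderedDict` shown above. A short sketch, assuming a recent `xarray` release, where the `units` attribute and its value are hypothetical examples:

```python
import numpy as np
import xarray as xr  # assumption: xray's modern package name

foo = xr.DataArray(
    np.random.randn(2, 3),
    coords={"x": ["a", "b"], "y": [-2, 0, 2]},
    dims=["x", "y"],
    name="foo",
)

print(foo.shape)  # (2, 3)
print(foo.name)   # foo

# attrs is an ordinary mapping; "units" is a hypothetical attribute name.
foo.attrs["units"] = "m"
print(foo.attrs)  # {'units': 'm'}
```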

Indexing

For more details, see Indexing and selecting data.

Like numpy

In [17]: foo[[0, 1], 0]
Out[17]: 
<xray.DataArray 'foo' (x: 2)>
array([-1.71500202, -1.15789225])
Coordinates:
    y        int64 -2
  * x        (x) object 'a' 'b'
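One difference from NumPy is worth knowing: when several dimensions are indexed with lists of integers, each list selects along its own dimension independently (outer, or orthogonal, indexing), whereas NumPy pairs the lists up element by element. A sketch assuming a recent `xarray` release, using small integer data so the selections are easy to check:

```python
import numpy as np
import xarray as xr  # assumption: xray's modern package name

foo = xr.DataArray(
    np.arange(6).reshape(2, 3),
    coords={"x": ["a", "b"], "y": [-2, 0, 2]},
    dims=["x", "y"],
)

# xarray: rows [0, 1] crossed with columns [0, 2] give a 2x2 block.
outer = foo[[0, 1], [0, 2]]
print(outer.shape)  # (2, 2)

# NumPy: the lists are paired, selecting elements (0, 0) and (1, 2).
paired = foo.values[[0, 1], [0, 2]]
print(paired)  # [0 5]
```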

Like pandas

In [18]: foo.loc['a':'b', -2]
Out[18]: 
<xray.DataArray 'foo' (x: 2)>
array([-1.71500202, -1.15789225])
Coordinates:
    y        int64 -2
  * x        (x) object 'a' 'b'
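As in pandas, a label-based slice with `.loc` includes both endpoints, while a positional slice excludes its stop. A sketch assuming a recent `xarray` release:

```python
import numpy as np
import xarray as xr  # assumption: xray's modern package name

foo = xr.DataArray(
    np.arange(6).reshape(2, 3),
    coords={"x": ["a", "b"], "y": [-2, 0, 2]},
    dims=["x", "y"],
)

# Label slice: the stop label "b" is included, so both rows are kept.
print(foo.loc["a":"b"].sizes["x"])  # 2

# Positional slice: the stop position 1 is excluded, so only row "a" is kept.
print(foo[0:1].sizes["x"])  # 1

# Label slice along y: -2 through 0 inclusive selects the first two columns.
print(foo.loc[:, -2:0].values.tolist())  # [[0, 1], [3, 4]]
```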

By dimension name and integer position

In [19]: foo.isel(x=slice(2))
Out[19]: 
<xray.DataArray 'foo' (x: 2, y: 3)>
array([[-1.71500202, -1.03926848, -0.37064686],
       [-1.15789225, -1.34431181,  0.84488514]])
Coordinates:
  * y        (y) int64 -2 0 2
  * x        (x) object 'a' 'b'
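Selecting a single integer position with `isel` drops that dimension and keeps the selected label as a scalar coordinate, just as `y` became a scalar coordinate in `Out[17]`. A sketch assuming a recent `xarray` release:

```python
import numpy as np
import xarray as xr  # assumption: xray's modern package name

foo = xr.DataArray(
    np.arange(6).reshape(2, 3),
    coords={"x": ["a", "b"], "y": [-2, 0, 2]},
    dims=["x", "y"],
)

row = foo.isel(x=1)             # integer position 1, i.e. the row labelled "b"
print(row.dims)                 # ('y',)
print(row.coords["x"].item())   # b  (kept as a scalar coordinate)

# A slice keeps the dimension; negative positions count from the end.
print(foo.isel(y=slice(-2, None)).values.tolist())  # [[1, 2], [4, 5]]
```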

By dimension name and coordinate label

In [20]: foo.sel(x=['a', 'b'])
Out[20]: 
<xray.DataArray 'foo' (x: 2, y: 3)>
array([[-1.71500202, -1.03926848, -0.37064686],
       [-1.15789225, -1.34431181,  0.84488514]])
Coordinates:
  * y        (y) int64 -2 0 2
  * x        (x) object 'a' 'b'
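`sel` also accepts a `method` argument for inexact matches on a sorted index; `method="nearest"`, for instance, snaps to the closest coordinate label. A sketch assuming a recent `xarray` release, where the lookup value 1.5 is arbitrary:

```python
import numpy as np
import xarray as xr  # assumption: xray's modern package name

foo = xr.DataArray(
    np.arange(6).reshape(2, 3),
    coords={"x": ["a", "b"], "y": [-2, 0, 2]},
    dims=["x", "y"],
)

# Exact label lookup on both dimensions returns a 0-d array.
print(foo.sel(x="b", y=0).item())  # 4

# 1.5 is closer to 2 than to 0, so the y=2 column is returned.
col = foo.sel(y=1.5, method="nearest")
print(col.coords["y"].item())  # 2
print(col.values.tolist())     # [2, 5]
```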

Computation

For more details, see Computation.

Unary operations

In [21]: foo.sum()
Out[21]: 
<xray.DataArray 'foo' ()>
array(-4.782236279814638)

In [22]: foo.mean(dim=['x'])
Out[22]: 
<xray.DataArray 'foo' (y: 3)>
array([-1.43644713, -1.19179015,  0.23711914])
Coordinates:
  * y        (y) int64 -2 0 2

In [23]: foo + 10
Out[23]: 
<xray.DataArray 'foo' (x: 2, y: 3)>
array([[  8.28499798,   8.96073152,   9.62935314],
       [  8.84210775,   8.65568819,  10.84488514]])
Coordinates:
  * y        (y) int64 -2 0 2
  * x        (x) object 'a' 'b'

In [24]: np.sin(10)
Out[24]: -0.54402111088936977

In [25]: foo.T
Out[25]: 
<xray.DataArray 'foo' (y: 3, x: 2)>
array([[-1.71500202, -1.15789225],
       [-1.03926848, -1.34431181],
       [-0.37064686,  0.84488514]])
Coordinates:
  * y        (y) int64 -2 0 2
  * x        (x) object 'a' 'b'
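NumPy universal functions such as `np.sin` can be applied to a DataArray itself (not just to a plain number) and return a labeled DataArray, and a reduction over a named dimension matches the corresponding NumPy reduction over that axis. A sketch assuming a recent `xarray` release:

```python
import numpy as np
import xarray as xr  # assumption: xray's modern package name

foo = xr.DataArray(
    np.random.randn(2, 3),
    coords={"x": ["a", "b"], "y": [-2, 0, 2]},
    dims=["x", "y"],
    name="foo",
)

# A ufunc keeps the dimensions and coordinates.
s = np.sin(foo)
print(type(s).__name__, s.dims)  # DataArray ('x', 'y')

# Reducing over the named dimension "x" is the same as axis=0 in NumPy.
m = foo.mean(dim="x")
print(np.allclose(m.values, foo.values.mean(axis=0)))  # True

# Transposing reverses the dimension order without touching the labels.
print(foo.T.dims)  # ('y', 'x')
```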

Binary operations

In [26]: bar = xray.DataArray(np.random.randn(3), [foo.coords['y']])

In [27]: zzz = xray.DataArray(np.random.randn(4), dims='z')

In [28]: bar
Out[28]: 
<xray.DataArray (y: 3)>
array([ 1.07576978, -0.10904998,  1.64356307])
Coordinates:
  * y        (y) int64 -2 0 2

In [29]: zzz
Out[29]: 
<xray.DataArray (z: 4)>
array([-1.46938796,  0.35702056, -0.6746001 , -1.77690372])
Coordinates:
  * z        (z) int64 0 1 2 3

In [30]: bar + zzz
Out[30]: 
<xray.DataArray (y: 3, z: 4)>
array([[-0.39361818,  1.43279035,  0.40116968, -0.70113393],
       [-1.57843793,  0.24797059, -0.78365008, -1.88595369],
       [ 0.17417511,  2.00058363,  0.96896297, -0.13334065]])
Coordinates:
  * y        (y) int64 -2 0 2
  * z        (z) int64 0 1 2 3

In [31]: foo / bar
Out[31]: 
<xray.DataArray (x: 2, y: 3)>
array([[ -1.59420913,   9.53020375,  -0.22551423],
       [ -1.07633833,  12.32748388,   0.51405702]])
Coordinates:
  * y        (y) int64 -2 0 2
  * x        (x) object 'a' 'b'
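`bar + zzz` broadcasts by dimension name rather than by axis position. When both operands share a dimension but carry different labels along it, arithmetic first aligns them, and by default only the labels present in both are kept (an inner join). A sketch assuming a recent `xarray` release, with made-up data and labels for the hypothetical arrays `p` and `q`:

```python
import numpy as np
import xarray as xr  # assumption: xray's modern package name

# Broadcasting by name: dimensions y and z are combined into a 2-D result.
bar = xr.DataArray(np.ones(3), coords={"y": [-2, 0, 2]}, dims="y")
zzz = xr.DataArray(np.arange(4.0), dims="z")
print((bar + zzz).dims)  # ('y', 'z')
print((zzz + bar).dims)  # ('z', 'y')

# Alignment: p and q share dimension y, but only labels 0 and 2 overlap.
p = xr.DataArray([1.0, 2.0, 3.0], coords={"y": [-2, 0, 2]}, dims="y")
q = xr.DataArray([10.0, 20.0, 30.0], coords={"y": [0, 2, 4]}, dims="y")
total = p + q
print(total.coords["y"].values.tolist())  # [0, 2]
print(total.values.tolist())              # [12.0, 23.0]
```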

GroupBy

For more details, see GroupBy: split-apply-combine.

In [32]: labels = xray.DataArray(['E', 'F', 'E'], [foo.coords['y']], name='labels')

In [33]: labels
Out[33]: 
<xray.DataArray 'labels' (y: 3)>
array(['E', 'F', 'E'], 
      dtype='|S1')
Coordinates:
  * y        (y) int64 -2 0 2

In [34]: foo.groupby(labels).mean('y')
Out[34]: 
<xray.DataArray 'foo' (x: 2, labels: 2)>
array([[-1.04282444, -1.03926848],
       [-0.15650355, -1.34431181]])
Coordinates:
  * x        (x) object 'a' 'b'
  * labels   (labels) |S1 'E' 'F'

In [35]: foo.groupby(labels).apply(lambda g: g.max() - g.min())
Out[35]: 
<xray.DataArray 'foo' (labels: 2)>
array([ 2.55988716,  0.30504333])
Coordinates:
  * labels   (labels) |S1 'E' 'F'
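In recent xarray releases (xray's later name) the GroupBy method used in `In [35]` is called `map`. The sketch below assumes such a release and uses small hand-picked data so the results can be checked by eye: group `E` holds columns `y=-2` and `y=2`, group `F` holds column `y=0`.

```python
import numpy as np
import xarray as xr  # assumption: xray's modern package name

foo = xr.DataArray(
    np.array([[1.0, 2.0, 5.0], [3.0, 4.0, 9.0]]),
    coords={"x": ["a", "b"], "y": [-2, 0, 2]},
    dims=["x", "y"],
    name="foo",
)
labels = xr.DataArray(["E", "F", "E"], coords={"y": [-2, 0, 2]}, dims="y", name="labels")

# Split along y by label, average within each group, and combine.
means = foo.groupby(labels).mean()
print(means.sel(labels="E").values.tolist())  # [3.0, 6.0]
print(means.sel(labels="F").values.tolist())  # [2.0, 4.0]

# The same max-minus-min computation as In [35], written with map.
spread = foo.groupby(labels).map(lambda g: g.max() - g.min())
print(spread.values.tolist())  # [8.0, 2.0]
```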

Convert to pandas

For more details, see Working with pandas.

In [36]: foo.to_dataframe()
Out[36]: 
           foo
x y           
a -2 -1.715002
   0 -1.039268
   2 -0.370647
b -2 -1.157892
   0 -1.344312
   2  0.844885

In [37]: foo.to_series()
Out[37]: 
x  y 
a  -2   -1.715002
    0   -1.039268
    2   -0.370647
b  -2   -1.157892
    0   -1.344312
    2    0.844885
Name: foo, dtype: float64
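`to_series` flattens the array into a Series with one MultiIndex level per dimension, and `DataArray.from_series` reverses the conversion by unstacking those levels. A sketch assuming a recent `xarray` release (xray's later name):

```python
import numpy as np
import xarray as xr  # assumption: xray's modern package name

foo = xr.DataArray(
    np.arange(6.0).reshape(2, 3),
    coords={"x": ["a", "b"], "y": [-2, 0, 2]},
    dims=["x", "y"],
    name="foo",
)

s = foo.to_series()
print(list(s.index.names))  # ['x', 'y']
print(len(s))               # 6

# from_series rebuilds the 2-D array, using the index levels as dimensions.
back = xr.DataArray.from_series(s)
print(back.dims, back.name)                     # ('x', 'y') foo
print(np.array_equal(back.values, foo.values))  # True
```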