Quickstart¶
The examples below show common operations on xray’s DataArray object. The rest of the documentation covers each topic in more detail.
To begin, import numpy, pandas and xray:
In [1]: import numpy as np
In [2]: import pandas as pd
In [3]: import xray
Create a DataArray¶
For more details, see Data Structures.
From scratch¶
In [4]: xray.DataArray(np.random.randn(2, 3))
Out[4]:
<xray.DataArray (dim_0: 2, dim_1: 3)>
array([[-0.86184896, -2.10456922, -0.49492927],
[ 1.07180381, 0.72155516, -0.70677113]])
Coordinates:
* dim_0 (dim_0) int64 0 1
* dim_1 (dim_1) int64 0 1 2
In [5]: xray.DataArray(np.random.randn(2, 3), dims=['x', 'y'])
Out[5]:
<xray.DataArray (x: 2, y: 3)>
array([[-1.03957499, 0.27185989, -0.42497233],
[ 0.56702035, 0.27623202, -1.08740069]])
Coordinates:
* x (x) int64 0 1
* y (y) int64 0 1 2
In [6]: xray.DataArray(np.random.randn(2, 3), [('x', ['a', 'b']), ('y', [-2, 0, 2])])
Out[6]:
<xray.DataArray (x: 2, y: 3)>
array([[-0.67368971, 0.11364841, -1.47842655],
[ 0.52498767, 0.40470522, 0.57704599]])
Coordinates:
* x (x) |S1 'a' 'b'
* y (y) int64 -2 0 2
From pandas¶
In [7]: df = pd.DataFrame(np.random.randn(2, 3), index=['a', 'b'], columns=[-2, 0, 2])
In [8]: df.index.name = 'x'
In [9]: df.columns.name = 'y'
In [10]: df
Out[10]:
y        -2         0         2
x
a -1.715002 -1.039268 -0.370647
b -1.157892 -1.344312  0.844885
In [11]: foo = xray.DataArray(df, name='foo')
In [12]: foo
Out[12]:
<xray.DataArray 'foo' (x: 2, y: 3)>
array([[-1.71500202, -1.03926848, -0.37064686],
[-1.15789225, -1.34431181, 0.84488514]])
Coordinates:
* x (x) object 'a' 'b'
* y (y) int64 -2 0 2
Properties¶
In [13]: foo.values
Out[13]:
array([[-1.71500202, -1.03926848, -0.37064686],
[-1.15789225, -1.34431181, 0.84488514]])
In [14]: foo.dims
Out[14]: ('x', 'y')
In [15]: foo.coords['y']
Out[15]:
<xray.DataArray 'y' (y: 3)>
array([-2, 0, 2])
Coordinates:
* y (y) int64 -2 0 2
In [16]: foo.attrs
Out[16]: OrderedDict()
Indexing¶
For more details, see Indexing and selecting data.
Like numpy¶
In [17]: foo[[0, 1], 0]
Out[17]:
<xray.DataArray 'foo' (x: 2)>
array([-1.71500202, -1.15789225])
Coordinates:
y int64 -2
* x (x) object 'a' 'b'
Like pandas¶
In [18]: foo.loc['a':'b', -2]
Out[18]:
<xray.DataArray 'foo' (x: 2)>
array([-1.71500202, -1.15789225])
Coordinates:
y int64 -2
* x (x) object 'a' 'b'
By dimension name and integer label¶
In [19]: foo.isel(x=slice(2))
Out[19]:
<xray.DataArray 'foo' (x: 2, y: 3)>
array([[-1.71500202, -1.03926848, -0.37064686],
[-1.15789225, -1.34431181, 0.84488514]])
Coordinates:
* y (y) int64 -2 0 2
* x (x) object 'a' 'b'
By dimension name and coordinate label¶
In [20]: foo.sel(x=['a', 'b'])
Out[20]:
<xray.DataArray 'foo' (x: 2, y: 3)>
array([[-1.71500202, -1.03926848, -0.37064686],
[-1.15789225, -1.34431181, 0.84488514]])
Coordinates:
* y (y) int64 -2 0 2
* x (x) object 'a' 'b'
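The four indexing styles in this section can all select the same data. The sketch below checks that on a small array with fixed values instead of random ones. The import fallback is an assumption for readers whose installation provides the package under its later name, xarray; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

try:
    import xray  # package name used throughout this page
except ImportError:
    import xarray as xray  # assumption: newer installs ship the package as xarray

# Fixed values so the selections are easy to verify by eye.
df = pd.DataFrame(np.arange(6.0).reshape(2, 3), index=['a', 'b'], columns=[-2, 0, 2])
df.index.name = 'x'
df.columns.name = 'y'
foo = xray.DataArray(df, name='foo')

by_position = foo[[0, 1], 0]                 # like numpy: integer positions
by_label = foo.loc['a':'b', -2]              # like pandas: coordinate labels
by_name_position = foo.isel(x=slice(2), y=0)     # dimension name + integer position
by_name_label = foo.sel(x=['a', 'b'], y=-2)      # dimension name + coordinate label

# Every style picks the first column (y == -2) for both rows.
for selected in (by_label, by_name_position, by_name_label):
    assert np.array_equal(by_position.values, selected.values)
```

The name-based forms (`isel` and `sel`) do not depend on the order of the dimensions, which is the main reason to prefer them over positional indexing.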
Computation¶
For more details, see Computation.
Unary operations¶
In [21]: foo.sum()
Out[21]:
<xray.DataArray 'foo' ()>
array(-4.782236279814638)
In [22]: foo.mean(dim=['x'])
Out[22]:
<xray.DataArray 'foo' (y: 3)>
array([-1.43644713, -1.19179015, 0.23711914])
Coordinates:
* y (y) int64 -2 0 2
In [23]: foo + 10
Out[23]:
<xray.DataArray 'foo' (x: 2, y: 3)>
array([[ 8.28499798, 8.96073152, 9.62935314],
[ 8.84210775, 8.65568819, 10.84488514]])
Coordinates:
* y (y) int64 -2 0 2
* x (x) object 'a' 'b'
In [24]: np.sin(10)
Out[24]: -0.54402111088936977
In [25]: foo.T
Out[25]:
<xray.DataArray 'foo' (y: 3, x: 2)>
array([[-1.71500202, -1.15789225],
[-1.03926848, -1.34431181],
[-0.37064686, 0.84488514]])
Coordinates:
* y (y) int64 -2 0 2
* x (x) object 'a' 'b'
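The `np.sin(10)` call in this section operates on a plain scalar. As a minimal sketch, assuming NumPy universal functions accept a DataArray directly and return a DataArray with the same dimensions, the same function can be applied to labeled data. The array `angles` is a hypothetical example, and the import fallback assumes the package may be installed under its later name, xarray.

```python
import numpy as np

try:
    import xray  # package name used throughout this page
except ImportError:
    import xarray as xray  # assumption: newer installs ship the package as xarray

# A small labeled array with known values.
angles = xray.DataArray(np.array([0.0, np.pi / 2]), dims=['x'])

# Apply the NumPy ufunc element-wise; the result keeps the 'x' dimension.
result = np.sin(angles)
```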
Binary operations¶
In [26]: bar = xray.DataArray(np.random.randn(3), [foo.coords['y']])
In [27]: zzz = xray.DataArray(np.random.randn(4), dims='z')
In [28]: bar
Out[28]:
<xray.DataArray (y: 3)>
array([ 1.07576978, -0.10904998, 1.64356307])
Coordinates:
* y (y) int64 -2 0 2
In [29]: zzz
Out[29]:
<xray.DataArray (z: 4)>
array([-1.46938796, 0.35702056, -0.6746001 , -1.77690372])
Coordinates:
* z (z) int64 0 1 2 3
In [30]: bar + zzz
Out[30]:
<xray.DataArray (y: 3, z: 4)>
array([[-0.39361818, 1.43279035, 0.40116968, -0.70113393],
[-1.57843793, 0.24797059, -0.78365008, -1.88595369],
[ 0.17417511, 2.00058363, 0.96896297, -0.13334065]])
Coordinates:
* y (y) int64 -2 0 2
* z (z) int64 0 1 2 3
In [31]: foo / bar
Out[31]:
<xray.DataArray (x: 2, y: 3)>
array([[ -1.59420913, 9.53020375, -0.22551423],
[ -1.07633833, 12.32748388, 0.51405702]])
Coordinates:
* y (y) int64 -2 0 2
* x (x) object 'a' 'b'
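In the outputs above, `bar + zzz` has dimensions `(y, z)` even though neither operand has both: arrays are combined by dimension name rather than by axis position. The sketch below reproduces that with fixed values and compares the result to the equivalent explicit NumPy broadcast. The import fallback is an assumption for installs that provide the package as xarray.

```python
import numpy as np

try:
    import xray  # package name used throughout this page
except ImportError:
    import xarray as xray  # assumption: newer installs ship the package as xarray

bar = xray.DataArray(np.array([1.0, 2.0, 3.0]), [('y', [-2, 0, 2])])
zzz = xray.DataArray(np.arange(4.0), dims='z')

# Dimensions are matched by name, so 'y' and 'z' are broadcast against each other.
total = bar + zzz

# The same computation in plain NumPy needs the axes inserted by hand.
expected = bar.values[:, np.newaxis] + zzz.values[np.newaxis, :]
```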
GroupBy¶
For more details, see GroupBy: split-apply-combine.
In [32]: labels = xray.DataArray(['E', 'F', 'E'], [foo.coords['y']], name='labels')
In [33]: labels
Out[33]:
<xray.DataArray 'labels' (y: 3)>
array(['E', 'F', 'E'],
dtype='|S1')
Coordinates:
* y (y) int64 -2 0 2
In [34]: foo.groupby(labels).mean('y')
Out[34]:
<xray.DataArray 'foo' (x: 2, labels: 2)>
array([[-1.04282444, -1.03926848],
[-0.15650355, -1.34431181]])
Coordinates:
* x (x) object 'a' 'b'
* labels (labels) |S1 'E' 'F'
In [35]: foo.groupby(labels).apply(lambda x: x.max() - x.min())
Out[35]:
<xray.DataArray 'foo' (labels: 2)>
array([ 2.55988716, 0.30504333])
Coordinates:
* labels (labels) |S1 'E' 'F'
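The grouped mean above can be checked by hand: each group is the set of `y` positions that share a label, and the reduction runs over those positions only. A minimal sketch with fixed values follows; the data are made up for illustration, and the import fallback assumes the package may be installed as xarray.

```python
import numpy as np

try:
    import xray  # package name used throughout this page
except ImportError:
    import xarray as xray  # assumption: newer installs ship the package as xarray

foo = xray.DataArray(
    np.array([[1.0, 2.0, 5.0], [10.0, 20.0, 50.0]]),
    [('x', ['a', 'b']), ('y', [-2, 0, 2])],
    name='foo',
)
labels = xray.DataArray(['E', 'F', 'E'], [foo.coords['y']], name='labels')

# Split along y by label, average each group over y, combine along 'labels'.
grouped_mean = foo.groupby(labels).mean('y')

# Group 'E' covers the first and last columns; group 'F' covers the middle one.
expected_e = foo.values[:, [0, 2]].mean(axis=1)
expected_f = foo.values[:, [1]].mean(axis=1)
```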
Convert to pandas¶
For more details, see Working with pandas.
In [36]: foo.to_dataframe()
Out[36]:
           foo
x y
a -2 -1.715002
  0  -1.039268
  2  -0.370647
b -2 -1.157892
  0  -1.344312
  2   0.844885
In [37]: foo.to_series()
Out[37]:
x  y
a  -2   -1.715002
    0   -1.039268
    2   -0.370647
b  -2   -1.157892
    0   -1.344312
    2    0.844885
Name: foo, dtype: float64
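The conversion also works in the other direction. The sketch below round-trips a small array through a pandas Series, assuming `DataArray.from_series` is available in the installed version. The values are fixed for illustration, and the import fallback assumes the package may be installed as xarray.

```python
import numpy as np

try:
    import xray  # package name used throughout this page
except ImportError:
    import xarray as xray  # assumption: newer installs ship the package as xarray

foo = xray.DataArray(
    np.arange(6.0).reshape(2, 3),
    [('x', ['a', 'b']), ('y', [-2, 0, 2])],
    name='foo',
)

# to_series() produces a Series with a MultiIndex built from the x and y coordinates.
series = foo.to_series()

# from_series() rebuilds a DataArray from that MultiIndex.
restored = xray.DataArray.from_series(series)
```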