xarray.DataArray.to_zarr

DataArray.to_zarr(store=None, chunk_store=None, mode=None, synchronizer=None, group=None, encoding=None, *, compute=True, consolidated=None, append_dim=None, region=None, safe_chunks=True, storage_options=None, zarr_version=None)

Write DataArray contents to a Zarr store.

Zarr chunks are determined in the following way:

  • From the chunks attribute in each variable’s encoding (can be set via DataArray.chunk).

  • If the variable is a Dask array, from the dask chunks

  • If neither Dask chunks nor encoding chunks are present, chunks will be determined automatically by Zarr

  • If both Dask chunks and encoding chunks are present, encoding chunks will be used, provided that there is a many-to-one relationship between encoding chunks and Dask chunks (i.e. each Dask chunk spans a whole number of encoding chunks, so Dask chunk sizes are integer multiples of the encoding chunk sizes); otherwise a ValueError is raised. This restriction ensures that no synchronization or locks are required when writing. To disable this restriction, use safe_chunks=False.
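
The many-to-one rule above can be illustrated along a single dimension with plain Python. This is a simplified sketch, not xarray's actual validation code: the helper names `writers_per_zarr_chunk` and `is_safe` are hypothetical, and Dask chunks are assumed to be given as a tuple of chunk lengths. It shows why aligned chunks need no locks: every Zarr chunk is touched by exactly one Dask task.

```python
from itertools import accumulate


def writers_per_zarr_chunk(dask_chunks, zarr_chunk):
    """Map each Zarr chunk index to the set of Dask chunk indices overlapping it.

    Hypothetical helper for illustration only.
    dask_chunks: tuple of chunk lengths along one dimension, e.g. (4, 4, 2)
    zarr_chunk:  encoding (Zarr) chunk length along the same dimension
    """
    offsets = [0, *accumulate(dask_chunks)]
    writers = {}
    for task, (start, stop) in enumerate(zip(offsets[:-1], offsets[1:])):
        if start == stop:
            continue  # an empty Dask chunk writes nothing
        first = start // zarr_chunk
        last = (stop - 1) // zarr_chunk
        for z in range(first, last + 1):
            writers.setdefault(z, set()).add(task)
    return writers


def is_safe(dask_chunks, zarr_chunk):
    """True when no Zarr chunk is written by more than one Dask task."""
    mapping = writers_per_zarr_chunk(dask_chunks, zarr_chunk)
    return all(len(tasks) == 1 for tasks in mapping.values())


# Dask chunks of 4 are multiples of the Zarr chunk size 2: safe.
print(is_safe((4, 4, 2), 2))  # True
# Dask chunks of 3 straddle Zarr chunk boundaries: two tasks share a chunk.
print(is_safe((3, 3, 4), 2))  # False
```

In the second case, the Zarr chunk covering positions 2:4 receives data from two different Dask chunks, which is the situation that could corrupt data under parallel writes.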

Parameters:
  • store (MutableMapping, str or path-like, optional) – Store or path to directory in local or remote file system.

  • chunk_store (MutableMapping, str or path-like, optional) – Store or path to directory in local or remote file system only for Zarr array chunks. Requires zarr-python v2.4.0 or later.

  • mode ({"w", "w-", "a", "a-", "r+", None}, optional) – Persistence mode: “w” means create (overwrite if exists); “w-” means create (fail if exists); “a” means override all existing variables including dimension coordinates (create if does not exist); “a-” means only append those variables that have append_dim; “r+” means modify existing array values only (raise an error if any metadata or shapes would change). The default mode is “a” if append_dim is set. Otherwise, it is “r+” if region is set and “w-” otherwise.

  • synchronizer (object, optional) – Zarr array synchronizer.

  • group (str, optional) – Group path. (a.k.a. path in zarr terminology.)

  • encoding (dict, optional) – Nested dictionary with variable names as keys and dictionaries of variable specific encodings as values, e.g., {"my_variable": {"dtype": "int16", "scale_factor": 0.1,}, ...}

  • compute (bool, default: True) – If True write array data immediately, otherwise return a dask.delayed.Delayed object that can be computed to write array data later. Metadata is always updated eagerly.

  • consolidated (bool, optional) – If True, apply zarr’s consolidate_metadata function to the store after writing metadata and read existing stores with consolidated metadata; if False, do not. The default (consolidated=None) means write consolidated metadata and attempt to read consolidated metadata for existing stores (falling back to non-consolidated).

    When using the experimental zarr_version=3, consolidated must be either None or False.

  • append_dim (hashable, optional) – If set, the dimension along which the data will be appended. All other dimensions on overridden variables must remain the same size.

  • region (dict, optional) – Optional mapping from dimension names to integer slices along DataArray dimensions to indicate the region of existing zarr array(s) in which to write this DataArray’s data. For example, {'x': slice(0, 1000), 'y': slice(10000, 11000)} would indicate that values should be written to the region 0:1000 along x and 10000:11000 along y.

    Two restrictions apply to the use of region:

    • If region is set, all variables in a DataArray must have at least one dimension in common with the region. Other variables should be written in a separate call to to_zarr().

    • Dimensions cannot be included in both region and append_dim at the same time. To create empty arrays to fill in with region, use a separate call to to_zarr() with compute=False. See “Appending to existing Zarr stores” in the reference documentation for full details.

  • safe_chunks (bool, default: True) – If True, only allow writes when there is a many-to-one relationship between Zarr chunks (specified in encoding) and Dask chunks. Set False to override this restriction; however, data may become corrupted if Zarr arrays are written in parallel. This option may be useful in combination with compute=False to initialize a Zarr store from an existing DataArray with arbitrary chunk structure.

  • storage_options (dict, optional) – Any additional parameters for the storage backend (ignored for local paths).

  • zarr_version (int or None, optional) – The desired zarr spec version to target (currently 2 or 3). The default of None will attempt to determine the zarr version from store when possible, otherwise defaulting to 2.
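
The default-mode rule described under mode, together with the restriction that a dimension cannot appear in both region and append_dim, can be written out as a small decision function. This is a minimal sketch of the documented behavior, not xarray's implementation; the function name `resolve_mode` and the error message are hypothetical.

```python
def resolve_mode(mode=None, append_dim=None, region=None):
    """Return the persistence mode that the parameter descriptions imply.

    Hypothetical helper: it mirrors the documented defaults only.
    """
    # A dimension may not be listed in both region and append_dim.
    if append_dim is not None and region is not None and append_dim in region:
        raise ValueError(
            f"dimension {append_dim!r} cannot be in both region and append_dim"
        )
    if mode is not None:
        return mode  # an explicit mode always wins
    if append_dim is not None:
        return "a"  # appending implies append mode
    if region is not None:
        return "r+"  # region writes only modify existing values
    return "w-"  # otherwise create the store and fail if it exists


print(resolve_mode())                               # w-
print(resolve_mode(append_dim="time"))              # a
print(resolve_mode(region={"x": slice(0, 1000)}))   # r+
print(resolve_mode(mode="w", append_dim="time"))    # w
```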

Returns:

  • dask.delayed.Delayed if compute is False

  • ZarrStore otherwise

References

https://zarr.readthedocs.io/

Notes

Zarr chunking behavior:

If chunks are found in the encoding argument or attribute corresponding to any DataArray, those chunks are used. If a DataArray is a dask array, it is written with those chunks. If no other chunks are found, Zarr uses its own heuristics to choose automatic chunk sizes.
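
The precedence described in this note can be summarized as a short lookup. This is an illustrative sketch only: `choose_chunks` is a hypothetical name, chunks are assumed to be given as one size per dimension, and the safe_chunks compatibility check is left out.

```python
def choose_chunks(encoding_chunks=None, dask_chunks=None):
    """Return (source, chunks) following the documented precedence.

    Hypothetical helper: encoding chunks win over Dask chunks, and Zarr
    picks chunk sizes itself when neither is available.
    """
    if encoding_chunks is not None:
        return "encoding", tuple(encoding_chunks)
    if dask_chunks is not None:
        return "dask", tuple(dask_chunks)
    return "zarr-auto", None  # Zarr's own heuristics decide


print(choose_chunks(encoding_chunks=(100, 100), dask_chunks=(200, 200)))
print(choose_chunks(dask_chunks=(200, 200)))
print(choose_chunks())
```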

encoding:

The encoding attribute (if it exists) of the DataArray(s) will be used. Override any existing encodings by providing the encoding kwarg.

See also

Dataset.to_zarr

Zarr

The I/O user guide, with more details and examples.