xarray.Dataset.to_zarr
- Dataset.to_zarr(store=None, chunk_store=None, mode=None, synchronizer=None, group=None, encoding=None, *, compute=True, consolidated=None, append_dim=None, region=None, safe_chunks=True, storage_options=None, zarr_version=None, zarr_format=None, write_empty_chunks=None, chunkmanager_store_kwargs=None)
Write dataset contents to a zarr group.
Zarr chunks are determined in the following way:
- From the chunks attribute in each variable's encoding (can be set via Dataset.chunk).
- If the variable is a Dask array, from the dask chunks.
- If neither Dask chunks nor encoding chunks are present, chunks will be determined automatically by Zarr.
- If both Dask chunks and encoding chunks are present, encoding chunks will be used, provided that there is a many-to-one relationship between encoding chunks and dask chunks (i.e. Dask chunks are bigger than and evenly divide encoding chunks); otherwise a ValueError is raised. This restriction ensures that no synchronization / locks are required when writing. To disable this restriction, use safe_chunks=False.
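For illustration, a minimal sketch of how these rules interact; the store path output.zarr and the variable name temperature are hypothetical, not part of the API:

```python
import numpy as np
import xarray as xr

# Hypothetical example: a dataset backed by Dask chunks of size 20 along "x".
ds = xr.Dataset({"temperature": ("x", np.random.rand(100))}).chunk({"x": 20})

# Encoding chunks of 10 evenly divide the Dask chunks of 20 (a many-to-one
# relationship), so this write is allowed under the rules above.
ds.to_zarr(
    "output.zarr",
    encoding={"temperature": {"chunks": (10,)}},
    mode="w",
)

# Encoding chunks of 15 would NOT evenly divide Dask chunks of 20, so that
# write would raise a ValueError unless safe_chunks=False were passed.
```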
- Parameters:
store (MutableMapping, str or path-like, optional) – Store or path to directory in local or remote file system.
chunk_store (MutableMapping, str or path-like, optional) – Store or path to directory in local or remote file system only for Zarr array chunks. Requires zarr-python v2.4.0 or later.
{"w", "w-", "a", "a-", r+", None}
, optional) – Persistence mode: “w” means create (overwrite if exists); “w-” means create (fail if exists); “a” means override all existing variables including dimension coordinates (create if does not exist); “a-” means only append those variables that haveappend_dim
. “r+” means modify existing array values only (raise an error if any metadata or shapes would change). The default mode is “a” ifappend_dim
is set. Otherwise, it is “r+” ifregion
is set andw-
otherwise.synchronizer (
synchronizer (object, optional) – Zarr array synchronizer.
group (str, optional) – Group path (a.k.a. path in zarr terminology).
encoding (dict, optional) – Nested dictionary with variable names as keys and dictionaries of variable-specific encodings as values, e.g., {"my_variable": {"dtype": "int16", "scale_factor": 0.1}, ...}
compute (bool, default: True) – If True, write array data immediately; otherwise return a dask.delayed.Delayed object that can be computed to write array data later. Metadata is always updated eagerly.
consolidated (bool, optional) – If True, apply zarr.convenience.consolidate_metadata() after writing metadata and read existing stores with consolidated metadata; if False, do not. The default (consolidated=None) means write consolidated metadata and attempt to read consolidated metadata for existing stores (falling back to non-consolidated). When the experimental zarr_version=3, consolidated must be either None or False.
append_dim (hashable, optional) – If set, the dimension along which the data will be appended. All other dimensions on overridden variables must remain the same size.
region (dict or "auto", optional) – Optional mapping from dimension names to either a) "auto", or b) integer slices, indicating the region of existing zarr array(s) in which to write this dataset's data.
If "auto" is provided, the existing store will be opened and the region inferred by matching indexes. "auto" can be used as a single string, which will automatically infer the region for all dimensions, or as dictionary values for specific dimensions mixed together with explicit slices for other dimensions.
Alternatively, integer slices can be provided; for example, {'x': slice(0, 1000), 'y': slice(10000, 11000)} would indicate that values should be written to the region 0:1000 along x and 10000:11000 along y.
Two restrictions apply to the use of region:
- If region is set, all variables in a dataset must have at least one dimension in common with the region. Other variables should be written in a separate single call to to_zarr().
- Dimensions cannot be included in both region and append_dim at the same time. To create empty arrays to fill in with region, use a separate call to to_zarr() with compute=False (see the sketch after the Returns section below). See "Appending to existing Zarr stores" in the reference documentation for full details.
Users are expected to ensure that the specified region aligns with Zarr chunk boundaries, and that dask chunks are also aligned. Xarray makes limited checks that these multiple chunk boundaries line up. It is possible to write incomplete chunks and corrupt the data with this option if you are not careful.
safe_chunks (bool, default: True) – If True, only allow writes when there is a many-to-one relationship between Zarr chunks (specified in encoding) and Dask chunks. Set False to override this restriction; however, data may become corrupted if Zarr arrays are written in parallel. This option may be useful in combination with compute=False to initialize a Zarr store from an existing Dataset with arbitrary chunk structure. In addition to the many-to-one relationship validation, it also detects partial chunk writes when using the region parameter; these partial chunks are considered unsafe in mode "r+" but safe in mode "a". Note: even with these validations it can still be unsafe to write two or more chunked arrays to the same location in parallel if they are not writing to independent regions; for those cases it is better to use a synchronizer.
storage_options (dict, optional) – Any additional parameters for the storage backend (ignored for local paths).
zarr_version (int or None, optional) – Deprecated since version 2024.9.1: use zarr_format instead.
zarr_format (int or None, optional) – The desired zarr format to target (currently 2 or 3). The default of None will attempt to determine the zarr version from store when possible, otherwise defaulting to the default version used by the zarr-python library installed.
write_empty_chunks (bool or None, optional) – If True, all chunks will be stored regardless of their contents. If False, each chunk is compared to the array's fill value prior to storing. If a chunk is uniformly equal to the fill value, then that chunk is not stored, and the store entry for that chunk's key is deleted. This setting enables sparser storage, as only chunks with non-fill-value data are stored, at the expense of overhead associated with checking the data of each chunk. If None (default), fall back to specification(s) in encoding or Zarr defaults. A ValueError will be raised if this value (when not None) conflicts with the one in encoding.
chunkmanager_store_kwargs (dict, optional) – Additional keyword arguments passed on to the ChunkManager.store method used to store chunked arrays. For example, for a dask array, additional kwargs will be passed eventually to dask.array.store(). Experimental API that should not be relied upon.
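As referenced in the mode description above, a minimal sketch of appending along a dimension; the store path timeseries.zarr and the time coordinates are hypothetical:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical example: write an initial day of hourly data.
times = pd.date_range("2024-01-01", periods=24, freq="h")
ds = xr.Dataset(
    {"temperature": ("time", np.random.rand(24))}, coords={"time": times}
)
ds.to_zarr("timeseries.zarr", mode="w")

# Appending a second day: setting append_dim implies mode="a" by default,
# and all dimensions other than "time" must keep the same size.
more_times = pd.date_range("2024-01-02", periods=24, freq="h")
ds2 = xr.Dataset(
    {"temperature": ("time", np.random.rand(24))}, coords={"time": more_times}
)
ds2.to_zarr("timeseries.zarr", append_dim="time")
```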
- Returns:
- dask.delayed.Delayed if compute is False
- ZarrStore otherwise
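When compute=False, the returned Delayed object defers the data write, which pairs naturally with region writes into an initialized store. A minimal sketch, assuming dask is installed; the store path template.zarr is hypothetical:

```python
import numpy as np
import xarray as xr

# Hypothetical example: lazily initialize a store, then fill one region.
ds = xr.Dataset({"values": ("x", np.zeros(1000))}).chunk({"x": 100})

# compute=False writes metadata only and returns a dask.delayed.Delayed;
# no array data is written yet.
delayed = ds.to_zarr("template.zarr", mode="w", compute=False)

# Either compute the Delayed to write all the data at once...
# delayed.compute()

# ...or fill in pieces of the initialized store with region writes.
# The slice aligns with the 100-element chunks, so the write is safe.
part = ds.isel(x=slice(0, 100))
part.to_zarr("template.zarr", region={"x": slice(0, 100)})
```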
Notes
- Zarr chunking behavior:
If chunks are found in the encoding argument or attribute corresponding to any DataArray, those chunks are used. If a DataArray is a dask array, it is written with those chunks. If no other chunks are found, Zarr uses its own heuristics to choose automatic chunk sizes.
- encoding:
The encoding attribute (if it exists) of the DataArray(s) will be used. Override any existing encodings by providing the encoding kwarg.
- fill_value handling:
There exists a subtlety in interpreting zarr's fill_value property. For zarr v2 format arrays, fill_value is always interpreted as an invalid value similar to the _FillValue attribute in CF/netCDF. For Zarr v3 format arrays, only an explicit _FillValue attribute will be used to mask the data if requested using mask_and_scale=True. See this GitHub issue for more.
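As a sketch of the encoding note above, encodings attached to variables can be overridden per write via the encoding kwarg; the store path encoded.zarr, variable name, and packing values here are hypothetical:

```python
import numpy as np
import xarray as xr

ds = xr.Dataset({"temperature": ("x", np.random.rand(10) * 30.0)})

# Any encoding already present in ds["temperature"].encoding would be used
# by default; the encoding kwarg overrides it for this write only.
ds.to_zarr(
    "encoded.zarr",
    mode="w",
    encoding={
        "temperature": {
            "dtype": "int16",      # pack floats into int16 on disk
            "scale_factor": 0.01,  # decoded = stored * scale_factor
            "_FillValue": -9999,   # sentinel for missing data
        }
    },
)

# On read, mask_and_scale=True (the default) applies these on decode.
roundtrip = xr.open_zarr("encoded.zarr")
```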
See also
- Zarr
The I/O user guide, with more details and examples.