xarray.open_zarr

xarray.open_zarr(store, group=None, synchronizer=None, chunks='auto', decode_cf=True, mask_and_scale=True, decode_times=True, concat_characters=True, decode_coords=True, drop_variables=None, consolidated=None, overwrite_encoded_chunks=False, chunk_store=None, storage_options=None, decode_timedelta=None, use_cftime=None, zarr_version=None, zarr_format=None, use_zarr_fill_value_as_mask=None, chunked_array_type=None, from_array_kwargs=None, **kwargs)

Load and decode a dataset from a Zarr store.

The store object should be a valid store for a Zarr group. Store variables must contain dimension metadata encoded in the _ARRAY_DIMENSIONS attribute, or the store must be in NCZarr format.
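A minimal sketch of a round trip through a local Zarr store; the path example.zarr is hypothetical, and the zarr package must be installed:

    import numpy as np
    import xarray as xr

    # Write a small dataset to a local Zarr store, then reopen it lazily.
    ds = xr.Dataset(
        {"temperature": (("time", "x"), np.random.rand(4, 3))},
        coords={"time": np.arange(4), "x": np.arange(3)},
    )
    ds.to_zarr("example.zarr", mode="w")

    reopened = xr.open_zarr("example.zarr")
    print(reopened)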

Parameters
  • store (MutableMapping or str) – A MutableMapping where a Zarr Group has been stored, or a path to a directory in a file system where a Zarr DirectoryStore has been stored.

  • group (str, optional) – Group path (a.k.a. path in zarr terminology).

  • synchronizer (object, optional) – Array synchronizer provided to zarr.

  • chunks (int, dict, 'auto' or None, default: 'auto') – If provided, used to load the data into dask arrays (see the chunking sketch after this parameter list).

    • chunks='auto' will use dask auto chunking, taking into account the engine's preferred chunks.

    • chunks=None skips using dask, which is generally faster for small arrays.

    • chunks=-1 loads the data with dask using a single chunk for all arrays.

    • chunks={} loads the data with dask using the engine's preferred chunks if exposed by the backend, otherwise with a single chunk for all arrays.

    See dask chunking for more details.

  • overwrite_encoded_chunks (bool, optional) – Whether to drop the zarr chunks encoded for each variable when a dataset is loaded with specified chunk sizes (default: False)

  • decode_cf (bool, optional) – Whether to decode the variables in the store, assuming they were saved according to CF conventions.

  • mask_and_scale (bool, optional) – If True, replace array values equal to _FillValue with NA and scale values according to the formula original_values * scale_factor + add_offset, where _FillValue, scale_factor and add_offset are taken from variable attributes (if they exist). If the _FillValue or missing_value attribute contains multiple values, a warning will be issued and all array values matching one of the multiple values will be replaced by NA. See the decoding sketch after this parameter list.

  • decode_times (bool, optional) – If True, decode times encoded in the standard NetCDF datetime format into datetime objects. Otherwise, leave them encoded as numbers.

  • concat_characters (bool, optional) – If True, concatenate along the last dimension of character arrays to form string arrays. Dimensions will only be concatenated over (and removed) if they have no corresponding variable and if they are only used as the last dimension of character arrays.

  • decode_coords (bool, optional) – If True, decode the ‘coordinates’ attribute to identify coordinates in the resulting dataset.

  • drop_variables (str or iterable, optional) – A variable or list of variables to exclude from being parsed from the dataset. This may be useful to drop variables with problems or inconsistent values.

  • consolidated (bool, optional) – Whether to open the store using zarr’s consolidated metadata capability. Only works for stores that have already been consolidated. By default (consolidated=None), attempts to read consolidated metadata, falling back to reading non-consolidated metadata if that fails (see the remote-store sketch after this parameter list).

    When the experimental zarr_version=3, consolidated must be either None or False.

  • chunk_store (MutableMapping, optional) – A separate Zarr store only for chunk data.

  • storage_options (dict, optional) – Any additional parameters for the storage backend (ignored for local paths).

  • decode_timedelta (bool, optional) – If True, decode variables and coordinates with time units in {‘days’, ‘hours’, ‘minutes’, ‘seconds’, ‘milliseconds’, ‘microseconds’} into timedelta objects. If False, leave them encoded as numbers. If None (default), assume the same value as decode_times.

  • use_cftime (bool, optional) – Only relevant if encoded dates come from a standard calendar (e.g. “gregorian”, “proleptic_gregorian”, “standard”, or not specified). If None (default), attempt to decode times to np.datetime64[ns] objects; if this is not possible, decode times to cftime.datetime objects. If True, always decode times to cftime.datetime objects, regardless of whether or not they can be represented using np.datetime64[ns] objects. If False, always decode times to np.datetime64[ns] objects; if this is not possible raise an error.

  • zarr_version (int or None, optional) –

    Deprecated since version 2024.9.1: Use zarr_format instead.

  • zarr_format (int or None, optional) – The desired zarr format to target (currently 2 or 3). The default of None will attempt to determine the zarr format from the store when possible, otherwise falling back to the default format used by the installed zarr-python library.

  • use_zarr_fill_value_as_mask (bool, optional) – If True, use the zarr Array fill_value to mask the data, the same as done for NetCDF data with _FillValue or missing_value attributes. If False, the fill_value is ignored and the data are not masked. If None, this defaults to True for zarr_version=2 and False for zarr_version=3.

  • chunked_array_type (str, optional) – Which chunked array type to coerce this dataset’s arrays to. Defaults to ‘dask’ if installed, else whatever is registered via the ChunkManagerEntryPoint system. Experimental API that should not be relied upon.

  • from_array_kwargs (dict, optional) – Additional keyword arguments passed on to the ChunkManagerEntrypoint.from_array method used to create chunked arrays, via whichever chunk manager is specified through the chunked_array_type kwarg. Defaults to {'manager': 'dask'}, meaning additional kwargs will be passed eventually to dask.array.from_array(). Experimental API that should not be relied upon.
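A minimal sketch of the chunks options described above, assuming example.zarr exists (e.g. written by the round-trip sketch earlier) and dask is installed:

    import xarray as xr

    ds_auto = xr.open_zarr("example.zarr", chunks="auto")         # dask auto-chunking
    ds_numpy = xr.open_zarr("example.zarr", chunks=None)          # plain NumPy arrays, no dask
    ds_single = xr.open_zarr("example.zarr", chunks=-1)           # one dask chunk per array
    ds_stored = xr.open_zarr("example.zarr", chunks={})           # engine (on-disk) chunking
    ds_custom = xr.open_zarr("example.zarr", chunks={"time": 2})  # explicit chunk size per dimension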
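A sketch of the decoding switches described above; the store path and the variable name bad_var are hypothetical:

    import xarray as xr

    raw = xr.open_zarr(
        "example.zarr",
        mask_and_scale=False,        # keep packed values and raw fill values
        decode_times=False,          # leave time axes as encoded numbers
        drop_variables=["bad_var"],  # hypothetical variable to skip while parsing
    )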
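A sketch of opening a remote, consolidated store; the bucket URL is hypothetical, and an fsspec-compatible backend such as s3fs must be installed for s3:// paths:

    import xarray as xr

    ds = xr.open_zarr(
        "s3://my-bucket/data.zarr",      # hypothetical remote path
        consolidated=True,               # require consolidated metadata
        storage_options={"anon": True},  # passed to the fsspec filesystem
    )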

Returns

dataset (Dataset) – The newly created dataset.

References

http://zarr.readthedocs.io/