xarray.open_groups

xarray.open_groups(filename_or_obj, *, engine=None, chunks=None, cache=None, decode_cf=None, mask_and_scale=None, decode_times=None, decode_timedelta=None, use_cftime=None, concat_characters=None, decode_coords=None, drop_variables=None, inline_array=False, chunked_array_type=None, from_array_kwargs=None, backend_kwargs=None, **kwargs)

Open and decode a file or file-like object, creating a dictionary containing one xarray Dataset for each group in the file.

Useful for an HDF file (“netcdf4” or “h5netcdf”) containing many groups that are not alignable with their parents and so cannot be opened directly with open_datatree. Use this function to inspect your data, then make the changes needed to coerce the structure to a DataTree object before calling DataTree.from_dict() and proceeding with your analysis.
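For instance, a minimal sketch of that workflow (the file name, group path, and dimension names below are hypothetical, and DataTree requires a recent xarray release):

    import xarray as xr

    # Open every group in the file as its own Dataset.
    groups = xr.open_groups("example.nc")

    # Inspect the group paths, dimensions, and variables.
    for path, ds in groups.items():
        print(path, dict(ds.sizes), list(ds.data_vars))

    # Rename a conflicting dimension so the groups align, then build a tree.
    groups["/group1"] = groups["/group1"].rename({"x": "x_group1"})
    tree = xr.DataTree.from_dict(groups)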

Parameters

  • filename_or_obj (str, Path, file-like, or DataStore) – Strings and Path objects are interpreted as a path to a netCDF file or Zarr store.

  • engine ({"netcdf4", "h5netcdf", "zarr", None}, installed backend or xarray.backends.BackendEntrypoint, optional) – Engine to use when reading files. If not provided, the default engine is chosen based on available dependencies, with a preference for “netcdf4”. A custom backend class (a subclass of BackendEntrypoint) can also be used.

  • chunks (int, dict, 'auto' or None, default: None) – If provided, used to load the data into dask arrays.

    • chunks="auto" will use dask auto chunking taking into account the engine preferred chunks.

    • chunks=None skips using dask, which is generally faster for small arrays.

    • chunks=-1 loads the data with dask using a single chunk for all arrays.

    • chunks={} loads the data with dask using the engine’s preferred chunk size, generally identical to the format’s chunk size. If not available, a single chunk for all arrays.

    See dask chunking for more details; a combined usage sketch follows this parameter list.

  • cache (bool, optional) – If True, cache data loaded from the underlying datastore in memory as NumPy arrays when accessed to avoid reading from the underlying datastore multiple times. Defaults to True unless you specify the chunks argument to use dask, in which case it defaults to False. Does not change the behavior of coordinates corresponding to dimensions, which always load their data from disk into a pandas.Index.

  • decode_cf (bool, optional) – Whether to decode variables, assuming they were saved according to CF conventions.

  • mask_and_scale (bool or dict-like, optional) – If True, replace array values equal to _FillValue with NA and scale values according to the formula original_values * scale_factor + add_offset, where _FillValue, scale_factor and add_offset are taken from variable attributes (if they exist). If the _FillValue or missing_value attribute contains multiple values a warning will be issued and all array values matching one of the multiple values will be replaced by NA. Pass a mapping, e.g. {"my_variable": False}, to toggle this feature per-variable individually. This keyword may not be supported by all the backends.

  • decode_times (bool or dict-like, optional) – If True, decode times encoded in the standard NetCDF datetime format into datetime objects. Otherwise, leave them encoded as numbers. Pass a mapping, e.g. {"my_variable": False}, to toggle this feature per-variable individually. This keyword may not be supported by all the backends.

  • decode_timedelta (bool or dict-like, optional) – If True, decode variables and coordinates with time units in {“days”, “hours”, “minutes”, “seconds”, “milliseconds”, “microseconds”} into timedelta objects. If False, leave them encoded as numbers. If None (default), assume the same value as decode_times. Pass a mapping, e.g. {"my_variable": False}, to toggle this feature per-variable individually. This keyword may not be supported by all the backends.

  • use_cftime (bool or dict-like, optional) – Only relevant if encoded dates come from a standard calendar (e.g. “gregorian”, “proleptic_gregorian”, “standard”, or not specified). If None (default), attempt to decode times to np.datetime64[ns] objects; if this is not possible, decode times to cftime.datetime objects. If True, always decode times to cftime.datetime objects, regardless of whether or not they can be represented using np.datetime64[ns] objects. If False, always decode times to np.datetime64[ns] objects; if this is not possible raise an error. Pass a mapping, e.g. {"my_variable": False}, to toggle this feature per-variable individually. This keyword may not be supported by all the backends.

  • concat_characters (bool or dict-like, optional) – If True, concatenate along the last dimension of character arrays to form string arrays. Dimensions will only be concatenated over (and removed) if they have no corresponding variable and if they are only used as the last dimension of character arrays. Pass a mapping, e.g. {"my_variable": False}, to toggle this feature per-variable individually. This keyword may not be supported by all the backends.

  • decode_coords (bool or {"coordinates", "all"}, optional) – Controls which variables are set as coordinate variables:

    • “coordinates” or True: Set variables referred to in the 'coordinates' attribute of the datasets or individual variables as coordinate variables.

    • “all”: Set variables referred to in 'grid_mapping', 'bounds' and other attributes as coordinate variables.

    Only existing variables can be set as coordinates. Missing variables will be silently ignored.

  • drop_variables (str or iterable of str, optional) – A variable or list of variables to exclude from being parsed from the dataset. This may be useful to drop variables with problems or inconsistent values.

  • inline_array (bool, default: False) – How to include the array in the dask task graph. By default (inline_array=False) the array is included in a task by itself, and each chunk refers to that task by its key. With inline_array=True, Dask will instead inline the array directly in the values of the task graph. See dask.array.from_array().

  • chunked_array_type (str, optional) – Which chunked array type to coerce these datasets’ arrays to. Defaults to ‘dask’ if installed, else whatever is registered via the ChunkManagerEntryPoint system. Experimental API that should not be relied upon.

  • from_array_kwargs (dict) – Additional keyword arguments passed on to the ChunkManagerEntrypoint.from_array method used to create chunked arrays, via whichever chunk manager is specified through the chunked_array_type kwarg. For example if dask.array.Array() objects are used for chunking, additional kwargs will be passed to dask.array.from_array(). Experimental API that should not be relied upon.

  • backend_kwargs (dict) – Additional keyword arguments passed on to the engine open function, equivalent to **kwargs.

  • **kwargs (dict) – Additional keyword arguments passed on to the engine open function. For example:

    • ‘group’: str path to the group in the given file to open as the root group.

    • ‘lock’: resource lock to use when reading data from disk. Only relevant when using dask or another form of parallelism. By default, appropriate locks are chosen to safely read and write files with the currently active dask scheduler. Supported by “netcdf4”, “h5netcdf”, “scipy”.

    See engine open function for kwargs accepted by each specific engine.
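As a combined illustration of several options above, a hedged sketch (the file name, variable names, and engine choice are illustrative assumptions, not requirements):

    import xarray as xr

    groups = xr.open_groups(
        "example.nc",                  # hypothetical file
        engine="h5netcdf",             # pick a backend explicitly
        chunks={},                     # lazy dask arrays with engine-preferred chunks
        mask_and_scale={"my_variable": False},  # per-variable toggle
        decode_times=False,            # keep time variables as raw numbers
        drop_variables=["bad_var"],    # hypothetical problematic variable
    )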

Returns

groups (dict of str to xarray.Dataset) – The groups as Dataset objects.

Notes

open_groups opens the file with read-only access. When you modify values of a Dataset, even one linked to files on disk, only the in-memory copy you are manipulating in xarray is modified: the original file on disk is never touched.
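A short sketch of that read-only behavior (the file, group, and variable names are hypothetical):

    import xarray as xr

    groups = xr.open_groups("example.nc")
    ds = groups["/group1"]

    # In-memory edits never propagate back to "example.nc".
    ds["temperature"] = ds["temperature"] * 0.0

    # Persisting changes requires an explicit write to a new file.
    ds.to_netcdf("modified.nc")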