xarray.open_mfdataset

xarray.open_mfdataset(paths, chunks=None, concat_dim='__infer_concat_dim__', compat='no_conflicts', preprocess=None, engine=None, lock=None, data_vars='all', coords='different', autoclose=False, parallel=False, **kwargs)

Open multiple files as a single dataset.
Requires dask to be installed. See documentation for details on dask [1]. Attributes from the first dataset file are used for the combined dataset.
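A minimal sketch of typical usage, with invented file and variable names: write two single-day netCDF files, then open them together as one lazily loaded dataset. This assumes dask and a netCDF backend (e.g. netCDF4 or scipy) are installed.

```python
import os
import tempfile

import numpy as np
import xarray as xr

# Hypothetical setup: two files, each holding one day of data.
tmpdir = tempfile.mkdtemp()
for i, day in enumerate(["2015-01-01", "2015-01-02"]):
    ds = xr.Dataset(
        {"temperature": (("time", "x"), np.random.rand(1, 4))},
        coords={"time": [np.datetime64(day)], "x": list(range(4))},
    )
    ds.to_netcdf(os.path.join(tmpdir, "day%d.nc" % i))

# The glob matches both files; they are concatenated along the shared
# "time" dimension, and the data is read lazily as dask arrays.
combined = xr.open_mfdataset(os.path.join(tmpdir, "day*.nc"))
print(combined.sizes["time"])  # 2
```

Because the result is backed by dask, no data values are read from disk until you index or compute them.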
Parameters:
- paths : str or sequence
Either a string glob in the form “path/to/my/files/*.nc” or an explicit list of files to open. Paths can be given as strings or as pathlib Paths.
- chunks : int or dict, optional
Dictionary with keys given by dimension names and values given by chunk sizes. In general, these should divide the dimensions of each dataset. If int, chunk each dimension by chunks. By default, chunks will be chosen to load entire input files into memory at once. This has a major impact on performance: please see the full documentation for more details [2].
- concat_dim : None, str, DataArray or Index, optional
Dimension to concatenate files along. This argument is passed on to xarray.auto_combine() along with the dataset objects. You only need to provide this argument if the dimension along which you want to concatenate is not a dimension in the original datasets, e.g., if you want to stack a collection of 2D arrays along a third dimension. By default, xarray attempts to infer this argument by examining component files. Set concat_dim=None explicitly to disable concatenation.
- compat : {‘identical’, ‘equals’, ‘broadcast_equals’, ‘no_conflicts’}, optional
String indicating how to compare variables of the same name for potential conflicts when merging:
- ‘broadcast_equals’: all values must be equal when variables are broadcast against each other to ensure common dimensions.
- ‘equals’: all values and dimensions must be the same.
- ‘identical’: all values, dimensions and attributes must be the same.
- ‘no_conflicts’: only values which are not null in both datasets must be equal. The returned dataset then contains the combination of all non-null values.
- preprocess : callable, optional
If provided, call this function on each dataset prior to concatenation.
- engine : {‘netcdf4’, ‘scipy’, ‘pydap’, ‘h5netcdf’, ‘pynio’}, optional
Engine to use when reading files. If not provided, the default engine is chosen based on available dependencies, with a preference for ‘netcdf4’.
- autoclose : bool, optional
If True, automatically close files to avoid an OS error caused by too many open files. However, this option does not work with streams, e.g., BytesIO.
- lock : False, True or threading.Lock, optional
This argument is passed on to dask.array.from_array(). By default, a per-variable lock is used when reading data from netCDF files with the netcdf4 and h5netcdf engines to avoid issues with concurrent access when using dask’s multithreaded backend.
- data_vars : {‘minimal’, ‘different’, ‘all’ or list of str}, optional
These data variables will be concatenated together:
- ‘minimal’: Only data variables in which the dimension already appears are included.
- ‘different’: Data variables which are not equal (ignoring attributes) across all datasets are also concatenated (as well as all variables in which the dimension already appears). Beware: this option may load the data payload of data variables into memory if they are not already loaded.
- ‘all’: All data variables will be concatenated.
- list of str: The listed data variables will be concatenated, in addition to the ‘minimal’ data variables.
- coords : {‘minimal’, ‘different’, ‘all’ or list of str}, optional
These coordinate variables will be concatenated together:
- ‘minimal’: Only coordinates in which the dimension already appears are included.
- ‘different’: Coordinates which are not equal (ignoring attributes) across all datasets are also concatenated (as well as all coordinates in which the dimension already appears). Beware: this option may load the data payload of coordinate variables into memory if they are not already loaded.
- ‘all’: All coordinate variables will be concatenated, except those corresponding to other dimensions.
- list of str: The listed coordinate variables will be concatenated, in addition to the ‘minimal’ coordinates.
- parallel : bool, optional
If True, the open and preprocess steps of this function will be performed in parallel using dask.delayed. Default is False.
- **kwargs : optional
Additional arguments passed on to xarray.open_dataset().
Returns:
- xarray.Dataset
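A hypothetical sketch of the preprocess hook described above, with invented file and variable names: each file carries two variables, and a small function trims each dataset to “temperature” before the files are combined. Assumes dask and a netCDF backend are installed.

```python
import os
import tempfile

import numpy as np
import xarray as xr

def keep_temperature(ds):
    # Applied to each per-file dataset before concatenation.
    return ds[["temperature"]]

# Hypothetical setup: three files, each with one time step and two variables.
tmpdir = tempfile.mkdtemp()
for i in range(3):
    xr.Dataset(
        {
            "temperature": (("time",), np.random.rand(1)),
            "pressure": (("time",), np.random.rand(1)),
        },
        coords={"time": [float(i)]},
    ).to_netcdf(os.path.join(tmpdir, "part%d.nc" % i))

ds = xr.open_mfdataset(os.path.join(tmpdir, "part*.nc"),
                       preprocess=keep_temperature)
print(sorted(ds.data_vars))  # ['temperature']
```

Because preprocess runs on each file before the merge, it is a convenient place to drop unneeded variables, rename dimensions, or fix metadata that would otherwise cause conflicts during combination.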
References
[1] http://xarray.pydata.org/en/stable/dask.html
[2] http://xarray.pydata.org/en/stable/dask.html#chunking-and-performance