- xarray.open_mfdataset(paths, chunks=None, concat_dim=None, compat='no_conflicts', preprocess=None, engine=None, data_vars='all', coords='different', combine='by_coords', parallel=False, join='outer', attrs_file=None, combine_attrs='override', **kwargs)
Open multiple files as a single dataset.
If combine='by_coords' then the function combine_by_coords is used to combine the datasets into one before returning the result, and if combine='nested' then combine_nested is used. The file paths must be structured according to which combining function is used, the details of which are given in the documentation for combine_by_coords and combine_nested. By default combine='by_coords' will be used. Requires dask to be installed. See documentation for details on dask [1]. Global attributes from the attrs_file are used for the combined dataset.
- Parameters:
paths (str or nested sequence of paths) – Either a string glob in the form "path/to/my/files/*.nc" or an explicit list of files to open. Paths can be given as strings or as pathlib Paths. If concatenation along more than one dimension is desired, then paths must be a nested list-of-lists (see combine_nested for details). (A string glob will be expanded to a 1-dimensional list.)

chunks (int, dict, 'auto' or None, optional) – Dictionary with keys given by dimension names and values given by chunk sizes. In general, these should divide the dimensions of each dataset. If int, chunk each dimension by chunks. By default, chunks will be chosen to load entire input files into memory at once. This has a major impact on performance: please see the full documentation for more details [2].

concat_dim (str, DataArray, Index or a Sequence of these or None, optional) – Dimensions to concatenate files along. You only need to provide this argument if combine='nested', and if any of the dimensions along which you want to concatenate is not a dimension in the original datasets, e.g., if you want to stack a collection of 2D arrays along a third dimension. Set concat_dim=[..., None, ...] explicitly to disable concatenation along a particular dimension. Default is None, which for a 1D list of filepaths is equivalent to opening the files separately and then merging them with xarray.merge.

combine ({"by_coords", "nested"}, optional) – Whether xarray.combine_by_coords or xarray.combine_nested is used to combine all the data. Default is to use xarray.combine_by_coords.

compat ({"identical", "equals", "broadcast_equals", "no_conflicts", "override"}, default: "no_conflicts") – String indicating how to compare variables of the same name for potential conflicts when merging:
- "broadcast_equals": all values must be equal when variables are broadcast against each other to ensure common dimensions.
- "equals": all values and dimensions must be the same.
- "identical": all values, dimensions and attributes must be the same.
- "no_conflicts": only values which are not null in both datasets must be equal. The returned dataset then contains the combination of all non-null values.
- "override": skip comparing and pick variable from first dataset.
preprocess (callable, optional) – If provided, call this function on each dataset prior to concatenation. You can find the file-name from which each dataset was loaded in ds.encoding["source"].

engine ({"netcdf4", "scipy", "pydap", "h5netcdf", "zarr", None}, installed backend or subclass of xarray.backends.BackendEntrypoint, optional) – Engine to use when reading files. If not provided, the default engine is chosen based on available dependencies, with a preference for "netcdf4".

data_vars ({"minimal", "different", "all"} or list of str, default: "all") – These data variables will be concatenated together:
- "minimal": Only data variables in which the dimension already appears are included.
- "different": Data variables which are not equal (ignoring attributes) across all datasets are also concatenated (as well as all for which dimension already appears). Beware: this option may load the data payload of data variables into memory if they are not already loaded.
- "all": All data variables will be concatenated.
- list of str: The listed data variables will be concatenated, in addition to the "minimal" data variables.
coords ({"minimal", "different", "all"} or list of str, optional) – These coordinate variables will be concatenated together:
- "minimal": Only coordinates in which the dimension already appears are included.
- "different": Coordinates which are not equal (ignoring attributes) across all datasets are also concatenated (as well as all for which dimension already appears). Beware: this option may load the data payload of coordinate variables into memory if they are not already loaded.
- "all": All coordinate variables will be concatenated, except those corresponding to other dimensions.
- list of str: The listed coordinate variables will be concatenated, in addition to the "minimal" coordinates.
parallel (bool, default: False) – If True, the open and preprocess steps of this function will be performed in parallel using dask.delayed. Default is False.

join ({"outer", "inner", "left", "right", "exact", "override"}, default: "outer") – String indicating how to combine differing indexes (excluding concat_dim) in objects:
- "outer": use the union of object indexes
- "inner": use the intersection of object indexes
- "left": use indexes from the first object with each dimension
- "right": use indexes from the last object with each dimension
- "exact": instead of aligning, raise ValueError when indexes to be aligned are not equal
- "override": if indexes are of same size, rewrite indexes to be those of the first object with that dimension. Indexes for the same dimension must have the same size in all objects.
attrs_file (str or path-like, optional) – Path of the file used to read global attributes from. By default global attributes are read from the first file provided, with wildcard matches sorted by filename.

combine_attrs ({"drop", "identical", "no_conflicts", "drop_conflicts", "override"} or callable, default: "override") – A callable or a string indicating how to combine attrs of the objects being merged:
- "drop": empty attrs on returned Dataset.
- "identical": all attrs must be the same on every object.
- "no_conflicts": attrs from all objects are combined, any that have the same name must also have the same value.
- "drop_conflicts": attrs from all objects are combined, any that have the same name but different values are dropped.
- "override": skip comparing and copy attrs from the first dataset to the result.
If a callable, it must expect a sequence of attrs dicts and a context object as its only parameters.

**kwargs (optional) – Additional arguments passed on to xarray.open_dataset(). For an overview of some of the possible options, see the documentation of xarray.open_dataset().
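To make the combine behaviour concrete, the sketch below applies xarray.combine_nested directly to two small in-memory datasets; this is, loosely, what open_mfdataset does with combine='nested' after opening each file. The datasets here are invented purely for illustration:

```python
import xarray as xr

# Two single-step datasets, standing in for the contents of two files
ds1 = xr.Dataset({"t": ("time", [0.0])}, coords={"time": [0]})
ds2 = xr.Dataset({"t": ("time", [1.0])}, coords={"time": [1]})

# combine='nested' concatenates in the order the datasets are given,
# along the dimension named by concat_dim
combined = xr.combine_nested([ds1, ds2], concat_dim="time")
print(combined.sizes["time"])  # 2
```

With combine='by_coords' the order of the inputs would not matter, because combine_by_coords sorts the pieces by their coordinate values instead.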
- Returns:
xarray.Dataset
Notes
open_mfdataset opens files with read-only access. When you modify values of a Dataset, even one linked to files on disk, only the in-memory copy you are manipulating in xarray is modified: the original file on disk is never touched.

See also
combine_by_coords, combine_nested, open_dataset
Examples
A user might want to pass additional arguments into preprocess when applying some operation to many individual files that are being opened. One route to do this is through the use of functools.partial.

>>> from functools import partial
>>> def _preprocess(x, lon_bnds, lat_bnds):
...     return x.sel(lon=slice(*lon_bnds), lat=slice(*lat_bnds))
...
>>> lon_bnds, lat_bnds = (-110, -105), (40, 45)
>>> partial_func = partial(_preprocess, lon_bnds=lon_bnds, lat_bnds=lat_bnds)
>>> ds = xr.open_mfdataset(
...     "file_*.nc", concat_dim="time", preprocess=partial_func
... )
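The effect of the data_vars option can be previewed without any files by using xarray.concat, which open_mfdataset relies on internally. In this invented example, a scalar variable shared by every "file" is left untouched by data_vars="minimal", whereas data_vars="all" (the default) broadcasts it along the concatenated dimension:

```python
import xarray as xr

# Each "file" carries the same scalar metadata variable alongside a time series
ds1 = xr.Dataset({"t": ("time", [0.0]), "static": ((), 1.0)}, coords={"time": [0]})
ds2 = xr.Dataset({"t": ("time", [1.0]), "static": ((), 1.0)}, coords={"time": [1]})

minimal = xr.concat([ds1, ds2], dim="time", data_vars="minimal")
full = xr.concat([ds1, ds2], dim="time", data_vars="all")

print(minimal["static"].dims)  # () -- scalar left alone
print(full["static"].dims)     # ('time',) -- broadcast and concatenated
```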
It is also possible to use any argument to open_dataset together with open_mfdataset, such as for example drop_variables:

>>> ds = xr.open_mfdataset(
...     "file.nc", drop_variables=["varname_1", "varname_2"]  # any list of vars
... )
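The combine_attrs options can likewise be explored on in-memory datasets with xarray.merge, which accepts the same keyword; the attributes below are invented for illustration:

```python
import xarray as xr

ds1 = xr.Dataset(attrs={"title": "run A", "units": "K"})
ds2 = xr.Dataset(attrs={"title": "run B", "units": "K"})

# "drop_conflicts": conflicting "title" values are dropped,
# while the agreeing "units" attribute survives
merged = xr.merge([ds1, ds2], combine_attrs="drop_conflicts")
print(merged.attrs)  # {'units': 'K'}
```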
References