xarray.open_mfdataset (2024)

xarray.open_mfdataset(paths, chunks=None, concat_dim=None, compat='no_conflicts', preprocess=None, engine=None, data_vars='all', coords='different', combine='by_coords', parallel=False, join='outer', attrs_file=None, combine_attrs='override', **kwargs)

Open multiple files as a single dataset.

If combine='by_coords' then the function combine_by_coords is used to combine the datasets into one before returning the result, and if combine='nested' then combine_nested is used. The filepaths must be structured according to which combining function is used, the details of which are given in the documentation for combine_by_coords and combine_nested. By default combine='by_coords' will be used. Requires dask to be installed. See documentation for details on dask [1]. Global attributes from the attrs_file are used for the combined dataset.
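Since open_mfdataset hands the opened datasets to one of these two functions, the difference between them can be seen on small in-memory datasets, without any files or dask. In this sketch (invented data), combine_by_coords orders the pieces by their time coordinate, while combine_nested keeps the order it is given.

```python
import numpy as np
import xarray as xr

# Two in-memory "files" covering consecutive time ranges (illustrative data only).
early = xr.Dataset({"t2m": ("time", np.array([1.0, 2.0]))}, coords={"time": [0, 1]})
late = xr.Dataset({"t2m": ("time", np.array([3.0, 4.0]))}, coords={"time": [2, 3]})

# combine="by_coords": order is inferred from the time coordinate values,
# so passing the pieces out of order still gives a monotonic result.
by_coords = xr.combine_by_coords([late, early])

# combine="nested": order comes from the list itself, and the
# concatenation dimension has to be named explicitly.
nested = xr.combine_nested([late, early], concat_dim="time")

print(by_coords["time"].values)  # [0 1 2 3]
print(nested["time"].values)     # [2 3 0 1]
```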

Parameters:
  • paths (str or nested sequence of paths) – Either a string glob in the form "path/to/my/files/*.nc" or an explicit list of files to open. Paths can be given as strings or as pathlib Paths. If concatenation along more than one dimension is desired, then paths must be a nested list-of-lists (see combine_nested for details). (A string glob will be expanded to a 1-dimensional list.)

  • chunks (int, dict, 'auto' or None, optional) – Dictionary with keys given by dimension names and values given by chunk sizes. In general, these should divide the dimensions of each dataset. If int, chunk each dimension by chunks. By default, chunks will be chosen to load entire input files into memory at once. This has a major impact on performance: please see the full documentation for more details [2].

  • concat_dim (str, DataArray, Index or a Sequence of these or None, optional) – Dimensions to concatenate files along. You only need to provide this argument if combine='nested', and if any of the dimensions along which you want to concatenate is not a dimension in the original datasets, e.g., if you want to stack a collection of 2D arrays along a third dimension. Set concat_dim=[..., None, ...] explicitly to disable concatenation along a particular dimension. Default is None, which for a 1D list of filepaths is equivalent to opening the files separately and then merging them with xarray.merge.

  • combine ({"by_coords", "nested"}, optional) – Whether xarray.combine_by_coords or xarray.combine_nested is used to combine all the data. Default is to use xarray.combine_by_coords.

  • compat ({"identical", "equals", "broadcast_equals", "no_conflicts", "override"}, default: "no_conflicts") – String indicating how to compare variables of the same name for potential conflicts when merging:

    • “broadcast_equals”: all values must be equal when variables are broadcast against each other to ensure common dimensions.

    • “equals”: all values and dimensions must be the same.

    • “identical”: all values, dimensions and attributes must be the same.

    • “no_conflicts”: only values which are not null in both datasets must be equal. The returned dataset then contains the combination of all non-null values.

    • “override”: skip comparing and pick variable from first dataset.

  • preprocess (callable(), optional) – If provided, call this function on each dataset prior to concatenation. You can find the file name from which each dataset was loaded in ds.encoding["source"].

  • engine ({"netcdf4", "scipy", "pydap", "h5netcdf", "zarr", None}, installed backend or subclass of xarray.backends.BackendEntrypoint, optional) – Engine to use when reading files. If not provided, the default engine is chosen based on available dependencies, with a preference for “netcdf4”.

  • data_vars ({"minimal", "different", "all"} or list of str, default: "all") –

    These data variables will be concatenated together:
    • “minimal”: Only data variables in which the dimension already appears are included.

    • “different”: Data variables which are not equal (ignoring attributes) across all datasets are also concatenated (as well as all for which dimension already appears). Beware: this option may load the data payload of data variables into memory if they are not already loaded.

    • “all”: All data variables will be concatenated.

    • list of str: The listed data variables will be concatenated, in addition to the “minimal” data variables.

  • coords ({"minimal", "different", "all"} or list of str, optional) –

    These coordinate variables will be concatenated together:
    • “minimal”: Only coordinates in which the dimension already appears are included.

    • “different”: Coordinates which are not equal (ignoring attributes) across all datasets are also concatenated (as well as all for which dimension already appears). Beware: this option may load the data payload of coordinate variables into memory if they are not already loaded.

    • “all”: All coordinate variables will be concatenated, except those corresponding to other dimensions.

    • list of str: The listed coordinate variables will be concatenated, in addition to the “minimal” coordinates.

  • parallel (bool, default: False) – If True, the open and preprocess steps of this function will be performed in parallel using dask.delayed. Default is False.

  • join ({"outer", "inner", "left", "right", "exact", "override"}, default: "outer") – String indicating how to combine differing indexes (excluding concat_dim) in objects:

    • “outer”: use the union of object indexes

    • “inner”: use the intersection of object indexes

    • “left”: use indexes from the first object with each dimension

    • “right”: use indexes from the last object with each dimension

    • “exact”: instead of aligning, raise ValueError when indexes to be aligned are not equal

    • “override”: if indexes are of same size, rewrite indexes to be those of the first object with that dimension. Indexes for the same dimension must have the same size in all objects.

  • attrs_file (str or path-like, optional) – Path of the file used to read global attributes from. By default global attributes are read from the first file provided, with wildcard matches sorted by filename.

  • combine_attrs ({"drop", "identical", "no_conflicts", "drop_conflicts", "override"} or callable(), default: "override") – A callable or a string indicating how to combine attrs of the objects being merged:

    • “drop”: empty attrs on returned Dataset.

    • “identical”: all attrs must be the same on every object.

    • “no_conflicts”: attrs from all objects are combined, any that have the same name must also have the same value.

    • “drop_conflicts”: attrs from all objects are combined, any that have the same name but different values are dropped.

    • “override”: skip comparing and copy attrs from the first dataset to the result.

    If a callable, it must expect a sequence of attrs dicts and a context object as its only parameters.

  • **kwargs (optional) – Additional arguments passed on to xarray.open_dataset(). For an overview of some of the possible options, see the documentation of xarray.open_dataset().
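
When paths is a nested list-of-lists, each nesting level is paired with one entry of concat_dim under combine='nested'. The sketch below imitates that on a 2 x 2 grid of in-memory tiles passed to xarray.combine_nested, the function open_mfdataset delegates to in that mode; the tile helper, the variable name and the file names in the comment are invented for illustration.

```python
import numpy as np
import xarray as xr


def tile(x0, y0):
    """Hypothetical helper: one 2 x 2 tile standing in for a single file."""
    return xr.Dataset(
        {"elev": (("x", "y"), np.full((2, 2), 10 * x0 + y0, dtype=float))},
        coords={"x": [x0, x0 + 1], "y": [y0, y0 + 1]},
    )


# Outer list runs along "x", inner lists along "y", mirroring a paths argument
# such as [["x0_y0.nc", "x0_y2.nc"], ["x2_y0.nc", "x2_y2.nc"]] (made-up names).
grid = [[tile(0, 0), tile(0, 2)], [tile(2, 0), tile(2, 2)]]
combined = xr.combine_nested(grid, concat_dim=["x", "y"])

print(combined.sizes["x"], combined.sizes["y"])  # 4 4
```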
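
For the chunks parameter, the arithmetic of an integer chunk size is easy to check by hand: a dimension is cut into blocks of that size, and a length that is not a multiple leaves a shorter final block. The two helpers below are hypothetical plain-Python illustrations (no dask needed), assuming that fixed-size-then-remainder layout; they are not part of xarray's API.

```python
def split_into_chunks(length: int, chunk: int) -> tuple[int, ...]:
    """Hypothetical helper: block sizes for a dimension of `length` cut every `chunk` items."""
    full, remainder = divmod(length, chunk)
    return (chunk,) * full + ((remainder,) if remainder else ())


def non_dividing(dim_sizes: dict[str, int], chunks: dict[str, int]) -> list[str]:
    """Hypothetical helper: dimensions whose chunk size does not divide their length."""
    return [dim for dim, c in chunks.items() if dim_sizes[dim] % c != 0]


sizes = {"time": 365, "lat": 180, "lon": 360}  # invented dataset dimensions
print(split_into_chunks(365, 100))  # (100, 100, 100, 65)
print(non_dividing(sizes, {"time": 100, "lat": 90, "lon": 90}))  # ['time']
```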
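
Both concat_dim behaviours described for combine='nested', stacking along a brand-new dimension and using None to merge instead of concatenate, can be tried with xarray.combine_nested on in-memory datasets. This is a sketch; the "member" dimension, the piece helper and all values are made up.

```python
import numpy as np
import xarray as xr

x = [0, 1, 2]

# Case 1: stack 1-D fields along a dimension ("member") that no input contains.
run_a = xr.Dataset({"t2m": ("x", np.array([1.0, 2.0, 3.0]))}, coords={"x": x})
run_b = xr.Dataset({"t2m": ("x", np.array([4.0, 5.0, 6.0]))}, coords={"x": x})
stacked = xr.combine_nested([run_a, run_b], concat_dim="member")


# Case 2: a None entry merges along that nesting level instead of concatenating.
def piece(name, t, value):
    """Hypothetical helper: one variable at one time step."""
    return xr.Dataset(
        {name: (("time", "x"), np.full((1, 3), value))},
        coords={"time": [t], "x": x},
    )


grid = [
    [piece("temp", 0, 1.0), piece("precip", 0, 0.1)],
    [piece("temp", 1, 2.0), piece("precip", 1, 0.2)],
]
merged = xr.combine_nested(grid, concat_dim=["time", None])

print(stacked["t2m"].dims)       # ('member', 'x')
print(sorted(merged.data_vars))  # ['precip', 'temp']
```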
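
The compat choices can be compared with xarray.merge, which accepts the same strings. In this sketch (invented data) the two inputs agree wherever both are non-null, so "no_conflicts" fills the gaps, while "identical" requires equal values everywhere and raises xarray.MergeError.

```python
import numpy as np
import xarray as xr

a = xr.Dataset({"sst": ("x", np.array([1.0, np.nan]))}, coords={"x": [0, 1]})
b = xr.Dataset({"sst": ("x", np.array([np.nan, 2.0]))}, coords={"x": [0, 1]})

# "no_conflicts": values only need to agree where both are non-null,
# and the non-null values from each input are combined.
filled = xr.merge([a, b], compat="no_conflicts")
print(filled["sst"].values)  # [1. 2.]

# "identical" (like "equals") requires the values to match everywhere.
try:
    xr.merge([a, b], compat="identical")
    identical_ok = True
except xr.MergeError:
    identical_ok = False
print(identical_ok)  # False
```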
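
A preprocess callable receives one opened dataset at a time. The hypothetical tag_with_source function below reads ds.encoding["source"] and records the file's base name as a scalar coordinate. Since no file is opened here, the source path is a made-up value set by hand to stand in for what open_dataset records.

```python
import os

import numpy as np
import xarray as xr


def tag_with_source(ds: xr.Dataset) -> xr.Dataset:
    """Hypothetical preprocess: add a scalar coordinate naming the source file."""
    source = ds.encoding.get("source", "unknown")
    return ds.assign_coords(source_file=os.path.basename(source))


# Simulate a dataset as open_dataset would hand it over (the path is made up).
ds = xr.Dataset({"t2m": ("x", np.zeros(3))})
ds.encoding["source"] = "/data/run/file_2001.nc"

tagged = tag_with_source(ds)
print(tagged["source_file"].item())  # file_2001.nc
```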
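
Because data_vars is forwarded to the concatenation step, its effect can be seen with xarray.concat directly. In this sketch (invented variables) "temperature" already has a time dimension and "elevation" does not; "minimal" keeps the static field once, whereas "all" gives it a time axis.

```python
import numpy as np
import xarray as xr


def one_step(t):
    """Hypothetical stand-in for one file: a time-varying field plus a static one."""
    return xr.Dataset(
        {
            "temperature": (("time", "x"), np.full((1, 3), float(t))),
            "elevation": ("x", np.array([10.0, 20.0, 30.0])),
        },
        coords={"time": [t], "x": [0, 1, 2]},
    )


pieces = [one_step(0), one_step(1)]

# "minimal": only variables that already have a "time" dimension are concatenated;
# "elevation" is identical in both pieces and is kept once.
minimal = xr.concat(pieces, dim="time", data_vars="minimal", coords="minimal")

# "all": every data variable is concatenated, so "elevation" gains a time axis.
everything = xr.concat(pieces, dim="time", data_vars="all", coords="minimal")

print(minimal["elevation"].dims)     # ('x',)
print(everything["elevation"].dims)  # ('time', 'x')
```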
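
The same comparison works for coords. Here each invented piece carries a scalar "height" coordinate with no time dimension: "minimal" leaves it as a scalar, while "all" concatenates it into a (redundant) time series. The sketch calls xarray.concat directly, assuming the value passed to open_mfdataset reaches it unchanged.

```python
import numpy as np
import xarray as xr


def one_step(t):
    """Hypothetical stand-in for one file, carrying a scalar "height" coordinate."""
    return xr.Dataset(
        {"t2m": (("time",), np.array([float(t)]))},
        coords={"time": [t], "height": 2.0},
    )


pieces = [one_step(0), one_step(1)]

# "minimal": "height" lacks the time dimension, so it is compared and kept as a scalar.
minimal = xr.concat(pieces, dim="time", data_vars="minimal", coords="minimal")

# "all": "height" is concatenated as well and becomes a time series.
everything = xr.concat(pieces, dim="time", data_vars="minimal", coords="all")

print(minimal["height"].dims)     # ()
print(everything["height"].dims)  # ('time',)
```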
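
The join options follow the same rules as xarray.align, so a quick way to see them is to align two arrays whose x labels only partly overlap. The arrays below are invented for illustration.

```python
import numpy as np
import xarray as xr

a = xr.DataArray(np.array([1.0, 2.0, 3.0]), dims="x", coords={"x": [0, 1, 2]})
b = xr.DataArray(np.array([10.0, 20.0, 30.0]), dims="x", coords={"x": [1, 2, 3]})

outer_a, _ = xr.align(a, b, join="outer")  # union of labels, gaps filled with NaN
inner_a, _ = xr.align(a, b, join="inner")  # intersection of labels
_, left_b = xr.align(a, b, join="left")    # labels taken from the first object

print(outer_a["x"].values)  # [0 1 2 3]
print(inner_a["x"].values)  # [1 2]

# "exact" refuses to align differing indexes and raises instead.
try:
    xr.align(a, b, join="exact")
    exact_ok = True
except ValueError:
    exact_ok = False
print(exact_ok)  # False
```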
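
Since the default attrs_file is the first file after wildcard matches are sorted, it helps to remember that the sort is by string, not by number. This plain-Python sketch with made-up file names shows the difference; numeric_key is a hypothetical helper, not part of xarray. Passing attrs_file explicitly avoids depending on the sort order at all.

```python
import re

# Made-up names, as a glob such as "file_*.nc" might return them.
matches = ["file_2.nc", "file_10.nc", "file_1.nc"]

# Plain string sorting puts "file_10.nc" before "file_2.nc".
print(sorted(matches))  # ['file_1.nc', 'file_10.nc', 'file_2.nc']


def numeric_key(name: str) -> int:
    """Hypothetical sort key: the first run of digits in the name, as an integer."""
    match = re.search(r"\d+", name)
    return int(match.group()) if match else -1


ordered = sorted(matches, key=numeric_key)
print(ordered)  # ['file_1.nc', 'file_2.nc', 'file_10.nc']
```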
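
The string options and the callable form of combine_attrs can be tried with xarray.merge, which accepts the same argument. In the sketch below the datasets and attribute values are invented, and join_titles is a hypothetical callable that keeps the first value of each attribute but joins distinct "title" values; xarray passes it the list of attrs dicts plus a context argument, which this sketch ignores.

```python
import numpy as np
import xarray as xr

a = xr.Dataset({"u": ("x", np.zeros(2))}, attrs={"institution": "Lab A", "title": "run 1"})
b = xr.Dataset({"v": ("x", np.ones(2))}, attrs={"institution": "Lab A", "title": "run 2"})

# "drop_conflicts": keep attrs that agree, drop those whose values differ.
dropped = xr.merge([a, b], combine_attrs="drop_conflicts")
print(dropped.attrs)  # {'institution': 'Lab A'}


def join_titles(attrs_list, context):
    """Hypothetical callable: first value wins, except distinct "title" values are joined."""
    merged = {}
    titles = []
    for attrs in attrs_list:
        for key, value in attrs.items():
            if key == "title":
                if value not in titles:
                    titles.append(value)
            else:
                merged.setdefault(key, value)
    if titles:
        merged["title"] = "; ".join(titles)
    return merged


custom = xr.merge([a, b], combine_attrs=join_titles)
print(custom.attrs)  # {'institution': 'Lab A', 'title': 'run 1; run 2'}
```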

Returns:

xarray.Dataset

Notes

open_mfdataset opens files with read-only access. When you modify values of a Dataset, even one linked to files on disk, only the in-memory copy you are manipulating in xarray is modified: the original file on disk is never touched.

See also

combine_by_coords, combine_nested, open_dataset

Examples

A user might want to pass additional arguments into preprocess when applying some operation to many individual files that are being opened. One route to do this is through the use of functools.partial.

>>> from functools import partial
>>> import xarray as xr
>>> def _preprocess(x, lon_bnds, lat_bnds):
...     return x.sel(lon=slice(*lon_bnds), lat=slice(*lat_bnds))
...
>>> lon_bnds, lat_bnds = (-110, -105), (40, 45)
>>> partial_func = partial(_preprocess, lon_bnds=lon_bnds, lat_bnds=lat_bnds)
>>> ds = xr.open_mfdataset(
...     "file_*.nc", combine="nested", concat_dim="time", preprocess=partial_func
... )
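The preprocess function from this example can be checked on a small in-memory dataset before it is pointed at real files. The 1-degree grid below is invented and assumes lon and lat are ascending dimension coordinates, which the label-based slices in sel rely on.

```python
from functools import partial

import numpy as np
import xarray as xr


def _preprocess(x, lon_bnds, lat_bnds):
    # Keep only the requested longitude/latitude box (label-based, inclusive).
    return x.sel(lon=slice(*lon_bnds), lat=slice(*lat_bnds))


lon_bnds, lat_bnds = (-110, -105), (40, 45)
partial_func = partial(_preprocess, lon_bnds=lon_bnds, lat_bnds=lat_bnds)

# Invented 1-degree grid standing in for the contents of one file.
lon = np.arange(-120, -99)  # -120 ... -100
lat = np.arange(30, 51)     # 30 ... 50
ds = xr.Dataset(
    {"t2m": (("lat", "lon"), np.zeros((lat.size, lon.size)))},
    coords={"lat": lat, "lon": lon},
)

subset = partial_func(ds)
print(subset.sizes["lon"], subset.sizes["lat"])  # 6 6
```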

It is also possible to use any argument to open_dataset together with open_mfdataset, for example drop_variables:

>>> ds = xr.open_mfdataset(
...     "file.nc", drop_variables=["varname_1", "varname_2"]  # any list of vars
... )

References
