datacube.Datacube.load

Datacube.load(product=None, measurements=None, output_crs=None, resolution=None, resampling=None, align=None, skip_broken_datasets=None, dask_chunks=None, like=None, fuse_func=None, datasets=None, dataset_predicate=None, progress_cbk=None, patch_url=None, limit=None, driver=None, **query)

Load data as an xarray.Dataset object. Each measurement will be a data variable in the xarray.Dataset.

See the xarray documentation for usage of the xarray.Dataset and xarray.DataArray objects.

Product and Measurements

A product can be specified using the product name:

product='ls5_ndvi_albers'

See list_products() for the list of products with their names and properties.

A product name MUST be supplied unless search is bypassed altogether by supplying an explicit list of datasets.

The measurements argument is a list of measurement names, as listed in list_measurements(). If not provided, all measurements for the product will be returned:

measurements=['red', 'nir', 'swir2']

Dimensions

Spatial dimensions can be specified using the longitude/latitude and x/y fields.

The CRS of this query is assumed to be WGS84/EPSG:4326 unless the crs field is supplied, even if the stored data is in another projection or the output_crs is specified. The dimensions longitude/latitude and x/y can be used interchangeably:

latitude=(-34.5, -35.2), longitude=(148.3, 148.7)

or:

x=(1516200, 1541300), y=(-3867375, -3867350), crs='EPSG:3577'
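As a rough sketch of how these fields combine, the two query styles above can be assembled as plain keyword dictionaries and unpacked into load(). The dictionary names and the dc instance are assumptions for illustration only; the load call is left commented out because it needs a configured datacube index.

```python
# Query in the default CRS (WGS84/EPSG:4326): no crs field is needed.
lonlat_query = {
    "product": "ls5_ndvi_albers",
    "latitude": (-34.5, -35.2),
    "longitude": (148.3, 148.7),
}

# Query in projected coordinates: the crs field says how to read x and y.
projected_query = {
    "product": "ls5_ndvi_albers",
    "x": (1516200, 1541300),
    "y": (-3867375, -3867350),
    "crs": "EPSG:3577",
}

# With a configured index (assumed, not shown here):
# import datacube
# dc = datacube.Datacube()
# ds = dc.load(**projected_query)
```

Keeping the query in a dictionary makes it easy to reuse the same spatial extent across several products.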

You can also specify a polygon with an arbitrary CRS (e.g. the native CRS):

geopolygon=polygon(coords, crs="EPSG:3577")

Or an iterable of polygons (search is done against the union of all polygons):

geopolygon=[poly1, poly2, poly3, ....]

You can also pass a WKT string, or a GeoJSON string or any other object that can be passed to the odc.geo.Geometry constructor, or an iterable of any of the above.

Performance and accuracy of geopolygon queries may vary depending on the index driver in use and the CRS.

The time dimension can be specified using a single datetime object or string, or a tuple of two, with strings in YYYY-MM-DD hh:mm:ss format (truncated forms such as '2000-01' or '2000' are accepted). Data will be loaded inclusive of the start and finish times. A None value in the range indicates an open range, with the provided date serving as either the upper or lower bound. E.g.:

time=('2000-01-01', '2001-12-31')
time=('2000-01', '2001-12')
time=('2000', '2001')
time=('2000')
time=('2000', None)  # all data from 2000 onward
time=(None, '2000')  # all data up to and including 2000
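The string forms above have datetime equivalents. The following is a minimal sketch using only the standard library; the variable names are illustrative, and the values would be passed as the time field of a load() call.

```python
from datetime import datetime

# A closed range given as datetime objects; both ends are inclusive.
closed_range = (datetime(2000, 1, 1), datetime(2001, 12, 31, 23, 59, 59))

# The same range written as strings in YYYY-MM-DD hh:mm:ss format.
closed_range_str = tuple(t.strftime("%Y-%m-%d %H:%M:%S") for t in closed_range)

# Open-ended ranges use None for the missing bound.
from_2000 = (datetime(2000, 1, 1), None)   # lower bound only
up_to_2000 = (None, datetime(2000, 12, 31, 23, 59, 59))  # upper bound only

# e.g. dc.load(product=..., time=closed_range, ...) with a configured index
```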

For 3D datasets, where the product definition contains an extra_dimension specification, these dimensions can be queried using that dimension’s name. E.g.:

z=(10, 30)

or:

z=5

or:

wvl=(560.3, 820.5)

For EO-specific datasets that are based around scenes, the time dimension can be reduced to the day level, using solar day to keep scenes together:

group_by='solar_day'

Where overlapping scenes hold different values and combining them requires more complex rules, a function can be provided (via fuse_func) to control how they are merged into a single time slice.

See datacube.helpers.ga_pq_fuser() for an example implementation. See datacube.api.query.query_group_by() for the built-in group_by functions.
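To make the idea of a fuse function concrete, here is a minimal sketch of one that fills nodata pixels in the destination from the next overlapping source. It assumes the function receives two NumPy arrays and is expected to modify the first in place, following the pattern of ga_pq_fuser(); the function name and the nodata value of -999 are assumptions for illustration, not part of datacube.

```python
import numpy as np

NODATA = -999  # assumed nodata value; use the measurement's real one


def fill_nodata_fuser(dest: np.ndarray, src: np.ndarray) -> None:
    """Fill nodata pixels in dest with values from src, in place."""
    np.copyto(dest, src, where=(dest == NODATA))


# Small demonstration on two overlapping "scenes".
dest = np.array([[1, NODATA], [NODATA, 4]], dtype="int16")
src = np.array([[9, 2], [NODATA, 8]], dtype="int16")
fill_nodata_fuser(dest, src)
# Valid pixels in dest are kept; gaps are filled where src has data.
```

Such a function would be passed as fuse_func=fill_nodata_fuser together with a group_by such as 'solar_day'.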

Output

To reproject or resample data, supply the output_crs, resolution, resampling and align fields.

By default, the resampling method is ‘nearest’. However, any stored overview layers may be used when down-sampling, which may override (or hybridise) the choice of resampling method.
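The resampling field also accepts a per-band dictionary, as described under the resampling parameter. The sketch below illustrates how such a dictionary reads, with '*' covering every band not named explicitly. The method_for helper is hypothetical and simplified (it ignores the int and Resampling forms); it is not part of datacube and only mirrors the documented behaviour.

```python
# Cubic for every band except fmask, which stays on nearest neighbour.
resampling = {"*": "cubic", "fmask": "nearest"}


def method_for(band, spec):
    """Illustrative helper: resolve the resampling method for one band."""
    if spec is None:
        return "nearest"  # documented default for all bands
    if isinstance(spec, str):
        return spec       # a single method applies to every band
    # Dictionary form: an exact band name wins, then the '*' fallback.
    return spec.get(band, spec.get("*", "nearest"))


red_method = method_for("red", resampling)
fmask_method = method_for("fmask", resampling)
```

Categorical bands such as masks are usually kept on 'nearest' so that interpolation does not invent class values.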

To reproject data to 30 m resolution for EPSG:3577:

dc.load(product='ls5_nbar_albers',
        x=(148.15, 148.2),
        y=(-35.15, -35.2),
        time=('1990', '1991'),
        output_crs='EPSG:3577',
        resolution=30,
        resampling='cubic'
)

odc-geo style xy objects are preferred for passing in resolution and align pairs to avoid x/y ordering ambiguity.

Parameters:
  • product (str) – The name of the product to be loaded. Either product or datasets must be supplied

  • measurements (str | list[str] | None) –

    Measurement name or list of names to be included, as listed in list_measurements(). These will be loaded as individual xr.DataArray variables in the output xarray.Dataset object.

    If a list is specified, the measurements will be returned in the order requested. By default, all available measurements are included.

  • output_crs (str) –

    The CRS of the returned data, for example EPSG:3577. If no CRS is supplied, the CRS of the stored data is used if available.

    Any form that can be converted to a CRS by odc-geo is accepted.

    This differs from the crs parameter described above, which is used to define the CRS of the coordinates in the query itself.

  • resolution (int | float | tuple[int | float, int | float] | Resolution | None) –

    The spatial resolution of the returned data. If using square pixels with an inverted Y axis, it should be provided as an int or float. If not, it should be provided as an odc-geo XY object to avoid coordinate-order ambiguity. If passed as a tuple, y,x order is assumed for backwards compatibility.

    Units are in the coordinate space of output_crs. This includes the direction (as indicated by a positive or negative number).

  • resampling (Union[str, int, Resampling, dict[str, Union[str, int, Resampling]], None]) –

    The resampling method to use if re-projection is required. This could be a string or a dictionary mapping band name to resampling mode. When using a dict, use '*' to indicate “apply to all other bands”, for example {'*': 'cubic', 'fmask': 'nearest'} would use cubic for all bands except fmask for which nearest will be used.

    Valid values are:

    'nearest', 'average', 'bilinear', 'cubic', 'cubic_spline',
    'lanczos', 'mode', 'gauss',  'max', 'min', 'med', 'q1', 'q3'
    

    Default is to use nearest for all bands.

    See also

    load_data()

  • align (Union[XY[float], Iterable[float], None]) –

    Load data such that point ‘align’ lies on the pixel boundary. A pair of floats between 0 and 1.

    An odc-geo XY object is preferred to avoid coordinate-order ambiguity. If passed as a tuple, x,y order is assumed for backwards compatibility.

    Default is (0, 0)

  • skip_broken_datasets (bool) – Optional. If this is True, then don’t break when failing to load a broken dataset. If None, the value will come from the environment variable of the same name. Default is False.

  • dask_chunks (dict) –

    If the data should be lazily loaded using dask.array.Array, specify the chunking size in each output dimension.

    See the documentation on using xarray with dask for more information.

  • like (xarray.Dataset) –

    Use the output of a previous load() to load data into the same spatial grid and resolution. An odc.geo.geobox.GeoBox, or an xarray Dataset or DataArray, can be supplied. E.g.:

    pq = dc.load(product='ls5_pq_albers', like=nbar_dataset)
    

  • fuse_func (Callable[[ndarray, ndarray], Any] | Mapping[str, Callable[[ndarray, ndarray], Any] | None] | None) – Function used to fuse/combine/reduce data with the group_by parameter. By default, datasets are simply copied over the top of each other in a relatively undefined manner. This function can perform a specific combining step. This can be a dictionary if different fusers are needed per band (similar format to the resampling dict described above).

  • datasets (Sequence[Dataset] | None) – Optional. If this is a non-empty list of datacube.model.Dataset objects, these will be loaded instead of performing a database lookup.

  • dataset_predicate (Callable[[Dataset], bool] | None) –

    Optional. A function that can be passed to restrict loaded datasets. A predicate function should take a datacube.model.Dataset object (e.g. as returned from find_datasets()) and return a boolean.

    For example, loaded data could be filtered to January observations only by passing the following predicate function that returns True for datasets acquired in January:

    def filter_jan(dataset): return dataset.time.begin.month == 1
    


  • progress_cbk (Callable[[int, int], Any] | None) – Optional. A callback taking two ints; if supplied, it will be called for every file read with files_processed_so_far, total_files. This only applies to non-lazy loads and is ignored when using dask.

  • patch_url (Callable[[str], str] | None) – Optional. If supplied, will be used to patch/sign the URL(s), as required to access some commercial archives (e.g. Microsoft Planetary Computer).

  • limit (int | None) – Optional. If provided, limit the maximum number of datasets returned. Useful for testing and debugging. Can also be provided via the dc_load_limit config option.

  • driver (Any | None) – Optional. If provided, use the specified driver to load the data.

  • query (str | float | int | Range | datetime | Not) – Search parameters for products and dimension ranges as described above. For example: 'x', 'y', 'time', 'crs'.
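The dataset_predicate described above can be tried out without an index. The sketch below applies the January filter to stand-in objects that expose only a time.begin attribute; fake_dataset is a hypothetical helper for illustration, and real calls would receive datacube.model.Dataset objects instead.

```python
from datetime import datetime
from types import SimpleNamespace


def filter_jan(dataset) -> bool:
    """Return True for datasets acquired in January."""
    return dataset.time.begin.month == 1


def fake_dataset(begin: datetime) -> SimpleNamespace:
    """Hypothetical stand-in exposing only the time.begin attribute."""
    return SimpleNamespace(time=SimpleNamespace(begin=begin))


candidates = [
    fake_dataset(datetime(2000, 1, 15)),
    fake_dataset(datetime(2000, 7, 1)),
    fake_dataset(datetime(2001, 1, 3)),
]

# Keep only the datasets the predicate accepts.
kept = [d for d in candidates if filter_jan(d)]
```

In a real call the function is passed directly, for example dataset_predicate=filter_jan.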

Returns:

Requested data in an xarray.Dataset

Return type:

xarray.Dataset