Data Loading#

Datacube.load([product, measurements, ...])

Load data as an xarray.Dataset object.
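A minimal call is sketched below; the product name, measurements and extents are placeholders for whatever is indexed in your own datacube.

import datacube

dc = datacube.Datacube()

# Hypothetical product and measurement names -- substitute ones from your index.
ds = dc.load(
    product="ls8_example_product",
    measurements=["red", "green", "blue"],
    latitude=(-35.45, -35.25),
    longitude=(149.0, 149.2),
    output_crs="EPSG:4326",
    resolution=(-0.00027778, 0.00027778),
    time=("2019-01-01", "2019-12-31"),
)
print(ds)  # an xarray.Dataset with one data variable per measurement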

Internal Loading Functions#

These operations can be useful if you need to customise the loading process, for example to pre-filter the available datasets before loading; the sketch after the list below shows how they fit together.

Datacube.find_datasets(**search_terms)

Search the index and return all datasets for a product matching the search terms.

Datacube.group_datasets(datasets, group_by)

Group datasets along defined non-spatial dimensions (i.e. time).

Datacube.load_data(sources, geobox, measurements)

Load data from group_datasets() into an xarray.Dataset.
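A rough sketch of how these fit together is given below. It assumes the datacube 1.8-style datacube.utils.geometry module, reuses the gedi_l2b_cover_z test product from the 3D examples further down, and applies a purely illustrative metadata filter.

import datacube
from datacube.api.query import query_group_by
from datacube.utils.geometry import CRS, GeoBox

dc = datacube.Datacube()

# 1. Find the matching datasets, then pre-filter them as needed
#    (the filter on the metadata document below is purely illustrative).
datasets = dc.find_datasets(
    product="gedi_l2b_cover_z",
    latitude=(-35.45, -35.25),
    longitude=(149.0, 149.2),
)
datasets = [ds for ds in datasets
            if "rejected" not in ds.metadata_doc.get("tags", [])]

# 2. Group the remaining datasets along the time dimension.
sources = dc.group_datasets(datasets, query_group_by(group_by="time"))

# 3. Describe the output grid and load. Here the GeoBox is derived from the
#    first dataset's footprint; normally it would be built from the full query.
crs = CRS("EPSG:4326")
geobox = GeoBox.from_geopolygon(
    datasets[0].extent.to_crs(crs),
    resolution=(-0.00027778, 0.00027778),
    crs=crs,
)
product = dc.index.products.get_by_name("gedi_l2b_cover_z")
measurements = product.lookup_measurements(["cover_z"])
data = dc.load_data(sources, geobox, measurements)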

Loading 3D Datasets#

Uses datacube.Datacube, xarray.Dataset and datacube.Datacube.load().

Below are some examples of loading a 3D dataset, using the small test dataset supplied in tests/data/lbg/gedi/. Support for 3D datasets requires a 3D read driver and a 3D product definition.

Loading all time slices of a 3D dataset

import datacube

dc = datacube.Datacube()

query = {
    "latitude": (-35.45, -35.25),
    "longitude": (149.0, 149.2),
    "output_crs": "EPSG:4326",
    "resolution": (0.00027778, -0.00027778),
}

dc.load(product='gedi_l2b_cover_z', **query)

This returns a 3D (+ time) Dataset with a z coordinate in addition to latitude/longitude:

<xarray.Dataset>
Dimensions:      (latitude: 720, longitude: 721, time: 2, z: 30)
Coordinates:
* time         (time) datetime64[ns] 2019-08-16T09:28:51 2019-10-21T15:54:01
* latitude     (latitude) float64 -35.45 -35.45 -35.45 ... -35.25 -35.25
* longitude    (longitude) float64 149.2 149.2 149.2 ... 149.0 149.0 149.0
    spatial_ref  int32 4326
* z            (z) float64 5.0 10.0 15.0 20.0 25.0 ... 135.0 140.0 145.0 150.0
Data variables:
    cover_z      (time, z, latitude, longitude) float32 -9.999e+03 ... -9.999...
Attributes:
    crs:           EPSG:4326
    grid_mapping:  spatial_ref

Slice the dataset along the `z` dimension

dc.load(product='gedi_l2b_cover_z', z=(30, 50), **query)
<xarray.Dataset>
Dimensions:      (latitude: 720, longitude: 721, time: 2, z: 5)
Coordinates:
* time         (time) datetime64[ns] 2019-08-16T09:28:51 2019-10-21T15:54:01
* latitude     (latitude) float64 -35.45 -35.45 -35.45 ... -35.25 -35.25
* longitude    (longitude) float64 149.2 149.2 149.2 ... 149.0 149.0 149.0
    spatial_ref  int32 4326
* z            (z) float64 30.0 35.0 40.0 45.0 50.0
Data variables:
    cover_z      (time, z, latitude, longitude) float32 -9.999e+03 ... -9.999...
Attributes:
    crs:           EPSG:4326
    grid_mapping:  spatial_ref

Query the dataset at a single `z` coordinate

dc.load(product='gedi_l2b_cover_z', z=30, **query)
<xarray.Dataset>
Dimensions:      (latitude: 720, longitude: 721, time: 2, z: 1)
Coordinates:
* time         (time) datetime64[ns] 2019-08-16T09:28:51 2019-10-21T15:54:01
* latitude     (latitude) float64 -35.45 -35.45 -35.45 ... -35.25 -35.25
* longitude    (longitude) float64 149.2 149.2 149.2 ... 149.0 149.0 149.0
    spatial_ref  int32 4326
* z            (z) float64 30.0
Data variables:
    cover_z      (time, z, latitude, longitude) float32 -9.999e+03 ... -9.999...
Attributes:
    crs:           EPSG:4326
    grid_mapping:  spatial_ref

Use dask to chunk the dataset along the `z` dimension

dc.load(product='gedi_l2b_cover_z', dask_chunks={'z': 15}, **query)
<xarray.Dataset>
Dimensions:      (latitude: 720, longitude: 721, time: 2, z: 30)
Coordinates:
* time         (time) datetime64[ns] 2019-08-16T09:28:51 2019-10-21T15:54:01
* latitude     (latitude) float64 -35.45 -35.45 -35.45 ... -35.25 -35.25
* longitude    (longitude) float64 149.2 149.2 149.2 ... 149.0 149.0 149.0
    spatial_ref  int32 4326
* z            (z) float64 5.0 10.0 15.0 20.0 25.0 ... 135.0 140.0 145.0 150.0
Data variables:
    cover_z      (time, z, latitude, longitude) float32 dask.array<chunksize=(1, 15, 720, 721), meta=np.ndarray>
Attributes:
    crs:           EPSG:4326
    grid_mapping:  spatial_ref
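With dask_chunks supplied, cover_z is returned as a lazy dask array; call .compute() (or .load()) on the result when you want the values read into memory.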

Group by#

query_group_by([group_by])

Group by function for loading datasets

solar_day(dataset[, longitude])

Adjust Dataset timestamp for "local time" given location and convert to numpy.

GroupBy(group_by_func, dimension, units[, ...])

GroupBy Object
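These are usually driven through the group_by argument to Datacube.load() rather than called directly. A minimal sketch, reusing the test product and query from the 3D examples above:

import datacube
from datacube.api.query import query_group_by

dc = datacube.Datacube()
query = {
    "latitude": (-35.45, -35.25),
    "longitude": (149.0, 149.2),
}

# Common case: merge observations from the same solar day into one time slice.
ds = dc.load(product="gedi_l2b_cover_z", group_by="solar_day",
             output_crs="EPSG:4326", resolution=(0.00027778, -0.00027778),
             **query)

# Equivalent manual path: build the GroupBy object and use it with group_datasets().
group_by = query_group_by(group_by="solar_day")
datasets = dc.find_datasets(product="gedi_l2b_cover_z", **query)
sources = dc.group_datasets(datasets, group_by)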