Grid Workflow

class datacube.api.GridWorkflow(index, grid_spec=None, product=None)

GridWorkflow deals with cell- and tile-based processing using a grid defining a projection and resolution.

Use GridWorkflow to specify your desired output grid. The methods list_cells() and list_tiles() query the index and return a dictionary of cell or tile keys, each mapping to a Tile object.

The Tile object can then be used to load the data without needing the index, and can be serialized for use with the distributed package.
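As a rough sketch of that hand-off, the helpers below pickle each tile on the driver and rebuild it on a worker. The names serialize_tiles and run_on_worker are hypothetical, and the sketch assumes Tile objects pickle cleanly; the loader is passed in as a parameter (in real use it would be GridWorkflow.load) so the code runs without a database.

```python
import pickle


def serialize_tiles(tiles):
    """Pickle each tile so it can be shipped to a worker without the index.

    `tiles` is a dict such as the one returned by list_cells() or list_tiles().
    """
    return {key: pickle.dumps(tile) for key, tile in tiles.items()}


def run_on_worker(payload, loader):
    """Rebuild one tile from bytes and hand it to a loader.

    `loader` is an assumption of this sketch: in real use it would be
    GridWorkflow.load, which is static and needs no index.
    """
    tile = pickle.loads(payload)
    return loader(tile)
```

Because only bytes cross the process boundary, the worker never needs its own database connection.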

Create a grid workflow tool.

Either grid_spec or product must be supplied.

Parameters:

index (AbstractIndex) – The database index to use.
grid_spec (GridSpec | None) – The grid projection and resolution.
product (Product | str | None) – The name of an existing product, if no grid_spec is supplied.
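A minimal sketch of construction, assuming the caller already holds an index (for example from a Datacube instance). The wrapper make_grid_workflow is hypothetical; it only checks the "either grid_spec or product" rule up front and then defers to the constructor documented above.

```python
def make_grid_workflow(index, grid_spec=None, product=None):
    """Build a GridWorkflow, failing early if neither argument is given.

    This wrapper is illustrative only; the real constructor performs its
    own validation.
    """
    if grid_spec is None and product is None:
        raise ValueError("Either grid_spec or product must be supplied")

    # Imported lazily so the check above can be exercised without datacube.
    from datacube.api import GridWorkflow

    return GridWorkflow(index, grid_spec=grid_spec, product=product)
```

A typical call would look like make_grid_workflow(dc.index, product="ls5_nbar_albers"), where dc is an existing datacube.Datacube and the product name is taken from the examples on this page.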

Members:

cell_observations(cell_index=None, geopolygon=None, tile_buffer=None, **indexers)

List datasets, grouped by cell.

Parameters:

geopolygon (Geometry | None) – Only return observations with data inside polygon.
tile_buffer (tuple[float, float] | None) – buffer tiles by (y, x) in CRS units
cell_index (tuple[int, int] | None) – The cell index. E.g. (14, -40)
indexers (str | float | int | Range | datetime | Iterable[str | Geometry] | Not) – Query to match the datasets, see datacube.api.query.Query

Return type:

dict[tuple[int, int], dict[str, list[Dataset] | GeoBox]]

Returns:

A dictionary of cell index (int, int) mapping to a dict containing two keys, “datasets”, with a list of datasets, and “geobox”, containing the geobox for the cell.
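To illustrate the shape of that return value, the sketch below walks the nested dictionary and counts datasets per cell. The helper name count_datasets_per_cell is hypothetical; it relies only on the "datasets" key described above.

```python
def count_datasets_per_cell(observations):
    """Map each cell index to the number of datasets found in that cell.

    `observations` is expected to look like the result of cell_observations():
    {(x, y): {"datasets": [...], "geobox": ...}, ...}
    """
    return {cell: len(entry["datasets"]) for cell, entry in observations.items()}
```

With a real workflow this could be called as count_datasets_per_cell(gw.cell_observations(product="ls5_nbar_albers")).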

See also

datacube.Datacube.find_datasets()

datacube.api.query.Query

static group_into_cells(observations, group_by)

Group observations into a stack of source tiles.

Parameters:

observations – datasets grouped by cell index, like from cell_observations()
group_by (GroupBy) – grouping method, as returned by datacube.api.query.query_group_by()

Return type:

dict[tuple[int, int], Tile]

Returns:

tiles grouped by cell index

See also

load()

datacube.Datacube.group_datasets()

list_cells(cell_index=None, **query)

List cells that match the query.

Returns a dictionary of cell indexes to Tile objects.

A cell is included if it contains any dataset that matches the query. The query uses the same format as datacube.Datacube.load().

E.g.:

gw.list_cells(
    product="ls5_nbar_albers",
    time=("2001-1-1 00:00:00", "2001-3-31 23:59:59"),
)

Parameters:

cell_index (tuple[int, int] | None) – The cell index. E.g. (14, -40)
query – see datacube.api.query.Query

Return type:

dict[tuple[int, int], Tile]

list_tiles(cell_index=None, **query)

List tiles of data, sorted by cell.

tiles = gw.list_tiles(
    product="ls5_nbar_albers",
    time=("2001-1-1 00:00:00", "2001-3-31 23:59:59"),
)

Each value in the returned dictionary can be passed to load().

Parameters:

cell_index (tuple[int, int] | None) – The cell index (optional). E.g. (14, -40)
query – see datacube.api.query.Query

Return type:

dict[tuple[int, int, datetime64], Tile]
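The list-then-load pattern can be sketched as a small generator. The helper load_each_tile is hypothetical, and gw is assumed to behave like a GridWorkflow instance; only list_tiles() and load() as documented on this page are called.

```python
def load_each_tile(gw, measurements=None, **query):
    """Yield (key, data) for every tile that matches the query.

    Keys follow the documented return type: the cell index followed by
    a timestamp.
    """
    tiles = gw.list_tiles(**query)
    for key, tile in tiles.items():
        # load() is static, so calling it through the instance is fine.
        yield key, gw.load(tile, measurements=measurements)
```

Yielding one tile at a time keeps memory use bounded when a query matches many tiles.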

See also

load()

static load(tile, measurements=None, dask_chunks=None, fuse_func=None, resampling=None, skip_broken_datasets=False)

Load data for a cell/tile.

The data to be loaded is defined by the output of list_tiles().

This is a static function and does not use the index. This can be useful to minimize the number of database connections when running as a worker in a distributed environment.

See the documentation on using xarray with dask for more information.

Parameters:

tile (Tile) – The tile to load.
measurements (Iterable[str] | None) – The names of measurements to load
dask_chunks (Mapping[str, int | Literal['auto']] | None) –
If the data should be loaded as needed using dask.array.Array, specify the chunk size in each output direction.

See the documentation on using xarray with dask for more information.
fuse_func – Function to fuse together a tile that has been pre-grouped by calling list_cells() with a group_by parameter.
resampling (str | dict | None) –
The resampling method to use if re-projection is required. It can be configured per band by passing a dictionary (see datacube.Datacube.load_data()).

Valid values are: 'nearest', 'cubic', 'bilinear', 'cubic_spline', 'lanczos', 'average'

Defaults to 'nearest'.
skip_broken_datasets (bool) – If True, ignore broken datasets and continue processing with the data that can be loaded. If False, an exception will be raised on a broken dataset. Defaults to False.
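One way to assemble these arguments is sketched below. The helper build_load_kwargs is hypothetical; it validates resampling names against the list given above (the library itself may accept more) and assumes the output dimensions are named "time", "x" and "y", which depends on the grid's CRS.

```python
VALID_RESAMPLING = {"nearest", "cubic", "bilinear", "cubic_spline", "lanczos", "average"}


def build_load_kwargs(measurements=None, chunk_size=None, resampling=None,
                      skip_broken_datasets=False):
    """Return keyword arguments suitable for GridWorkflow.load(tile, **kwargs)."""
    # Collect every method name, whether given as one string or per band.
    if resampling is None:
        methods = []
    elif isinstance(resampling, dict):
        methods = list(resampling.values())
    else:
        methods = [resampling]

    unknown = sorted(set(methods) - VALID_RESAMPLING)
    if unknown:
        raise ValueError(f"Unsupported resampling method(s): {unknown}")

    kwargs = {
        "measurements": measurements,
        "resampling": resampling,
        "skip_broken_datasets": skip_broken_datasets,
    }
    if chunk_size is not None:
        # Assumption: dimension names are "time", "x" and "y".
        kwargs["dask_chunks"] = {"time": 1, "x": chunk_size, "y": chunk_size}
    return kwargs
```

Leaving chunk_size as None omits dask_chunks entirely, so the data is loaded eagerly rather than as dask arrays.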

Return type:

Dataset

See also

list_tiles()

list_cells()

static tile_sources(observations, group_by)

Split observations into tiles and group them into source tiles.

Parameters:

observations – datasets grouped by cell index, like from cell_observations()
group_by (GroupBy) – grouping method, as returned by datacube.api.query.query_group_by()

Return type:

dict[tuple[int, int, datetime64], Tile]

Returns:

tiles grouped by cell index and time
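The two grouping methods can be chained with cell_observations() by hand, as in the sketch below. The helper tiles_from_observations and its per_time flag are hypothetical; group_by is assumed to come from datacube.api.query.query_group_by(), and gw is assumed to behave like a GridWorkflow instance.

```python
def tiles_from_observations(gw, group_by, per_time=True, **query):
    """Query observations once, then group them per tile or per cell.

    per_time=True  -> keys are the cell index plus a timestamp (tile_sources)
    per_time=False -> keys are the cell index only (group_into_cells)
    """
    observations = gw.cell_observations(**query)
    if per_time:
        return gw.tile_sources(observations, group_by)
    return gw.group_into_cells(observations, group_by)
```

Running the query once and grouping afterwards avoids a second round trip to the index when both views of the same observations are needed.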

See also

load()

datacube.Datacube.group_datasets()