Virtual Products

Introduction

Virtual products enable ODC users to define a recipe that can load data from multiple products and perform on-the-fly computation while the data is loaded. By declaring the entire recipe up front, Datacube can perform optimisations like not loading data that will never be used, and optimising memory usage.

An example virtual product would be a cloud-free surface reflectance (SR) product derived from a base surface reflectance product and a pixel quality (PQ) product that classifies cloud. Virtual products are especially useful when the datasets from the different products have the same spatio-temporal extent and the operations are to be applied pixel-by-pixel.

The source code for virtual products is in the datacube.virtual module. An example notebook using virtual products can be found in the DEA notebooks collection.

Using virtual products

Virtual products provide an interface to query and load data. The methods are:

query(dc, **search_terms)
Retrieves datasets that match search_terms from the database index of dc.

group(datasets, **group_settings)
Groups datasets from query by timestamp. Does not connect to the database.

fetch(grouped, **load_settings)
Loads the data from the grouped datasets according to load_settings. Does not connect to the database. The on-the-fly transformations are applied at this stage. To load data lazily using dask, specify dask_chunks in the load_settings.
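The split into three stages can be pictured with a toy model in plain Python. This is only a sketch: the in-memory index, the record layout and the toy_query / toy_group / toy_fetch names are hypothetical stand-ins, not the datacube API. It shows the property described above: only the first stage reads the index, while grouping and fetching work purely on what it returned.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical in-memory "index": one record per dataset (not a datacube structure).
INDEX = [
    {"id": "a", "product": "ls8_nbar_albers", "time": datetime(2018, 1, 3)},
    {"id": "b", "product": "ls8_nbar_albers", "time": datetime(2018, 1, 3)},
    {"id": "c", "product": "ls8_nbar_albers", "time": datetime(2018, 1, 19)},
    {"id": "d", "product": "ls7_nbar_albers", "time": datetime(2018, 1, 11)},
]


def toy_query(index, product, time_range):
    """Stage 1: the only step that reads the index."""
    start, end = time_range
    return [r for r in index if r["product"] == product and start <= r["time"] <= end]


def toy_group(datasets):
    """Stage 2: bucket dataset ids by timestamp, in time order; no index access."""
    groups = defaultdict(list)
    for record in datasets:
        groups[record["time"]].append(record["id"])
    return dict(sorted(groups.items()))


def toy_fetch(grouped, loader):
    """Stage 3: load each group; on-the-fly transforms would be applied here."""
    return {time: loader(ids) for time, ids in grouped.items()}


datasets = toy_query(INDEX, "ls8_nbar_albers", (datetime(2018, 1, 1), datetime(2018, 1, 31)))
grouped = toy_group(datasets)
data = toy_fetch(grouped, loader=lambda ids: "+".join(ids))  # stand-in "loader"
```

With a real virtual product the same shape would be datasets = product.query(dc, ...), grouped = product.group(datasets, ...) and data = product.fetch(grouped, ...), using the methods listed above.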

Note

Virtual products do provide a load(dc, **query) method similar to dc.load. However, this method is only to facilitate code migration, and its use is not recommended. It simply chains the three methods above: query, then group, then fetch.

For advanced use cases, the intermediate objects VirtualDatasetBag and VirtualDatasetBox may be directly manipulated.

Design

Virtual products are constructed by applying a set of combinators to either existing datacube products or other virtual products. That is, a virtual product can be viewed as a tree whose nodes are combinators and leaves are ordinary datacube products.
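To make the tree view concrete, here is a minimal sketch that walks a recipe once it has been parsed into nested dicts (as the YAML recipes on this page would be) and collects the datacube products at its leaves. The leaf_products function is hypothetical, not part of datacube.virtual; it only knows the node keys documented on this page.

```python
# Combinator nodes hold a list of child recipes; data-modifying nodes hold one 'input'.
COMBINATORS = ("collate", "juxtapose")


def leaf_products(recipe):
    """Return the datacube product names at the leaves of a recipe tree, left to right."""
    if "product" in recipe:
        return [recipe["product"]]
    for key in COMBINATORS:
        if key in recipe:
            return [name for child in recipe[key] for name in leaf_products(child)]
    if "input" in recipe:  # transform, aggregate, reproject
        return leaf_products(recipe["input"])
    raise ValueError(f"unrecognised recipe node: {sorted(recipe)}")


# A cut-down recipe in the style of the sample recipe in this section.
recipe = {
    "collate": [
        {
            "transform": "apply_mask",
            "mask_measurement_name": "pixelquality",
            "input": {
                "juxtapose": [
                    {"product": "ls7_nbar_albers", "measurements": ["red", "green", "blue"]},
                    {"transform": "make_mask", "input": {"product": "ls7_pq_albers"},
                     "flags": {"contiguous": True}, "mask_measurement_name": "pixelquality"},
                ]
            },
        },
        {"product": "ls8_nbar_albers", "measurements": ["red", "green", "blue"]},
    ]
}
```

Calling leaf_products(recipe) lists the three products that would actually be queried from the index.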

Continuing the example in the previous section, consider the recipe for a cloud-free surface reflectance product from SR products for two sensors (ls7_nbar_albers and ls8_nbar_albers) and their corresponding PQ products (ls7_pq_albers and ls8_pq_albers):

Sample Virtual Product Recipe:

collate:
  - transform: apply_mask
    mask_measurement_name: pixelquality
    input:
        juxtapose:
          - product: ls7_nbar_albers
            measurements: [red, green, blue]
          - transform: make_mask
            input:
                product: ls7_pq_albers
            flags:
                blue_saturated: false
                cloud_acca: no_cloud
                cloud_fmask: no_cloud
                cloud_shadow_acca: no_cloud_shadow
                cloud_shadow_fmask: no_cloud_shadow
                contiguous: true
                green_saturated: false
                nir_saturated: false
                red_saturated: false
                swir1_saturated: false
                swir2_saturated: false
            mask_measurement_name: pixelquality
  - transform: apply_mask
    mask_measurement_name: pixelquality
    input:
        juxtapose:
          - product: ls8_nbar_albers
            measurements: [red, green, blue]
          - transform: make_mask
            input:
                product: ls8_pq_albers
            flags:
                blue_saturated: false
                cloud_acca: no_cloud
                cloud_fmask: no_cloud
                cloud_shadow_acca: no_cloud_shadow
                cloud_shadow_fmask: no_cloud_shadow
                contiguous: true
                green_saturated: false
                nir_saturated: false
                red_saturated: false
                swir1_saturated: false
                swir2_saturated: false
            mask_measurement_name: pixelquality

from datacube.virtual import construct_from_yaml

# `recipe` is a str holding the YAML text of the sample recipe above
cloud_free_ls_nbar = construct_from_yaml(recipe)

The virtual product cloud_free_ls_nbar can now be used to load cloud-free surface reflectance imagery. The dataflow for loading the data reflects the tree structure of the recipe.

Components

Product (Input)

The recipe to construct a virtual product from an existing datacube product has the form:

{'product': <product-name>, **settings}

where settings can include datacube.Datacube.load() settings such as:

measurements
output_crs, resolution, align
resampling
group_by, fuse_func

The product nodes are at the leaves of the virtual product syntax tree.

Collate (Combining)

This combinator concatenates observations from multiple sensors having the same set of measurements. The recipe for a collate node has the form:

{'collate': [<virtual-product-1>,
             <virtual-product-2>,
             ...,
             <virtual-product-N>]}

Observations from the different sensors are interlaced along the time dimension.

Optionally, the source product of a pixel can be captured by introducing another measurement in the loaded data that consists of the index of the source product:

{'collate': [<virtual-product-1>,
             <virtual-product-2>,
             ...,
             <virtual-product-N>],
 'index_measurement_name': <measurement-name>}
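The collate semantics can be sketched in plain Python, with each source modelled as a list of (timestamp, measurements) pairs. This toy_collate function is hypothetical and only illustrates the idea: observations are concatenated, ordered by time, and optionally tagged with the index of their source, much like index_measurement_name. Integer day numbers stand in for timestamps.

```python
def toy_collate(*sources, index_measurement_name=None):
    """Concatenate (time, measurements) pairs from several sources, ordered by time.

    Every observation must carry the same measurement names. If
    index_measurement_name is given, each output observation also records
    the position of the source it came from.
    """
    out = []
    expected = None
    for source_index, source in enumerate(sources):
        for time, measurements in source:
            if expected is None:
                expected = set(measurements)
            elif set(measurements) != expected:
                raise ValueError("collate needs the same measurements from every source")
            obs = dict(measurements)
            if index_measurement_name is not None:
                obs[index_measurement_name] = source_index
            out.append((time, obs))
    out.sort(key=lambda pair: pair[0])  # stable sort: ties keep source order
    return out


ls7 = [(1, {"red": 10}), (5, {"red": 12})]
ls8 = [(3, {"red": 20})]
result = toy_collate(ls7, ls8, index_measurement_name="source")
```

Here result interleaves the two sensors as days 1, 3 and 5, with the source measurement reading 0, 1, 0.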

Juxtapose (Combining)

This node merges disjoint sets of measurements from different products into one. The form of the recipe is:

{'juxtapose': [<virtual-product-1>,
               <virtual-product-2>,
               ...,
               <virtual-product-N>]}

Observations that have no corresponding entry (at the same time) in every other product are dropped.
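A minimal sketch of this behaviour in plain Python, again modelling each source as a list of (timestamp, measurements) pairs. The toy_juxtapose function is hypothetical: it keeps only timestamps present in every source and merges their (disjoint) measurements, so the unmatched observation at day 3 disappears.

```python
def toy_juxtapose(*sources):
    """Merge disjoint measurements from several sources, keeping only shared timestamps."""
    by_time = [dict(source) for source in sources]  # timestamp -> measurements, per source
    shared = set(by_time[0]).intersection(*by_time[1:])
    merged = []
    for time in sorted(shared):
        obs = {}
        for lookup in by_time:
            overlap = obs.keys() & lookup[time].keys()
            if overlap:
                raise ValueError(f"measurements are not disjoint: {sorted(overlap)}")
            obs.update(lookup[time])
        merged.append((time, obs))
    return merged


nbar = [(1, {"red": 10}), (3, {"red": 11}), (5, {"red": 12})]
pq = [(1, {"pixelquality": True}), (5, {"pixelquality": False})]
result = toy_juxtapose(nbar, pq)
```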

Reproject (Data modifying)

Reproject performs an on-the-fly reprojection of raster data to a given CRS and resolution.

This is useful when combining different datasets into a common data grid, especially when calculating summary statistics.

The recipe looks like:

{'reproject': {'output_crs': <crs-string>,
               'resolution': [<y-resolution>, <x-resolution>],
               'align': [<y-alignment>, <x-alignment>]},
 'input':  <input-virtual-product>,
 'resampling': <resampling-settings>}

Here align and resampling are optional (they default to edge-aligned and nearest neighbor, respectively). This combinator makes it possible to apply transformations to raster data in their native grids before aligning different rasters to a common grid.
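One way to picture resolution and align along a single axis: pixel edges fall at align + k * |resolution| for integer k, so with the default alignment of 0 the edges sit on multiples of the resolution. The snap_to_grid helper below is a hypothetical sketch of that idea (not datacube code); it expands an interval outward onto that lattice and reports how many output pixels it spans.

```python
import math


def snap_to_grid(lo, hi, resolution, align=0.0):
    """Expand [lo, hi] outward so both ends fall on pixel edges align + k * |resolution|.

    Returns the snapped bounds and the number of pixels along this axis.
    A negative resolution (common for y) gives the same pixel edges.
    """
    step = abs(resolution)
    start = align + math.floor((lo - align) / step) * step
    stop = align + math.ceil((hi - align) / step) * step
    return start, stop, round((stop - start) / step)
```

For example, snap_to_grid(1010.0, 1090.0, 25) gives edges at 1000 and 1100 (4 pixels), while shifting the alignment by half a pixel, snap_to_grid(1010.0, 1090.0, 25, align=12.5), gives 987.5 to 1112.5 (5 pixels).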

Transform (Data modifying)

This node applies an on-the-fly transformation to the loaded data. The recipe for a transform has the form:

{'transform': <transformation-class>,
 'input': <input-virtual-product>,
 **settings}

where the settings are keyword arguments passed to the constructor of a class implementing datacube.virtual.Transformation:

class Transformation:
    def __init__(self, **settings):
        """ Initialize the transformation object with the given settings. """

    def compute(self, data):
        """ xarray.Dataset -> xarray.Dataset """

    def measurements(self, input_measurements):
        """ Dict[str, Measurement] -> Dict[str, Measurement] """

See Built in Transforms for the transforms that ship with ODC.

Custom transforms can also be written; see User-defined transformations.
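The way a transform node turns its extra recipe keys into constructor arguments can be mimicked without datacube. In this sketch ToyTransformation is a stand-in for datacube.virtual.Transformation, data is a dict of plain lists rather than an xarray.Dataset, and Scale and apply_transform_node are hypothetical names chosen for illustration.

```python
class ToyTransformation:
    """Stand-in for datacube.virtual.Transformation, for illustration only."""

    def __init__(self, **settings):
        self.settings = settings

    def compute(self, data):
        raise NotImplementedError

    def measurements(self, input_measurements):
        raise NotImplementedError


class Scale(ToyTransformation):
    """Multiply every band by a constant factor (nodata handling omitted for brevity)."""

    def compute(self, data):
        factor = self.settings["factor"]
        return {name: [value * factor for value in values] for name, values in data.items()}

    def measurements(self, input_measurements):
        # Scaling by a float factor yields floats, so report float64 outputs.
        return {name: dict(spec, dtype="float64") for name, spec in input_measurements.items()}


def apply_transform_node(node, data, input_measurements):
    """Mimic a transform node: keys other than 'transform' and 'input' become constructor kwargs."""
    settings = {k: v for k, v in node.items() if k not in ("transform", "input")}
    transformation = node["transform"](**settings)
    return transformation.compute(data), transformation.measurements(input_measurements)


# The class itself is given as the transform, as construct() also allows.
node = {"transform": Scale, "input": {"product": "example_product"}, "factor": 0.0001}
out, meas = apply_transform_node(node, {"red": [1000, 2000]}, {"red": {"dtype": "int16"}})
```

Note how meas (from measurements) and out (from compute) describe the same bands; keeping the two consistent is the contract of the interface.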

Aggregate (Summary statistics)

Aggregate performs a (partial) temporal reduction of the loaded data, that is, it computes statistics over groups of time steps. The form of the recipe is:

{'aggregate': <transformation-class>,
 'group_by': <grouping-function>,
 'input': <input-virtual-product>,
 **settings}

As in transform, the settings are keyword arguments to initialise the Transformation class. The grouping function takes the timestamp of the input dataset and returns another timestamp to be assigned to the group it would belong to. Common grouping functions (year, month, week, day) are built-in.
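The role of the grouping function can be sketched with the standard library. The year, month and day functions below are hypothetical re-creations written for illustration (a week function is left out because its exact convention is not described here), and aggregate simply reduces each group with a mean.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


# Hypothetical grouping functions: map a timestamp to the timestamp of its group.
def year(t):
    return datetime(t.year, 1, 1)


def month(t):
    return datetime(t.year, t.month, 1)


def day(t):
    return datetime(t.year, t.month, t.day)


def aggregate(observations, group_by, reducer=mean):
    """Partially reduce (timestamp, value) pairs over time, one output per group."""
    groups = defaultdict(list)
    for t, value in observations:
        groups[group_by(t)].append(value)
    return {key: reducer(values) for key, values in sorted(groups.items())}


obs = [(datetime(2018, 1, 3), 1.0), (datetime(2018, 1, 19), 3.0), (datetime(2018, 2, 4), 5.0)]
monthly = aggregate(obs, month)  # two January values averaged, one February value
yearly = aggregate(obs, year)    # all three values averaged
```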

ODC provides one built-in Statistic class, xarray_reduction. It applies a reducing method of the xarray.DataArray object (such as mean) to each individual band. Custom aggregate transformations are defined as in User-defined transformations.
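The "reduce each band independently by a named method" idea can be sketched without xarray. Here bands are plain lists of values over time, and the REDUCERS table and method keyword are assumptions made for illustration; they are not the settings of xarray_reduction, which calls the DataArray methods themselves.

```python
import statistics

# Hypothetical table of reducers, looked up by name as a recipe setting might name them.
REDUCERS = {"mean": statistics.mean, "median": statistics.median, "min": min, "max": max}


def reduce_bands(bands, method="mean"):
    """Apply the named reduction to each band's time series independently."""
    reducer = REDUCERS[method]
    return {name: reducer(series) for name, series in bands.items()}


bands = {"red": [1, 2, 6], "nir": [4, 8, 9]}
```

For instance, reduce_bands(bands, "max") returns the per-band maxima, 6 for red and 9 for nir.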

Built in Transforms

Expressions

class datacube.virtual.transformations.Expressions(output, masked=True)

Calculate measurements on-the-fly using arithmetic expressions.

Alias in recipe: expressions. For example,

transform: expressions
output:
    ndvi:
        formula: (nir - red) / (nir + red)

input:
    product: example_surface_reflectance_product
    measurements: [nir, red]

Initialize transformation.

Parameters

output –
A dictionary mapping output measurement names to specifications. That specification can be one of:
- a measurement name from the input product in which case it is copied over
- a dictionary containing a formula, and optionally a dtype, a new nodata value, and a units specification
masked – Defaults to True. If set to False, the inputs and outputs are not masked for no data.
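What masked=True means for a formula can be sketched pixel by pixel in plain Python: when any input band holds its nodata value, the output pixel becomes NaN instead of being computed. The evaluate_expression helper is hypothetical, and the use of eval is for illustration only; it says nothing about how the library actually parses formulas.

```python
import math


def evaluate_expression(formula, bands, nodata, masked=True):
    """Evaluate an arithmetic formula pixel by pixel over equally long band lists.

    With masked=True, any pixel where an input band equals its nodata value
    becomes NaN in the output instead of being computed.
    """
    names = list(bands)
    out = []
    for i in range(len(bands[names[0]])):
        pixel = {name: bands[name][i] for name in names}
        if masked and any(pixel[n] == nodata.get(n) for n in names):
            out.append(math.nan)
            continue
        # eval without builtins: illustration only, not the library's formula parser.
        out.append(eval(formula, {"__builtins__": {}}, pixel))
    return out


ndvi = evaluate_expression(
    "(nir - red) / (nir + red)",
    {"nir": [3000, -999], "red": [1000, 500]},
    nodata={"nir": -999, "red": -999},
)
```

The first pixel evaluates to 0.5; the second is NaN because nir holds its nodata value.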

Methods:

compute(data): Perform computation on data that results in an xarray.Dataset having measurements reported by the measurements method.

measurements(input_measurements): Returns the dictionary describing the output measurements from this transformation. Assumes the data provided to compute will have measurements given by the dictionary input_measurements.

Make mask

class datacube.virtual.transformations.MakeMask(mask_measurement_name, flags)

Create a mask that would only keep pixels for which the measurement with mask_measurement_name of the product satisfies flags.

Alias in recipe: make_mask.

Parameters

mask_measurement_name – the name of the measurement to create the mask from
flags – definition of the flags for the mask
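Flags are matched against a bit-packed pixel quality measurement. The sketch below uses a hypothetical three-flag layout (the bit positions are invented for illustration; real products declare their own flags definition) and keeps a pixel only when every requested flag has the requested value, as with the cloud flags in the sample recipe.

```python
# Hypothetical flags definition: flag name -> bit position and value meanings.
FLAGS_DEFINITION = {
    "contiguous": {"bit": 8, "values": {0: False, 1: True}},
    "cloud_acca": {"bit": 10, "values": {0: "cloud", 1: "no_cloud"}},
    "cloud_fmask": {"bit": 11, "values": {0: "cloud", 1: "no_cloud"}},
}


def decode(pixel, flag):
    """Return the meaning of one flag for an integer pixel quality value."""
    spec = FLAGS_DEFINITION[flag]
    return spec["values"][(pixel >> spec["bit"]) & 1]


def toy_make_mask(pixels, **flags):
    """True where every requested flag has the requested meaning."""
    return [all(decode(p, name) == wanted for name, wanted in flags.items()) for p in pixels]


good = (1 << 8) | (1 << 10) | (1 << 11)  # contiguous, clear by both cloud tests
cloudy = (1 << 8) | (1 << 11)            # ACCA reports cloud
```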

Methods:

compute(data): Perform computation on data that results in an xarray.Dataset having measurements reported by the measurements method.

measurements(input_measurements): Returns the dictionary describing the output measurements from this transformation. Assumes the data provided to compute will have measurements given by the dictionary input_measurements.

Apply mask

class datacube.virtual.transformations.ApplyMask(mask_measurement_name, apply_to=None, preserve_dtype=True, fallback_dtype='float32', erosion=0, dilation=0)

Apply a boolean mask to other measurements.

Alias in recipe: apply_mask.

Parameters

mask_measurement_name – name of the measurement to use as a mask
apply_to (Optional[Collection[str]]) – list of names of measurements to apply the mask to
preserve_dtype – whether to cast back to original dtype after masking
fallback_dtype – default dtype for masked measurements
erosion (int) – the erosion to apply to mask in pixels
dilation (int) – the dilation to apply to mask in pixels
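A plausible reading of preserve_dtype, sketched in plain Python under a stated assumption: when the dtype is preserved and the band has a nodata value, masked-out pixels are set to nodata and integers stay integers; otherwise values become floats and masked-out pixels become NaN. The toy_apply_mask helper is hypothetical, and erosion and dilation of the mask are not modelled.

```python
import math


def toy_apply_mask(values, mask, nodata=None, preserve_dtype=True):
    """Blank out pixels where mask is False.

    Assumed behaviour (illustration only): with preserve_dtype and a nodata
    value, masked pixels get nodata; otherwise values are converted to float
    and masked pixels become NaN.
    """
    if preserve_dtype and nodata is not None:
        return [v if keep else nodata for v, keep in zip(values, mask)]
    return [float(v) if keep else math.nan for v, keep in zip(values, mask)]


kept_int = toy_apply_mask([5, 6, 7], [True, False, True], nodata=-999)
as_float = toy_apply_mask([5, 6, 7], [True, False, True], nodata=-999, preserve_dtype=False)
```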

Methods:

compute(data): Perform computation on data that results in an xarray.Dataset having measurements reported by the measurements method.

measurements(input_measurements): Returns the dictionary describing the output measurements from this transformation. Assumes the data provided to compute will have measurements given by the dictionary input_measurements.

To Float

class datacube.virtual.transformations.ToFloat(apply_to=None, dtype='float32')

Convert measurements to floats and mask invalid data.

Alias in recipe: to_float.

Note

The to_float transform is deprecated. Please use expressions instead.

Using to_float:

transform: to_float
apply_to: [green]
dtype: float32
input: ...

Using equivalent expressions:

transform: expressions
output:
    green:
        formula: green
        dtype: float32

    # copy unaffected other bands
    red: red
    blue: blue
input: ...

Parameters

apply_to – list of names of measurements to apply conversion to
dtype – default dtype for conversion
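"Convert to floats and mask invalid data" can be pictured as follows, assuming (for this sketch) that invalid data means pixels equal to the band's nodata value. The toy_to_float helper is hypothetical; it converts only the bands named in apply_to and passes the others through unchanged, mirroring the expressions rewrite in the note above.

```python
import math


def toy_to_float(bands, nodata, apply_to=None):
    """Convert selected bands to float, turning nodata pixels into NaN."""
    out = {}
    for name, values in bands.items():
        if apply_to is not None and name not in apply_to:
            out[name] = values  # bands not selected are passed through unchanged
            continue
        out[name] = [math.nan if v == nodata.get(name) else float(v) for v in values]
    return out


result = toy_to_float({"green": [10, -999], "red": [3, 4]},
                      nodata={"green": -999, "red": -999}, apply_to=["green"])
```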

Methods:

compute(data): Perform computation on data that results in an xarray.Dataset having measurements reported by the measurements method.

measurements(input_measurements): Returns the dictionary describing the output measurements from this transformation. Assumes the data provided to compute will have measurements given by the dictionary input_measurements.

Rename

class datacube.virtual.transformations.Rename(measurement_names)

Rename measurements.

Alias in recipe: rename.

Note

The rename transform is deprecated. Please use expressions instead.

Using rename:

transform: rename
measurement_names:
    green: verde
input: ...

Using equivalent expressions:

transform: expressions
output:
    verde: green

    # copy other unaffected bands
    red: red
    blue: blue
input: ...

Parameters: measurement_names – mapping from INPUT NAME to OUTPUT NAME
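The migration from rename to expressions shown in the note can be done mechanically. This hypothetical helper builds the expressions output mapping from a rename mapping; note that it needs the full list of input measurement names, which the rename recipe itself does not carry, so that unaffected bands can be copied over.

```python
def rename_to_expressions(measurement_names, all_measurements):
    """Build the expressions 'output' mapping equivalent to a rename transform.

    measurement_names maps INPUT NAME -> OUTPUT NAME; bands that are not
    renamed are copied over under their own names.
    """
    output = {}
    for name in all_measurements:
        output[measurement_names.get(name, name)] = name
    return output


output = rename_to_expressions({"green": "verde"}, ["red", "green", "blue"])
```

The result matches the output section of the equivalent expressions recipe in the note: verde from green, with red and blue copied.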

Methods:

compute(data): Perform computation on data that results in an xarray.Dataset having measurements reported by the measurements method.

measurements(input_measurements): Returns the dictionary describing the output measurements from this transformation. Assumes the data provided to compute will have measurements given by the dictionary input_measurements.

Select

class datacube.virtual.transformations.Select(measurement_names)

Keep only specified measurements.

Alias in recipe: select.

Note

The select transform is deprecated. Please use expressions instead.

Using select:

transform: select
measurement_names: [green]
input: ...

Using equivalent expressions:

transform: expressions
output:
    green: green
input: ...

Parameters: measurement_names – list of measurements to keep
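For select, the whole recipe node can be rewritten, since every kept band simply maps to itself. The select_recipe_to_expressions helper below is a hypothetical sketch operating on a recipe already parsed into a dict; the input sub-recipe is carried over untouched.

```python
def select_recipe_to_expressions(recipe):
    """Rewrite a deprecated select node as an equivalent expressions node."""
    if recipe.get("transform") != "select":
        raise ValueError("not a select transform")
    return {
        "transform": "expressions",
        "output": {name: name for name in recipe["measurement_names"]},
        "input": recipe["input"],
    }


old = {"transform": "select", "measurement_names": ["green"], "input": {"product": "example_product"}}
new = select_recipe_to_expressions(old)
```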

Methods:

compute(data): Perform computation on data that results in an xarray.Dataset having measurements reported by the measurements method.

measurements(input_measurements): Returns the dictionary describing the output measurements from this transformation. Assumes the data provided to compute will have measurements given by the dictionary input_measurements.

User-defined transformations

Custom transformations must inherit from datacube.virtual.Transformation. If the user-defined transformation class is installed in the Python environment that datacube runs in, the recipe may refer to it by its fully qualified name. Otherwise, for example for a transformation defined in a notebook, the virtual product using the custom transformation is best constructed by calling the combinators directly.
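Referring to a class by its fully qualified name relies on the class being importable. The general mechanism can be sketched with importlib; this is a hypothetical helper that assumes nothing about how datacube resolves names internally, but a recipe entry such as mypackage.transforms.NDVI (also hypothetical) could only work if an equivalent lookup succeeds in your environment.

```python
import importlib


def resolve_qualified_name(qualified_name):
    """Import 'package.module.ClassName' and return the named attribute."""
    module_name, _, attribute = qualified_name.rpartition(".")
    if not module_name:
        raise ValueError(f"not a fully qualified name: {qualified_name!r}")
    return getattr(importlib.import_module(module_name), attribute)
```

A class defined only inside a notebook has no importable module path, which is why constructing the product directly is the better option there.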

For example, calculating the NDVI from a SR product (say, ls8_nbar_albers) would look like:

import datacube
from datacube.virtual import construct, Transformation, Measurement

class NDVI(Transformation):
    def compute(self, data):
        # Normalised difference of NIR and red, cast to the dtype declared in measurements()
        result = ((data.nir - data.red) / (data.nir + data.red)).astype('float32')
        return result.to_dataset(name='NDVI')

    def measurements(self, input_measurements):
        # Must describe exactly what compute() returns
        return {'NDVI': Measurement(name='NDVI', dtype='float32', nodata=float('nan'), units='1')}

ndvi = construct(transform=NDVI, input=dict(product='ls8_nbar_albers', measurements=['red', 'nir']))

dc = datacube.Datacube()
ndvi_data = ndvi.load(dc, **search_terms)

where search_terms holds the required geo-spatial query terms. Note that the measurements method must describe the output of the compute method.

Note

The user-defined transformations should be dask-friendly, otherwise loading data using dask may be broken. Also, method names starting with _transform_ are reserved for internal use.