Virtual Products#
Introduction#
Virtual products enable ODC users to define a recipe that loads data from multiple products and performs on-the-fly computation while the data is loaded. Because the entire recipe is declared up front, Datacube can apply optimisations such as skipping data that will never be used and reducing memory usage.
An example virtual product would be a cloud-free surface reflectance (SR) product derived from a base surface reflectance product and a pixel quality (PQ) product that classifies cloud. Virtual products are especially useful when the datasets from the different products have the same spatio-temporal extent and the operations are to be applied pixel-by-pixel.
The source code for virtual products is in the `datacube.virtual` module.
An example notebook using virtual products can be found in the DEA notebooks collection.
Using virtual products#
Virtual products provide an interface to query and load data. The methods are:
`query(dc, **search_terms)`
    Retrieves datasets matching `search_terms` from the database index of `dc`.

`group(datasets, **group_settings)`
    Groups the datasets from `query` by timestamp. Does not connect to the database.

`fetch(grouped, **load_settings)`
    Loads the data from the grouped datasets according to `load_settings`. Does not connect to the database. The on-the-fly transformations are applied at this stage. To load data lazily using `dask`, specify `dask_chunks` in the `load_settings`.
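For illustration, assuming a constructed virtual product `cloud_free_ls_nbar` and a live `Datacube` instance, the three stages chain as follows; the spatial and temporal search terms are placeholders:

```python
from datacube import Datacube

dc = Datacube(app="virtual-product-demo")

# Stage 1: query the database index for matching datasets
bag = cloud_free_ls_nbar.query(
    dc, x=(148.9, 149.1), y=(-35.3, -35.1), time=("2019-01-01", "2019-03-01"))

# Stage 2: group the datasets by timestamp (no database access)
grouped = cloud_free_ls_nbar.group(bag)

# Stage 3: load the data; dask_chunks makes the load lazy
data = cloud_free_ls_nbar.fetch(grouped, dask_chunks={"time": 1})
```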
Note

Virtual products do provide a `load(dc, **query)` method similar to `dc.load`. However, this method exists only to ease code migration, and its use is not recommended. It implements the `query` → `group` → `fetch` pipeline in a single call.
For advanced use cases, the intermediate objects `VirtualDatasetBag` and `VirtualDatasetBox` may be manipulated directly.
Design#
Virtual products are constructed by applying a set of combinators to either existing datacube products or other virtual products. That is, a virtual product can be viewed as a tree whose nodes are combinators and leaves are ordinary datacube products.
Continuing the example in the previous section, consider the recipe for a cloud-free surface reflectance product built from the SR products for two sensors (`ls7_nbar_albers` and `ls8_nbar_albers`) and their corresponding PQ products (`ls7_pq_albers` and `ls8_pq_albers`):
```yaml
# Sample Virtual Product Recipe
collate:
  - transform: apply_mask
    mask_measurement_name: pixelquality
    input:
      juxtapose:
        - product: ls7_nbar_albers
          measurements: [red, green, blue]
        - transform: make_mask
          input:
            product: ls7_pq_albers
          flags:
            blue_saturated: false
            cloud_acca: no_cloud
            cloud_fmask: no_cloud
            cloud_shadow_acca: no_cloud_shadow
            cloud_shadow_fmask: no_cloud_shadow
            contiguous: true
            green_saturated: false
            nir_saturated: false
            red_saturated: false
            swir1_saturated: false
            swir2_saturated: false
          mask_measurement_name: pixelquality
  - transform: apply_mask
    mask_measurement_name: pixelquality
    input:
      juxtapose:
        - product: ls8_nbar_albers
          measurements: [red, green, blue]
        - transform: make_mask
          input:
            product: ls8_pq_albers
          flags:
            blue_saturated: false
            cloud_acca: no_cloud
            cloud_fmask: no_cloud
            cloud_shadow_acca: no_cloud_shadow
            cloud_shadow_fmask: no_cloud_shadow
            contiguous: true
            green_saturated: false
            nir_saturated: false
            red_saturated: false
            swir1_saturated: false
            swir2_saturated: false
          mask_measurement_name: pixelquality
```
With the recipe stored as YAML in the string `recipe`, the virtual product is constructed by:

```python
from datacube.virtual import construct_from_yaml

cloud_free_ls_nbar = construct_from_yaml(recipe)
```
The virtual product `cloud_free_ls_nbar` can now be used to load cloud-free surface reflectance imagery. The dataflow for loading the data reflects the tree structure of the recipe.
Components#
Product (Input)#
The recipe to construct a virtual product from an existing datacube product has the form:
```
{'product': <product-name>, **settings}
```

where `settings` can include `datacube.Datacube.load()` settings such as:

- `measurements`
- `output_crs`, `resolution`, `align`
- `resampling`
- `group_by`, `fuse_func`

The `product` nodes are at the leaves of the virtual product syntax tree.
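For instance, a product leaf that selects two bands and overrides the resampling strategy might be written as follows (product and measurement names are illustrative):

```yaml
product: ls8_nbar_albers
measurements: [red, nir]
resampling: bilinear
```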
Collate (Combining)#
This combinator concatenates observations from multiple sensors having the same set of measurements. The recipe for a `collate` node has the form:

```
{'collate': [<virtual-product-1>,
             <virtual-product-2>,
             ...,
             <virtual-product-N>]}
```
Observations from different sensors get interlaced.
Optionally, the source product of a pixel can be captured by introducing another measurement in the loaded data that consists of the index of the source product:

```
{'collate': [<virtual-product-1>,
             <virtual-product-2>,
             ...,
             <virtual-product-N>],
 'index_measurement_name': <measurement-name>}
```
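For instance, recording which sensor each pixel came from in a measurement named `source_index` (the name is arbitrary) could look like:

```yaml
collate:
  - product: ls7_nbar_albers
  - product: ls8_nbar_albers
index_measurement_name: source_index
```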
Juxtapose (Combining)#
This node merges disjoint sets of measurements from different products into one. The form of the recipe is:
```
{'juxtapose': [<virtual-product-1>,
               <virtual-product-2>,
               ...,
               <virtual-product-N>]}
```
Observations without corresponding entries in the other products will get dropped.
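A minimal sketch, juxtaposing surface reflectance bands with a pixel quality band (product names are illustrative):

```yaml
juxtapose:
  - product: ls8_nbar_albers
    measurements: [red, green, blue]
  - product: ls8_pq_albers
```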
Reproject (Data modifying)#
Reproject performs an on-the-fly reprojection of raster data to a given CRS and resolution.
This is useful when combining different datasets into a common data grid, especially when calculating summary statistics.
The recipe looks like:
```
{'reproject': {'output_crs': <crs-string>,
               'resolution': [<y-resolution>, <x-resolution>],
               'align': [<y-alignment>, <x-alignment>]},
 'input': <input-virtual-product>,
 'resampling': <resampling-settings>}
```

Here `align` and `resampling` are optional (defaulting to edge-aligned and nearest-neighbour resampling, respectively).
This combinator enables performing transformations to raster data in their native grids before aligning different
rasters to a common grid.
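As a sketch, reprojecting an input product onto a common Albers grid (the CRS, resolution, and product name are illustrative):

```yaml
reproject:
  output_crs: EPSG:3577
  resolution: [-25, 25]
input:
  product: ls8_nbar_albers
  measurements: [red, nir]
resampling: average
```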
Transform (Data modifying)#
This node applies an on-the-fly transformation to the loaded data. The recipe for a `transform` node has the form:

```
{'transform': <transformation-class>,
 'input': <input-virtual-product>,
 **settings}
```

where the `settings` are keyword arguments to a class implementing `datacube.virtual.Transformation`:
```python
class Transformation:
    def __init__(self, **settings):
        """ Initialize the transformation object with the given settings. """

    def compute(self, data):
        """ xarray.Dataset -> xarray.Dataset """

    def measurements(self, input_measurements):
        """ Dict[str, Measurement] -> Dict[str, Measurement] """
```
See Built in Transforms for the available built in Transforms.
Custom Transforms can also be written, see User-defined transformations.
Aggregate (Summary statistics)#
Aggregate performs (partial) temporal reduction, that is, statistics, on the loaded data. The form of the recipe is:
```
{'aggregate': <transformation-class>,
 'group_by': <grouping-function>,
 'input': <input-virtual-product>,
 **settings}
```

As in `transform`, the `settings` are keyword arguments used to initialise the transformation class. The grouping function takes the timestamp of an input dataset and returns another timestamp to be assigned to the group it belongs to. Common grouping functions (`year`, `month`, `week`, `day`) are built-in.
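Conceptually, a grouping function maps each timestamp to a representative timestamp shared by all members of its group. A hand-rolled month-like function might look like this (a sketch of the idea, not the built-in implementation):

```python
from datetime import datetime

def group_by_month(timestamp):
    # Map every timestamp to the first instant of its month, so all
    # observations within the same month share one group key.
    return datetime(timestamp.year, timestamp.month, 1)

# Observations from the same month collapse onto the same key
assert group_by_month(datetime(2019, 5, 17, 10, 30)) == datetime(2019, 5, 1)
assert group_by_month(datetime(2019, 5, 2)) == group_by_month(datetime(2019, 5, 30))
```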
ODC provides one built-in Statistic class, `xarray_reduction`. It applies a reducing method of the `xarray.DataArray` object to each individual band. Custom aggregate transformations are defined as in User-defined transformations.
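A recipe for monthly means might then look like the following sketch (it assumes `xarray_reduction` takes the name of the reducing method as a `method` setting; the product name is illustrative):

```yaml
aggregate: xarray_reduction
method: mean
group_by: month
input:
  product: ls8_nbar_albers
  measurements: [red, nir]
```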
Built in Transforms#
Expressions#
- class datacube.virtual.transformations.Expressions(output, masked=True)[source]#
Calculate measurements on-the-fly using arithmetic expressions.
Alias in recipe: `expressions`. For example:

```yaml
transform: expressions
output:
  ndvi:
    formula: (nir - red) / (nir + red)
input:
  product: example_surface_reflectance_product
  measurements: [nir, red]
```
Initialize transformation.
- Parameters:
  - output – A dictionary mapping output measurement names to specifications. A specification can be one of:
    - a measurement name from the input product, in which case it is copied over
    - a dictionary containing a `formula`, and optionally a `dtype`, a new `nodata` value, and a `units` specification
  - masked – Defaults to `True`. If set to `False`, the inputs and outputs are not masked for no data.
Make mask#
- class datacube.virtual.transformations.MakeMask(mask_measurement_name, flags)[source]#
Create a mask that would only keep pixels for which the measurement with mask_measurement_name of the product satisfies flags.
Alias in recipe: `make_mask`.
- Parameters:
  - mask_measurement_name – the name of the measurement to create the mask from
  - flags – definition of the flags for the mask
Apply mask#
- class datacube.virtual.transformations.ApplyMask(mask_measurement_name, apply_to=None, preserve_dtype=True, fallback_dtype='float32', erosion=0, dilation=0)[source]#
Apply a boolean mask to other measurements.
Alias in recipe: `apply_mask`.
- Parameters:
  - mask_measurement_name – name of the measurement to use as a mask
  - apply_to (`Optional[Collection[str]]`) – list of names of measurements to apply the mask to
  - preserve_dtype – whether to cast back to the original `dtype` after masking
  - fallback_dtype – default `dtype` for masked measurements
  - erosion (`int`) – the erosion to apply to the mask, in pixels
  - dilation (`int`) – the dilation to apply to the mask, in pixels
To Float#
- class datacube.virtual.transformations.ToFloat(apply_to=None, dtype='float32')[source]#
Convert measurements to floats and mask invalid data.
Alias in recipe: `to_float`.

Note

The `to_float` transform is deprecated. Please use `expressions` instead.

Using `to_float`:

```yaml
transform: to_float
apply_to: [green]
dtype: float32
input: ...
```

Using equivalent `expressions`:

```yaml
transform: expressions
output:
  green:
    formula: green
    dtype: float32
  # copy unaffected other bands
  red: red
  blue: blue
input: ...
```

- Parameters:
  - apply_to – list of names of measurements to apply conversion to
  - dtype – default `dtype` for conversion
Rename#
- class datacube.virtual.transformations.Rename(measurement_names)[source]#
Rename measurements.
Alias in recipe: `rename`.

Note

The `rename` transform is deprecated. Please use `expressions` instead.

Using `rename`:

```yaml
transform: rename
measurement_names:
  green: verde
input: ...
```

Using equivalent `expressions`:

```yaml
transform: expressions
output:
  verde: green
  # copy other unaffected bands
  red: red
  blue: blue
input: ...
```

- Parameters:
  - measurement_names – mapping from the input name to the output name
Select#
- class datacube.virtual.transformations.Select(measurement_names)[source]#
Keep only specified measurements.
Alias in recipe: `select`.

Note

The `select` transform is deprecated. Please use `expressions` instead.

Using `select`:

```yaml
transform: select
measurement_names: [green]
input: ...
```

Using equivalent `expressions`:

```yaml
transform: expressions
output:
  green: green
input: ...
```

- Parameters:
  - measurement_names – list of measurements to keep
User-defined transformations#
Custom transformations must inherit from `datacube.virtual.Transformation`. If the user-defined transformation class is already installed in the Python environment that the datacube instance runs in, the recipe may refer to it by its fully qualified name. Otherwise, for example for a transformation defined in a notebook, the virtual product using the custom transformation is best constructed using the combinators directly.
For example, calculating the NDVI from an SR product (say, `ls8_nbar_albers`) would look like:
```python
from datacube.virtual import construct, Transformation, Measurement

class NDVI(Transformation):
    def compute(self, data):
        result = (data.nir - data.red) / (data.nir + data.red)
        return result.to_dataset(name='NDVI')

    def measurements(self, input_measurements):
        return {'NDVI': Measurement(name='NDVI', dtype='float32',
                                    nodata=float('nan'), units='1')}

ndvi = construct(transform=NDVI,
                 input=dict(product='ls8_nbar_albers', measurements=['red', 'nir']))

ndvi_data = ndvi.load(dc, **search_terms)
```

for the required spatio-temporal `search_terms`. Note that the `measurements` method describes the output of the `compute` method.
Note
User-defined transformations should be dask-friendly, otherwise loading data using dask may break. Also, method names starting with `_transform_` are reserved for internal use.