
Compute Block

The compute: block on a layer defines how data is produced or transformed. There are three mutually exclusive modes: op: for registered operations, engine: for inline SQL or Python, and steps: for multi-stage pipelines. A layer MUST NOT use more than one of these at the top level.

Decided in ADR-0005 D5 - "Pipeline is eliminated as a separate concept. Compute IS the transformation."


Schema

| Field | Type | Required | Description |
|---|---|---|---|
| op | string | Exclusive | Registered operation name (e.g., terrain_slope). See Operations. |
| engine | string | Exclusive | Inline engine: sql or python. |
| steps | list | Exclusive | Ordered list of compute steps. |
| backend | string | No | Override for external compute (e.g., gee). Rarely needed; the platform routes to compute tiers automatically. |
| query | string | When engine: sql | SQL expression or file reference. |
| module | string | When engine: python | Python file path relative to the layer folder. |
| function | string | When engine: python | Entry point function name. |
| inputs | object | No | Named input layers. See Input References. |
| params | object | No | Parameter definitions. Values bind to form controls or query variables. |

op, engine, and steps are mutually exclusive at the top level of a compute: block.
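
The exclusivity rule and the per-engine required fields from the schema table can be checked mechanically. The following is a minimal Python sketch, assuming the compute: block has already been parsed from YAML into a dict; validate_compute and its error messages are hypothetical, not part of the platform.

```python
# Hypothetical validator for a parsed compute: block. Field names follow the
# schema table; the function name and error messages are illustrative only.
from typing import Any

MODE_KEYS = ("op", "engine", "steps")
ENGINE_REQUIRED = {"sql": ("query",), "python": ("module", "function")}


def validate_compute(compute: dict[str, Any]) -> list[str]:
    """Return a list of schema violations; an empty list means valid."""
    errors: list[str] = []
    # op, engine, and steps are mutually exclusive: exactly one must be present.
    modes = [key for key in MODE_KEYS if key in compute]
    if len(modes) != 1:
        errors.append(f"expected exactly one of op/engine/steps, found {modes}")
    # query is required for engine: sql; module and function for engine: python.
    engine = compute.get("engine")
    if engine is not None:
        required = ENGINE_REQUIRED.get(engine)
        if required is None:
            errors.append(f"unknown engine {engine!r}")
        else:
            errors.extend(
                f"engine: {engine} requires {field}"
                for field in required
                if field not in compute
            )
    return errors
```

For example, a block that sets both op: and engine: sql yields two violations: one for the conflicting modes and one for the missing query.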


op: Mode - Registered Operation

Uses a named operation from the registry. The platform routes the operation to the appropriate compute tier automatically.

```yaml
layers:
  terrain/slope:
    type: raster
    compute:
      op: terrain_slope
      inputs:
        dem: { layer: terrain/elevation }
      params:
        algorithm: horn
        smooth: true
        smooth_sigma: 1.5
```

The operation name MUST match an entry in the operation registry. The inputs: block maps the operation's declared input names to workspace layers. The params: block provides parameter values, either static values or form-bound definitions.

See Operations for the full operation model.
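
As an illustration of the name-matching and input-mapping rules, the sketch below resolves an op: block against a toy registry. OpSpec, REGISTRY, and resolve_op are hypothetical; the single registry entry mirrors the terrain_slope example above, with dem as its only declared input, and the sketch assumes that both missing and undeclared input names are errors.

```python
# Sketch of checking an op: block against an operation registry.
# REGISTRY, OpSpec, and resolve_op are hypothetical stand-ins; the real
# registry is described on the Operations page.
from dataclasses import dataclass
from typing import Any


@dataclass
class OpSpec:
    name: str
    inputs: tuple[str, ...]  # input names the operation declares


REGISTRY = {"terrain_slope": OpSpec("terrain_slope", inputs=("dem",))}


def resolve_op(compute: dict[str, Any], workspace: set[str]) -> dict[str, str]:
    """Map each declared input name to a workspace layer id."""
    spec = REGISTRY.get(compute["op"])
    if spec is None:
        raise KeyError(f"unknown operation {compute['op']!r}")
    bound = compute.get("inputs", {})
    # Assumption: missing and undeclared input names are both errors.
    missing = [n for n in spec.inputs if n not in bound]
    unexpected = [n for n in bound if n not in spec.inputs]
    if missing or unexpected:
        raise ValueError(f"missing={missing}, unexpected={unexpected}")
    resolved: dict[str, str] = {}
    for name, ref in bound.items():
        layer = ref.get("layer")
        if layer not in workspace:
            raise ValueError(f"input {name!r} points at unknown layer {layer!r}")
        resolved[name] = layer
    return resolved
```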


engine: Mode - Inline SQL or Python

Engines are a core platform concept. They live at folia/engines/ and are dispatched by the compute executor.

Decided in ADR-0007

SQL Engine

SQL is declarative, sandboxed, and runs via DuckDB-WASM (client-side for small data) or DuckDB native (server-side for large data).

```yaml
layers:
  analysis/parcels-summary:
    type: table
    compute:
      engine: sql
      query: |
        -- pct: each parcel's share of the total parcel area
        SELECT *, area_ha / SUM(area_ha) OVER () * 100 AS pct
        FROM :input
      inputs:
        input: { layer: source/parcels }
```

Input layers are referenced as :input_name bind variables in SQL. Parameter values are referenced as :param_name.

```yaml
layers:
  pricing/calculator:
    type: computed
    compute:
      engine: sql
      query: |
        SELECT :count * ondemand_hr * :hours AS monthly_cost
        FROM :prices
        WHERE instance_type = :instance_type
      inputs:
        prices: { layer: cloud_pricing/ec2 }
      params:
        instance_type: { type: select, source: cloud_pricing/ec2 }
        hours: { type: slider, min: 0, max: 730, default: 730 }
        count: { type: number, default: 1 }
```
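
To make the bind-variable convention concrete, the sketch below runs a query like the pricing one with Python's built-in sqlite3 module standing in for DuckDB, an assumption made so the example needs no third-party packages. Input names are rewritten to quoted table identifiers, because a table cannot be bound as a value, while parameter names stay as :name placeholders for the driver to bind. run_sql is a hypothetical helper, not the platform's executor.

```python
# Sketch of resolving :name bind variables for engine: sql. Assumptions:
# sqlite3 stands in for DuckDB so the example runs on the standard library;
# each input is loaded as a table named after its input key; every other
# :name is left for the driver to bind as a parameter value. The regex does
# not skip string literals or comments; a real implementation would parse SQL.
import re
import sqlite3
from typing import Any

BIND = re.compile(r"(?<!:):([A-Za-z_]\w*)")


def run_sql(query: str, inputs: dict[str, list[dict[str, Any]]],
            params: dict[str, Any]) -> list[tuple]:
    conn = sqlite3.connect(":memory:")
    try:
        for name, rows in inputs.items():
            columns = list(rows[0])
            cols = ", ".join(f'"{c}"' for c in columns)
            marks = ", ".join("?" for _ in columns)
            conn.execute(f'CREATE TABLE "{name}" ({cols})')
            conn.executemany(
                f'INSERT INTO "{name}" VALUES ({marks})',
                [tuple(row[c] for c in columns) for row in rows],
            )
        # :input_name becomes a quoted table identifier (tables cannot be
        # bound as values); :param_name stays a named placeholder.
        sql = BIND.sub(
            lambda m: f'"{m.group(1)}"' if m.group(1) in inputs else m.group(0),
            query,
        )
        return conn.execute(sql, params).fetchall()
    finally:
        conn.close()
```

Called with a prices table and values for count, hours, and instance_type, it returns one monthly_cost row per matching instance type. Binding parameters, rather than splicing them into the SQL text, keeps form input out of the query string.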

Python Engine

Python is Turing-complete, file-referenced, and runs server-side in a container. Python code MUST be stored as a file, never inline in YAML.

```yaml
layers:
  analysis/risk-zones:
    type: raster
    compute:
      engine: python
      module: ./classify.py
      function: compute
      inputs:
        elevation: { layer: terrain/elevation }
      params:
        threshold: { type: number, default: 35 }
```

The referenced Python file:

```python
# layers/analysis/risk-zones/classify.py
def compute(elevation, threshold=35):
    """Classify terrain risk zones from elevation-derived slope."""
    slope = elevation.slope(algorithm="horn")
    return (slope > threshold).astype(int)
```

SQL is configuration: small, declarative, safe inline. Python is code: always a file, git-trackable, reviewable, container-sandboxed.

Decided in ADR-0005 D1
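
The module:/function: contract can be sketched with importlib, assuming the module path is resolved against the layer folder and that both inputs and parameter values arrive as keyword arguments, matching the compute(elevation, threshold=35) signature above. run_python and load_entry_point are hypothetical names; the platform runs this code in a sandboxed container, whereas the sketch loads it in-process.

```python
# Sketch of calling an engine: python entry point. Assumptions: module is
# resolved against the layer folder, inputs and parameter values are both
# passed as keyword arguments, and the helper names are hypothetical. The
# platform runs this in a sandboxed container; here it runs in-process.
import importlib.util
import tempfile
from pathlib import Path
from typing import Any, Callable


def load_entry_point(layer_dir: Path, module: str, function: str) -> Callable[..., Any]:
    root = layer_dir.resolve()
    path = (root / module).resolve()
    # Refuse module paths that escape the layer folder, e.g. ../other.py.
    if not path.is_relative_to(root):
        raise ValueError(f"module {module!r} is outside {root}")
    spec = importlib.util.spec_from_file_location(path.stem, str(path))
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load {path}")
    mod = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(mod)
    return getattr(mod, function)


def run_python(layer_dir: Path, module: str, function: str,
               inputs: dict[str, Any], params: dict[str, Any]) -> Any:
    fn = load_entry_point(layer_dir, module, function)
    return fn(**inputs, **params)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        layer_dir = Path(tmp)
        # Stand-in for classify.py: plain lists instead of a raster object.
        (layer_dir / "classify.py").write_text(
            "def compute(elevation, threshold=35):\n"
            "    return [int(v > threshold) for v in elevation]\n"
        )
        print(run_python(layer_dir, "./classify.py", "compute",
                         {"elevation": [10, 40, 60]}, {"threshold": 35}))
```

Running the file directly writes a stand-in classify.py that works on plain lists and prints [0, 1, 1].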


steps: Mode - Multi-Stage Pipeline

Chains operations and engines. The output of step N feeds step N+1, and named intermediates (as:) can be referenced by later steps.

```yaml
layers:
  analysis/vegetation-index:
    type: raster
    compute:
      steps:
        - op: cloud_mask
          params: { sensor: landsat8 }
        - op: ndvi_composite
          params: { method: greenest }
        - op: focal_median
          params: { radius: 10 }
          as: ndvi_smoothed
        - op: raster_reclassify
          inputs: { raster: { ref: ndvi_smoothed } }
          params:
            breaks: [0, 0.2, 0.4, 0.6, 0.8, 1.0]
      inputs:
        imagery: { layer: source/landsat8 }
```

Step Schema

| Field | Type | Required | Description |
|---|---|---|---|
| op | string | Exclusive with engine | Registered operation for this step. |
| engine | string | Exclusive with op | sql or python for this step. |
| inputs | object | No | Overrides or additions to pipeline inputs for this step. |
| params | object | No | Parameters for this step. |
| as | string | No | Name for this step's output. Later steps can reference it via { ref: name }. |

op: and engine: MAY be mixed in the same chain. If any step uses engine: python, the entire chain runs server-side.
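
The chaining rules can be sketched as a small interpreter loop. This is a minimal Python sketch under stated assumptions, not the platform's executor: runners is a hypothetical map from op or engine names to callables, pipeline-level inputs are passed to every step, the previous step's output arrives under a hypothetical keyword named input, and { ref: name } resolves to an earlier step's as: output.

```python
# Sketch of a steps: chain runner. Assumptions (hypothetical, not the
# platform's executor): runners maps an op name or engine name to a
# callable; pipeline-level inputs are visible to every step; the previous
# step's output is passed as "input"; step-level inputs override both, and
# { ref: name } resolves to an earlier step's as: output.
from typing import Any, Callable


def run_steps(steps: list[dict[str, Any]], pipeline_inputs: dict[str, Any],
              runners: dict[str, Callable[..., Any]]) -> Any:
    named: dict[str, Any] = {}
    previous: Any = None
    for index, step in enumerate(steps):
        # Each step uses exactly one of op or engine.
        if ("op" in step) == ("engine" in step):
            raise ValueError(f"step {index}: set exactly one of op or engine")
        runner = runners[step["op"] if "op" in step else step["engine"]]
        kwargs = dict(pipeline_inputs)
        if index > 0:
            kwargs["input"] = previous  # output of step N feeds step N+1
        for name, ref in step.get("inputs", {}).items():
            if "ref" in ref:
                if ref["ref"] not in named:
                    raise KeyError(f"step {index}: no earlier step named {ref['ref']!r}")
                kwargs[name] = named[ref["ref"]]
            else:
                kwargs[name] = ref  # layer/self refs left unresolved here
        kwargs.update(step.get("params", {}))
        previous = runner(**kwargs)
        if "as" in step:
            named[step["as"]] = previous
    return previous
```

Under these assumptions, raster_reclassify in the vegetation-index example would receive the focal_median result both as the implicit previous output and through { ref: ndvi_smoothed }; the ref form matters when a named intermediate is consumed several steps later.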


Input References

The inputs: block maps named inputs to data sources. Three reference types are supported:

| Reference | Syntax | Description |
|---|---|---|
| Workspace layer | { layer: terrain/elevation } | References a layer defined in the workspace. |
| Step output | { ref: step_name } | References the output of a named step (in steps: mode). |
| Self | { self: true } | References the layer's own uri data. |

The self: true Pattern

When a layer has both uri (stored data) and a compute: block, the compute can reference its own data:

```yaml
layers:
  terrain/slope:
    uri: catalog://terrain/slope@v2
    type: raster
    compute:
      engine: sql
      query: SELECT * FROM :self WHERE slope_angle > :threshold
      inputs:
        self: { self: true }
      params:
        threshold: { type: number, default: 35 }
```

  • { self: true } is only valid on layers that have a uri.
  • { self: true } MAY coexist with other layer inputs.
  • Without { self: true }, compute produces output purely from inputs and params.
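
A minimal resolver for the three reference types might look like the following. It assumes the workspace is a dict of layer configs keyed by layer id and returns a handle (the layer id, the step output, or the layer's own uri) instead of loading data; resolve_inputs is a hypothetical name.

```python
# Sketch of resolving the three input reference types. Assumptions: the
# workspace is a dict of layer id -> layer config, and "resolving" returns a
# handle (layer id, step output, or the layer's own uri) rather than data.
from typing import Any, Optional


def resolve_inputs(layer: dict[str, Any], inputs: dict[str, dict[str, Any]],
                   workspace: dict[str, dict[str, Any]],
                   step_outputs: Optional[dict[str, Any]] = None) -> dict[str, Any]:
    step_outputs = step_outputs or {}
    resolved: dict[str, Any] = {}
    for name, ref in inputs.items():
        if "layer" in ref:
            if ref["layer"] not in workspace:
                raise KeyError(f"{name}: unknown layer {ref['layer']!r}")
            resolved[name] = ref["layer"]
        elif "ref" in ref:
            # Step outputs only exist in steps: mode.
            resolved[name] = step_outputs[ref["ref"]]
        elif ref.get("self") is True:
            # { self: true } is only valid on layers that have a uri.
            if "uri" not in layer:
                raise ValueError(f"{name}: self reference on a layer without uri")
            resolved[name] = layer["uri"]
        else:
            raise ValueError(f"{name}: unrecognized reference {ref!r}")
    return resolved
```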

Execution Mode Inference

The runtime infers the execution mode. There is no mode: flag.

| Condition | Mode | Behavior |
|---|---|---|
| All inputs static, no form params | Batch | Run once, store result at uri. |
| Any param bound to a form control | Reactive | Re-execute on form change. |
| Depends on a reactive layer | Reactive | Reactivity propagates downstream through the DAG. |
| Layer has refresh: schedule(...) | Scheduled | Re-execute on cron. |

Decided in ADR-0002 and ADR-0005
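
The inference rules can be sketched as a memoized walk over the layer DAG. Assumptions in this Python sketch: a param written as a mapping with a type key is form-bound while a bare scalar is static, only top-level { layer: ... } inputs are followed, and a layer that is both reactive and scheduled is reported as reactive, since the table does not state a precedence. infer_mode is a hypothetical name.

```python
# Sketch of execution-mode inference. Assumptions: a param written as a
# mapping with a "type" key (slider, select, number, ...) is form-bound and
# a bare scalar is static; only top-level { layer: ... } inputs are walked;
# a layer that is both reactive and scheduled is reported as reactive.
from typing import Any


def is_form_bound(param: Any) -> bool:
    return isinstance(param, dict) and "type" in param


def infer_mode(layer_id: str, layers: dict[str, dict[str, Any]]) -> str:
    memo: dict[str, bool] = {}

    def reactive(lid: str, path: frozenset[str]) -> bool:
        if lid in memo:
            return memo[lid]
        if lid in path:
            raise ValueError(f"dependency cycle through {lid!r}")
        compute = layers.get(lid, {}).get("compute", {})
        own = any(is_form_bound(p) for p in compute.get("params", {}).values())
        upstream = [r["layer"] for r in compute.get("inputs", {}).values() if "layer" in r]
        # Reactivity propagates from any reactive upstream layer.
        memo[lid] = own or any(reactive(u, path | {lid}) for u in upstream)
        return memo[lid]

    if reactive(layer_id, frozenset()):
        return "reactive"
    if "refresh" in layers.get(layer_id, {}):
        return "scheduled"
    return "batch"
```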


Compute Routing

The platform routes operations to one of three compute tiers based on data size, operation type, and engine:

| Tier | When | Tools used |
|---|---|---|
| Browser | engine: sql with data < 50 MB | DuckDB-WASM, client-side rendering |
| Local | engine: sql with data >= 50 MB; op: with local data | DuckDB native, GDAL, rasterio, Python |
| Cloud | Large-scale batch, fan-out/reduce, continental-scale ops | K8s workers running the same libraries |

The user does not choose a tier. The platform picks based on data size and operation type.

Routing Rules

For op: mode, the platform selects a tier automatically:

| Signal | Routing |
|---|---|
| Data is local, operation is lightweight | Local tier. |
| Data is in R2/S3 (folia-managed) | Cloud tier (compute near the data). |
| Data has a gee:// URI | External: Google Earth Engine. |
| Explicit backend: gee on the compute block | External: Google Earth Engine. |

For engine: mode, routing is based on engine type and data size:

| Context | Tier |
|---|---|
| engine: sql, data < 50 MB | Browser (DuckDB-WASM, client-side) |
| engine: sql, data >= 50 MB | Local (DuckDB native, server-side) |
| engine: python (any size) | Local or Cloud (Python container) |
| Multi-step chain with any Python step | Entire chain runs Local or Cloud |

External Compute

Google Earth Engine is an external compute platform - it runs on GEE's infrastructure, not folia's. This is the one case where the user makes an explicit choice via backend: gee. All other routing is automatic.

Decided in ADR-0005 D7 and ADR-0007
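
Read together, the routing tables reduce to a short decision function. The sketch below is hypothetical: it assumes sizes in MB, recognizes folia-managed storage by an r2:// or s3:// uri prefix, picks Cloud for managed data and Local otherwise wherever the tables say "Local or Cloud", and routes chains without a Python step like op: mode. Fan-out/reduce and the "lightweight operation" judgment are not modeled.

```python
# Sketch of the tier-selection rules in the routing tables. Assumptions:
# sizes are in MB; folia-managed storage is recognized by an r2:// or s3://
# uri prefix; where the tables say "Local or Cloud", this sketch picks Cloud
# for folia-managed data and Local otherwise; chains without a Python step
# are routed like op: mode. Fan-out/reduce scheduling is not modeled.
from typing import Any

BROWSER_LIMIT_MB = 50


def uses_python(compute: dict[str, Any]) -> bool:
    if compute.get("engine") == "python":
        return True
    return any(s.get("engine") == "python" for s in compute.get("steps", []))


def route(compute: dict[str, Any], data_uri: str, data_mb: float) -> str:
    # GEE is the only external target, chosen by backend: gee or a gee:// uri.
    if compute.get("backend") == "gee" or data_uri.startswith("gee://"):
        return "external:gee"
    managed = data_uri.startswith(("r2://", "s3://"))
    # Any Python step sends the whole chain server-side.
    if uses_python(compute):
        return "cloud" if managed else "local"
    if compute.get("engine") == "sql":
        return "browser" if data_mb < BROWSER_LIMIT_MB else "local"
    return "cloud" if managed else "local"
```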


Parameter Binding

Parameters defined in compute.params bind to form controls in the UI and to query variables in SQL:

```yaml
compute:
  engine: sql
  query: |
    SELECT * FROM :data
    WHERE elevation > :min_elev
    AND slope < :max_slope
  inputs:
    data: { layer: terrain/combined }
  params:
    min_elev: { type: slider, min: 0, max: 5000, default: 1000 }
    max_slope: { type: slider, min: 0, max: 90, default: 45 }
```

  • Parameter names map to :param bind variables in SQL by name.
  • Input layer names map to :input bind variables in SQL by name.
  • For Python engines, parameters are passed as keyword arguments to the function.

The form rendering (slider vs. dropdown vs. toggle) is a view concern. The params: block defines the data contract: what type, what range, what default. The UI reads this contract and renders appropriate controls.
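
One way a runtime could apply the params: contract before executing a query is sketched below. It assumes that an unsubmitted value falls back to default, that slider and number values are checked against min and max, and that a bare scalar is a static value; resolve_params is a hypothetical name.

```python
# Sketch of applying a params: contract to submitted form values.
# Assumptions: missing values fall back to default; slider and number values
# are range-checked; bare scalars are static values. Names are hypothetical.
from typing import Any


def resolve_params(definitions: dict[str, Any],
                   submitted: dict[str, Any]) -> dict[str, Any]:
    values: dict[str, Any] = {}
    for name, spec in definitions.items():
        if not isinstance(spec, dict):
            values[name] = spec  # static value, not bound to a form control
            continue
        if name in submitted:
            value = submitted[name]
        elif "default" in spec:
            value = spec["default"]
        else:
            raise ValueError(f"parameter {name!r} has no value and no default")
        if spec.get("type") in ("slider", "number"):
            low, high = spec.get("min"), spec.get("max")
            if (low is not None and value < low) or (high is not None and value > high):
                raise ValueError(f"{name}={value} is outside [{low}, {high}]")
        values[name] = value
    return values
```

Its output is the kind of dict that would be bound to :min_elev and :max_slope in SQL, or passed as keyword arguments to a Python entry point.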

Licensed under CC-BY-4.0