Skip to content

Tabular Domain

For non-spatial or weakly-spatial tabular data. All operations are powered by DuckDB, operating on Parquet, CSV, and Arrow tables.

Decided in ADR-0006

Schema Extension

The tabular: block on a layer declares tabular metadata.

FieldTypeRequiredDescription
key_columnstringNoPrimary key column name
indexlistNoColumns to index for fast lookups
column_typesobjectNoColumn type overrides (e.g., {"zip": "VARCHAR"})
yaml
layers:
  census-tracts:
    uri: data/tracts.parquet
    type: table
    tabular:
      key_column: geoid
      index: [state_fips, county_fips]
      column_types:
        geoid: VARCHAR

Operations

The tabular domain provides 6 operations in 3 categories:

Transform

OperationTypeDescriptionKey Params
tabular_filtertable → tableFilter rows using a SQL WHERE expressionwhere (string, required)
tabular_selecttable → tableSelect, rename, or compute columnscolumns, exclude, expressions
tabular_sorttable → tableSort rows by columns, optionally limit to top-Nby (array, required), limit

Combine

OperationTypeDescriptionKey Params
tabular_jointable + table → tableJoin two tables on matching columns (inner, left, right, outer, cross)on or left_on/right_on, how
tabular_uniontable[] → tableStack rows from multiple tables verticallydistinct (boolean)

Summarize

OperationTypeDescriptionKey Params
tabular_aggregatetable → tableGroup rows and compute aggregate statisticsgroup_by (array, required), aggregations (object, required), having, order_by, limit

Compute

All tabular operations are powered by DuckDB. For datasets under ~50 MB, queries run in-browser via DuckDB-WASM (instant, free). Larger datasets run server-side with DuckDB native in the local or cloud tier.

DuckDB provides:

  • SQL-based transforms on Parquet, CSV, and Arrow tables
  • Zero-copy reads from Parquet files
  • Window functions, CTEs, and full SQL expression support

Examples

Filter and Aggregate

yaml
layers:
  source/sales:
    uri: data/sales.parquet
    type: table
    tabular:
      key_column: order_id

  analysis/active-sales:
    type: table
    compute:
      op: tabular_filter
      inputs:
        table: { layer: source/sales }
      params:
        where: "status = 'active' AND amount > 100"

  results/sales-by-region:
    type: table
    compute:
      op: tabular_aggregate
      inputs:
        table: { layer: analysis/active-sales }
      params:
        group_by: [region]
        aggregations:
          total_sales: "SUM(amount)"
          avg_order: "AVG(amount)"
          order_count: "COUNT(*)"
        order_by: ["total_sales DESC"]
    style:
      table:
        columns:
          - { field: region, label: Region }
          - { field: total_sales, label: "Total Sales", format: "$,.2f" }
          - { field: order_count, label: Orders }

Join Two Tables

yaml
layers:
  analysis/enriched:
    type: table
    compute:
      op: tabular_join
      inputs:
        left: { layer: source/orders }
        right: { layer: source/customers }
      params:
        on: customer_id
        how: left

Licensed under CC-BY-4.0