Materialization
Cube Nodes
When we attach a materialization config to a cube node (instructions here), we are requesting DJ to prepare for the materialization of the cube’s underlying data into an OLAP database (such as Druid). This enables low-latency metric queries across all defined dimensions in the cube.
However, many such databases are only configured to work with simple aggregations, so DJ will break down each metric expression into its constituent simple aggregation measures prior to materialization. These measures are ingested into the OLAP database as separate columns and they’re combined back together into the original metrics when users request metric data.
For a detailed explanation of how this decomposition works, including supported aggregation types and dialect translation, see Metric Decomposition.
A few examples include:
SELECT
AVG(price)
AS avg_repair_price
FROM repair_order_details
price_count
count
count(price)
price_sum
sum
sum(price)
SELECT
SUM(price)
AS total_repair_price
FROM repair_order_details
price_sum
sum
sum(price)
SELECT
CAST(
SUM(
IF(discount > 0.0,
1, 0)
) AS DOUBLE
) / COUNT(*)
AS discounted_orders_rate
FROM repair_order_details
discount_sum
sum
sum(
if(
discount > 0.0,
1, 0
)
)
count_star
count
count(*)
SELECT
SUM(price1) +
SUM(price2)
AS total_cost
FROM costs
price1_sum
sum
sum(price1)
price2_sum
sum
sum(price2)
SELECT
APPROX_COUNT_DISTINCT(
user_id
) AS unique_users
FROM events
user_id_hll
hll_sketch_agg
hll_sketch_agg(user_id)
Stored as binary HLL sketch
The combiner expression for APPROX_COUNT_DISTINCT uses HyperLogLog functions:
hll_sketch_estimate(hll_union(user_id_hll))
This is automatically translated to the appropriate dialect when querying:
- Druid:
APPROX_COUNT_DISTINCT_DS_HLL(DS_HLL(user_id_hll)) - Trino:
cardinality(merge(user_id_hll))
Bring Your Own Materialization
While DJ can manage materialization end-to-end, you may want to orchestrate materialization separately. DJ supports this by generating SQL that you can execute in your own pipelines.
For incremental refresh, DJ supports temporal partition filtering - automatically adding partition filters to the generated SQL. Instead of scanning all historical data, queries are scoped to specific time windows using ${dj_logical_timestamp} placeholders that your orchestration tool can substitute with actual timestamps.
Setup
To enable temporal partition filtering for a cube, add partition configuration to your cube YAML:
# sales_by_date.cube.yaml
description: Sales metrics by date
metrics:
- v3.total_revenue
- v3.order_count
dimensions:
- v3.date.date_id
columns:
- name: v3.date.date_id
partition:
type_: temporal
format: yyyyMMdd
granularity: day
Usage
Basic Usage (Exact Partition)
Request measures SQL with temporal filtering enabled:
GET /sql/measures/v3/?metrics=v3.total_revenue&dimensions=v3.date.date_id&include_temporal_filters=true
Generated SQL:
SELECT order_date AS date_id, SUM(revenue) AS revenue_sum
FROM orders
WHERE order_date = CAST(DATE_FORMAT(CAST(${dj_logical_timestamp} AS TIMESTAMP), 'yyyyMMdd') AS INT)
GROUP BY order_date
With Lookback Window
For scanning N days/hours back from the logical timestamp (useful for catching late-arriving data):
GET /sql/measures/v3/?metrics=v3.total_revenue&dimensions=v3.date.date_id&include_temporal_filters=true&lookback_window=7%20DAY
Generated SQL:
SELECT order_date AS date_id, SUM(revenue) AS revenue_sum
FROM orders
WHERE order_date BETWEEN CAST(DATE_FORMAT(CAST(${dj_logical_timestamp} AS TIMESTAMP) - INTERVAL '7' DAY, 'yyyyMMdd') AS INT)
AND CAST(DATE_FORMAT(CAST(${dj_logical_timestamp} AS TIMESTAMP), 'yyyyMMdd') AS INT)
GROUP BY order_date
The ${dj_logical_timestamp} Placeholder
The generated SQL contains ${dj_logical_timestamp} placeholders that must be replaced with actual timestamps before execution:
- For DJ-managed materialization: DJ automatically substitutes this during refresh
- For external orchestration: Your tool replaces this with the target partition timestamp
Example substitution:
-- Before substitution (what DJ returns)
WHERE order_date = CAST(DATE_FORMAT(CAST(${dj_logical_timestamp} AS TIMESTAMP), 'yyyyMMdd') AS INT)
-- After substitution (what your orchestration runs)
WHERE order_date = CAST(DATE_FORMAT(CAST('2024-01-15T00:00:00' AS TIMESTAMP), 'yyyyMMdd') AS INT)
-- Evaluates to
WHERE order_date = 20240115
When Temporal Filters Are Applied
Temporal filters are only applied when:
include_temporal_filters=trueis set- The requested metrics + dimensions resolve to a cube
- That cube has temporal partition columns configured
If no matching cube exists, the parameter is ignored and no filters are added.